Spark provides so-called "output sinks" for streams. A sink defines how and where the stream is written; for example, as Parquet files or as an in-memory table. However, for our application, we will simply show the stream output in the console:
outputDataStream.writeStream.format("console").start().awaitTermination()
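For reference, the same stream could also be directed to the other sinks mentioned above. The following sketch is not part of our application; the output and checkpoint paths and the query name are placeholder values:

```scala
// Parquet sink: each micro-batch is appended as Parquet files. File sinks
// require a checkpoint location so the query can recover after a restart.
outputDataStream.writeStream
  .format("parquet")
  .option("path", "/tmp/scored-loans")                     // placeholder path
  .option("checkpointLocation", "/tmp/scored-loans-ckpt")  // placeholder path
  .start()

// Memory sink: results are kept in an in-memory table that can be queried
// with Spark SQL (useful for debugging, not intended for production).
outputDataStream.writeStream
  .format("memory")
  .queryName("scoredLoans")                                // placeholder name
  .start()
```

The console sink, by contrast, needs no options, which is what makes it the most convenient choice for a demonstration.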
The preceding code starts the stream processing directly and waits for the application to terminate. The application simply processes every new file in a given folder (in our case, the folder given by the environment variable `APPDATADIR`). For example, given a file with five loan applications, the stream produces a table with five scored events:
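For context, a file-based input stream of this kind is typically defined with `readStream` on the watched folder. The sketch below is only an assumption about how the input side could look; the schema and column names are illustrative, and only the use of the `APPDATADIR` environment variable comes from the text above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, LongType, DoubleType, StringType}

val spark = SparkSession.builder().appName("LoanScoring").getOrCreate()

// Structured Streaming file sources need an explicit schema;
// the columns here are illustrative placeholders.
val loanSchema = new StructType()
  .add("id", LongType)
  .add("loan_amnt", DoubleType)
  .add("purpose", StringType)

// Watch the folder given by the APPDATADIR environment variable;
// every new CSV file dropped into it becomes a new micro-batch.
val inputDataStream = spark.readStream
  .schema(loanSchema)
  .option("header", "true")
  .csv(sys.env("APPDATADIR"))
```

The scored `outputDataStream` would then be produced by applying the trained model and its feature transformations to `inputDataStream`.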
The important part of each scored event is in the last columns, which contain the predicted values:
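If only the predictions are of interest, the scored stream can be narrowed down to those columns before it is written to the console. The column names below are assumptions, since the actual names depend on how the scoring pipeline labels its outputs:

```scala
// Keep only an identifier and the prediction columns (placeholder names).
val predictionsOnly = outputDataStream.select("id", "prediction", "probability")

predictionsOnly.writeStream.format("console").start().awaitTermination()
```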
If we write another file with a single loan application into the folder, the application will show another scored batch:
In this way, we can deploy trained models together with their corresponding data-processing operations and let them score actual events. Of course, we have only demonstrated a simple use case; a real-life scenario would be much more complex, involving proper model validation, A/B testing against the models currently in use, and the storage and versioning of models.