How it works...

In certain situations, we cannot use batch methods to load and capture events and then react to them. We can devise creative methods of capturing events in memory or in a landing database and then rapidly marshaling them over to another system for processing, but most such systems fail to act as true streaming systems and are often very expensive to build.

Spark provides a near real-time (also referred to as subjective real-time) streaming facility that can receive incoming sources, such as Twitter feeds, signals, and so on, via connectors (for example, a Kafka connector) and then process and present them through an RDD interface.
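The following is a minimal sketch of that flow, assuming a local TCP text source on port 9999 (any supported connector, such as Kafka or Twitter, would plug in the same way); each micro-batch of the resulting DStream surfaces as an ordinary RDD:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical local settings; swap in a Kafka or Twitter connector as needed
val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamAsRDDs")
val ssc = new StreamingContext(conf, Seconds(5))

// A plain TCP text source; the DStream is a sequence of micro-batch RDDs
val lines = ssc.socketTextStream("localhost", 9999)

// Each micro-batch arrives as an ordinary RDD and can be processed with the usual API
lines.foreachRDD { rdd =>
  println(s"Received ${rdd.count()} events in this batch")
}

ssc.start()
ssc.awaitTermination()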

These are the elements needed to build streaming KMeans in Spark (a combined sketch of all the steps follows this list):

  1. Use the streaming context as opposed to the regular Spark context used so far:

val ssc = new StreamingContext(conf, Seconds(batchDuration.toLong))
  2. Select your connector to connect to a data source and receive events:

    • Twitter
    • Kafka
    • Third party
    • ZeroMQ
    • TCP
    • ........
  3. Create your streaming KMeans model; set the parameters as needed:

val model = new StreamingKMeans()
  4. Train and predict as usual:

    • Keep in mind that K cannot be changed on the fly

  5. Start the context and await the termination signal to exit:

    • ssc.start()

    • ssc.awaitTermination()
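The following is a minimal end-to-end sketch that stitches the five steps together. It assumes two local TCP text sources on ports 9999 (training vectors such as [1.0,2.0,3.0]) and 9998 (labeled test points such as (1.0,[1.0,2.0,3.0])), and a three-dimensional feature space with K = 3; adjust the sources and parameters to your environment:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingKMeansSketch")
// Step 1: streaming context with a 10-second batch duration
val ssc = new StreamingContext(conf, Seconds(10))

// Step 2: receive events; plain TCP sources stand in for Kafka, Twitter, and so on
val trainingData = ssc.socketTextStream("localhost", 9999).map(Vectors.parse)
val testData = ssc.socketTextStream("localhost", 9998).map(LabeledPoint.parse)

// Step 3: configure the model; K is fixed for the lifetime of the stream
val model = new StreamingKMeans()
  .setK(3)                  // number of clusters, cannot be changed on the fly
  .setDecayFactor(1.0)      // how quickly older batches are forgotten
  .setRandomCenters(3, 0.0) // vector dimension and initial center weight

// Step 4: train on one stream and predict on the other
model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

// Step 5: start the context and block until terminated
ssc.start()
ssc.awaitTermination()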
