Spark Streaming

Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics, ingesting real-time data from sources such as HDFS, Kafka, Flume, Twitter, ZeroMQ, and Kinesis. It processes the data in small chunks called micro-batches, exposed through an abstraction known as DStreams (discretized streams). A DStream is a sequence of RDDs, so Spark Streaming can apply the same transformations and actions as regular RDDs in the Spark Core API. Spark Streaming operations can recover from failure automatically using various techniques, such as checkpointing. Spark Streaming can also be combined with other Spark components in a single program, unifying real-time processing with machine learning, SQL, and graph operations.
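The micro-batch model described above can be illustrated without Spark itself: incoming records are sliced into small batches, and the same map/reduce-style word-count logic is applied to each batch in turn. The sketch below is plain Python; the `micro_batches` helper and the names used are ours, not Spark's API.

```python
from collections import Counter

def micro_batches(records, batch_size):
    """Group an incoming record stream into fixed-size micro-batches,
    mimicking how a DStream is a sequence of small RDDs."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def word_count(batch):
    """Apply flatMap/reduceByKey-style word counting to one batch."""
    words = (word for line in batch for word in line.split())
    return Counter(words)

stream = ["spark streaming", "spark core", "kafka and flume"]
totals = Counter()
for batch in micro_batches(stream, 2):
    totals += word_count(batch)   # merge each batch's partial counts

print(totals["spark"])  # -> 2
```

In real Spark Streaming, each micro-batch is an RDD distributed across the cluster, and the merge step is handled by stateful operations rather than a local counter.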


We cover Spark Streaming in detail in Chapter 9, Stream Me Up, Scotty - Spark Streaming.

In addition, the new Structured Streaming API makes Spark streaming programs more similar to Spark batch programs and allows real-time querying on top of streaming data, something that was complicated with the Spark Streaming library before Spark 2.0.
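The central idea behind Structured Streaming, treating the stream as a table that grows with each micro-batch while a query maintains an incrementally updated result that can be queried at any time, can also be sketched in plain Python. The `RunningAggregate` class and its method names below are purely illustrative, not Spark's API.

```python
class RunningAggregate:
    """Toy model of Structured Streaming's incremental query: each
    arriving micro-batch updates a result 'table' that can be queried
    at any time, instead of reprocessing the whole stream."""

    def __init__(self):
        self.result = {}          # the continuously updated result table

    def update(self, batch):
        """Fold one micro-batch of (key, value) pairs into the result."""
        for key, value in batch:
            self.result[key] = self.result.get(key, 0) + value

    def query(self, key):
        """Read the current result in real time, mid-stream."""
        return self.result.get(key, 0)

agg = RunningAggregate()
agg.update([("clicks", 3), ("views", 10)])   # first micro-batch arrives
agg.update([("clicks", 2)])                  # second micro-batch arrives
print(agg.query("clicks"))  # -> 5
```

In actual Structured Streaming, the same effect is achieved by running a streaming DataFrame query, and Spark manages the incremental state, fault tolerance, and output modes for you.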
