Introducing Spark Streaming

With the advancement and expansion of big data technologies, most companies have shifted their focus toward data-driven decision making, which has become an integral part of the business. Today, what matters is not only the analytics itself but also how quickly it becomes available. Offline data analytics, also known as batch analytics, provides insight into historical data. Online data analytics, on the other hand, shows what is happening in real time, helping organizations make decisions as early as possible and stay ahead of their competitors. Online, or near-real-time, analytics is done by reading incoming streams of data (for example, user activity on an e-commerce website) and processing those streams to extract valuable results.

The Spark Streaming API is a library that allows you to process data from live streams in near real time. It offers high scalability, fault tolerance, and high throughput. It also provides stateful APIs over the live data stream out of the box, along with connectors to various data sources, such as Kafka, Twitter, and Kinesis, for reading live data.

Spark Streaming introduces an abstraction known as a discretized stream, or DStream. Incoming streams of data are converted into DStreams, each of which is represented internally as a sequence of RDDs. This enables seamless integration of Spark Streaming with Spark Core and with other extensions such as Spark SQL and MLlib.

A DStream in Spark Streaming represents a stream of data divided into small batches, also known as micro batches. The Spark Streaming API transforms the stream of data into micro batches and feeds them to the Spark engine for processing.
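
As a rough illustration of the discretization described above, the following plain-Python sketch groups timestamped events into fixed-interval micro batches. It does not use the Spark API. The `discretize` function, the `batch_interval` parameter, and the sample events are hypothetical names chosen for this illustration, and timestamps are assumed to be non-negative seconds.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive micro batches.

    Batch k covers the half-open window
    [k * batch_interval, (k + 1) * batch_interval).
    Intervals without events produce an empty batch, so one batch is
    emitted per interval. Illustration only; this is not Spark code.
    """
    if not events:
        return []
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Integer index of the interval this event falls into.
        buckets[int(timestamp // batch_interval)].append(value)
    last = max(buckets)
    return [buckets.get(k, []) for k in range(last + 1)]


# Hypothetical user-activity events: (seconds since start, event type).
events = [(0.2, "click"), (0.9, "view"), (1.1, "click"), (3.5, "purchase")]
batches = discretize(events, batch_interval=1.0)
for index, batch in enumerate(batches):
    print(index, batch)
```

With a one-second interval, the first two events land in the same micro batch, and the interval with no activity yields an empty batch. In Spark Streaming itself, each such batch would be backed by an RDD rather than a Python list.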

Spark Streaming thus combines the advantages of batch processing with streaming. Instead of processing one event at a time, it processes a small batch of events at once. This allows Spark to keep a unified programming paradigm for both batch and real-time streaming workloads.
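
The idea of a unified paradigm can be sketched in plain Python, again without the Spark API: the same batch-style function is applied once to a historical data set and then repeatedly to each micro batch of a stream. The `count_events` function and the sample data are hypothetical and serve only to illustrate the point.

```python
from collections import Counter


def count_events(records):
    """Batch-style logic: count occurrences of each event type."""
    return Counter(records)


# Offline (batch) use: apply the function once to the full data set.
history = ["click", "view", "click", "purchase"]
batch_result = count_events(history)

# Streaming use: apply the very same function to each micro batch as it
# arrives, merging the per-batch counts into a running total.
micro_batches = [["click", "view"], ["click"], [], ["purchase"]]
running_total = Counter()
for micro_batch in micro_batches:
    running_total.update(count_events(micro_batch))

print(batch_result)
print(running_total)
```

Because the processing logic is written against a batch of records in both cases, the only streaming-specific part is the loop that merges per-batch results, which loosely mirrors the role of stateful operations over a live stream.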

Before proceeding with a Spark Streaming example, let's discuss the concept of micro batching in detail:
