Structured Streaming

Structured Streaming is a brand-new addition to Apache Spark's stream processing vertical. It is a stream processing engine built on top of the Spark SQL engine. With its introduction, batch processing and stream processing are unified: it allows us to develop a stream processing application in much the same way as a batch processing application. At the same time, it is scalable and fault tolerant.

As per Apache Spark's documentation, Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.

In Structured Streaming, the Dataset API is used instead of DStream, and it is the responsibility of the Spark SQL engine to keep the dataset updated as new streaming data arrives. Because the Dataset API is used, all the Spark SQL operations are available. Users can therefore run SQL queries on the stream data using the optimized Spark SQL engine, which also makes aggregations on streaming data easier.
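To make the idea of a result that is kept up to date as new data arrives more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API; the class name RunningAverage, the method names, and the flight identifiers are all hypothetical, chosen only for illustration. It models an aggregation whose result table is refreshed each time a new batch of rows is appended to the input.

```python
from collections import defaultdict


class RunningAverage:
    """Toy model of a continuously updated aggregate (illustrative only, not Spark)."""

    def __init__(self):
        # Per-key state: [sum of temperatures, number of readings].
        self._state = defaultdict(lambda: [0.0, 0])

    def process_batch(self, rows):
        """Fold a batch of (flight_id, temperature) rows into the state
        and return the full, updated result table."""
        for flight_id, temperature in rows:
            entry = self._state[flight_id]
            entry[0] += temperature
            entry[1] += 1
        return self.result()

    def result(self):
        """Current average temperature per flight."""
        return {key: total / count for key, (total, count) in self._state.items()}


if __name__ == "__main__":
    query = RunningAverage()
    # First batch of hypothetical readings arrives.
    print(query.process_batch([("F1", 20.0), ("F2", 30.0)]))
    # A later batch updates only the affected key; the rest of the result is kept.
    print(query.process_batch([("F1", 22.0)]))
```

In this sketch the caller has to feed batches in by hand and the state lives in a local dictionary. In Structured Streaming, as described above, the engine takes on that bookkeeping, so the user only expresses the aggregation itself.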

As of Spark 2.1, Structured Streaming is an alpha release and is not yet ready for production deployment.

Let's understand the concept of Structured Streaming with examples. We will use Structured Streaming to solve the average flight temperature problem that we solved earlier using stateful processing.
