Structured streaming

Structured Streaming was introduced in Apache Spark 2.0 and reached general availability (GA) in the Spark 2.2 release. The next section gives details, along with examples of how to use Structured Streaming.

For more details on the Kafka integration in structured streaming, refer to https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html.

An example of how to use Kafka as a streaming source in Structured Streaming is as follows:

// Subscribe to topic1 as a streaming source.
// Note: the format and option keys must be lowercase "kafka".
val ds1 = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

// Kafka keys and values are binary, so cast them to strings.
// The .as[...] conversion needs the implicit encoders:
// import spark.implicits._
ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]

An example of how to use Kafka as a batch source instead of a streaming source (if you prefer a batch analytics approach) is as follows:

// Read from topic1 as a batch Dataset (note .read instead of .readStream).
// As above, the format and option keys must be lowercase "kafka".
val ds1 = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

// Cast the binary keys and values to strings
// (requires import spark.implicits._ for the encoders).
ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
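Because the batch read returns an ordinary Dataset, all of the usual batch operations apply to it. As a hedged sketch (the per-key count shown is an illustrative assumption, not from the original text):

```scala
// Illustrative batch analytics on the Kafka data: count records per key.
import spark.implicits._

val counts = ds1
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
  .groupBy("key")
  .count()

// Print the per-key counts to stdout.
counts.show()
```

This is the main appeal of the batch API: the same Kafka source can feed one-off analytical queries without running a continuous streaming query.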