Structured streaming

Structured Streaming was introduced in Apache Spark 2.0 and reached general availability (GA) in the Spark 2.2 release. The next section gives details, along with examples of how to use Structured Streaming.

For more details on the Kafka integration in structured streaming, refer to https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html.

An example of how to use Kafka as a streaming source in Structured Streaming is as follows:

// Subscribe to topic1 as a streaming source.
// Note: the format and option keys must be lowercase "kafka".
val ds1 = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

// Kafka keys and values are binary, so cast them to strings.
// The .as[...] conversion needs the implicit encoders:
// import spark.implicits._
ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]

An example of how to use Kafka as a batch source instead of a streaming source (if you prefer a batch analytics approach) is as follows:

// Read from topic1 as a batch Dataset (note .read instead of .readStream).
// As above, the format and option keys must be lowercase "kafka".
val ds1 = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

// Cast the binary keys and values to strings
// (requires import spark.implicits._ for the encoders).
ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
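Because the batch read returns an ordinary Dataset, all of the usual batch operations apply to it. As a hedged sketch (the per-key count shown is an illustrative assumption, not from the original text):

```scala
// Illustrative batch analytics on the Kafka data: count records per key.
import spark.implicits._

val counts = ds1
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
  .groupBy("key")
  .count()

// Print the per-key counts to stdout.
counts.show()
```

This is the main appeal of the batch API: the same Kafka source can feed one-off analytical queries without running a continuous streaming query.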