Apache Spark Streaming

Spark Streaming is the stream processing module within Apache Spark. Because it runs on a Spark cluster, it can scale out to a high degree. Being based on Spark, it is also highly fault tolerant: by checkpointing the data stream that is being processed, it can rerun failed tasks. After an introductory section that gives a practical overview of how Apache Spark processes stream-based data, this chapter covers the following topics:

  • Error recovery and checkpointing
  • TCP-based stream processing
  • File streams
  • Kafka stream source

For each topic, we will provide a worked example in Scala and show how the stream-based architecture can be set up and tested.
