Summary

I could have provided streaming examples for systems like Kinesis, as well as for queuing systems, but there was not room in this chapter. Twitter streaming was examined by example in the checkpointing section.

This chapter has provided practical examples of data recovery via checkpointing in Spark Streaming. It has also touched on the performance limitations of checkpointing, and shown that the checkpointing interval should be set at five to ten times the Spark stream batch interval. Checkpointing provides a stream-based recovery mechanism in the case of Spark application failure.
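As a reminder of how the interval rule looks in code, here is a minimal sketch rather than the chapter's exact listing. The checkpoint directory, host, port, and object name are hypothetical placeholders that I have chosen for illustration, and the ten-second batch interval is an assumption. The checkpoint interval is set to five times the batch interval, the lower end of the suggested range.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointSketch {

  // Assumed values for illustration only
  val batchSeconds  = 10
  val checkpointDir = "hdfs://localhost:8020/data/spark/checkpoint" // hypothetical path

  // Builds a fresh context; only called when no checkpoint data exists
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointSketch")
    val ssc  = new StreamingContext(conf, Seconds(batchSeconds))

    // Enable checkpointing to a fault-tolerant directory
    ssc.checkpoint(checkpointDir)

    // Hypothetical TCP source on localhost:10777
    val lines = ssc.socketTextStream("localhost", 10777)

    // Checkpoint interval = 5 x batch interval
    lines.checkpoint(Seconds(batchSeconds * 5))

    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint directory if possible, else create anew
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The master URL is deliberately left out of the configuration on the assumption that it is supplied by spark-submit. Using getOrCreate means that a restarted application rebuilds its context from the checkpoint data instead of starting from scratch.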

This chapter has also provided stream-based worked examples for TCP, file, Flume, and Kafka-based Spark stream coding. All the examples here are written in Scala, and are compiled with sbt. All of the code will be released with this book. Where the example architecture has become over-complicated, as in the Kafka example, I have provided an architecture diagram.

It is clear to me that the Apache Spark streaming module offers a rich set of functionality that should meet most of your needs, and that it will grow as future releases of Spark are delivered. Remember to check the Apache Spark website (http://spark.apache.org/), and join the Spark user list. Don't be afraid to ask questions, or make mistakes, as it seems to me that mistakes teach more than success.

The next chapter will examine the Spark SQL module, and will provide worked examples of SQL, data frames, and accessing Hive among other topics.
