Summary

This chapter focused on handling streaming data from sources such as Kafka, sockets, and the filesystem. We also covered various stateful and stateless transformations of DStreams, along with checkpointing of data. However, checkpointing data alone does not guarantee fault tolerance, so we discussed other approaches to making a Spark Streaming job fault tolerant. We also talked about the transform operation, which comes in handy when an operation from the RDD API is not available on DStreams. Spark 2.0 introduced Structured Streaming as a separate module. Because of its similarity to Spark Streaming, we also discussed the newly introduced Structured Streaming APIs.

In the next chapter, we will introduce the concepts of machine learning and then move on to implementing them with Apache Spark's MLlib library. We will also work through some real-world problems using Spark MLlib.
