So far, we have discussed batch and streaming applications, as well as SQL operations, using Spark. In this chapter, we will discuss Spark's machine learning modules. We will briefly introduce what machine learning is and then move on to Spark's implementation of it.
We will primarily focus on Spark's spark.ml package, which takes a DataFrame as its input data type. We will cover topics such as pipelines and then move on to operations on features.