Summary

This chapter covered the concepts that govern how RDDs function, such as partitioning, and then used advanced transformations and actions to meet specific requirements. We also looked at the limitations of sharing variables across executor nodes and how such sharing can be achieved using broadcast variables and accumulators.

The next chapter introduces Spark SQL and related concepts such as DataFrames, Datasets, and UDFs. We'll also discuss SQLContext and the newly introduced SparkSession, and how it has simplified working with the Hive metastore.
