Summary

In this chapter, we learned how to calculate averages with map and reduce, and how to compute them faster with aggregate. Finally, we learned that pivot tables let us aggregate data based on the different values of features, and that, when working with pivot tables in PySpark, we can use handy functions such as reduceByKey and countByKey.

In the next chapter, we will learn about MLlib, Spark's library for machine learning, which is a very hot topic.
