Summary

In this chapter, we learned about summary statistics and how to compute them with MLlib. We also learned about Pearson and Spearman correlations, and how we can discover these correlations in our datasets using PySpark. Finally, we learned one particular way of performing hypothesis testing, called Pearson's chi-square test, and used PySpark's hypothesis-testing functions to test our hypotheses on large datasets.

In the next chapter, we're going to look at putting structure on our big data with Spark SQL.