Summary

I've tried to establish a common ground to perform a more complex data science later in the book. Don't expect these to be a complete set of exploratory techniques, as the exploratory techniques can extend to running very complex modes. However, we covered simple aggregations, sampling, file operations such as read and write, working with tools such as notebooks and Spark DataFrames, which brings familiar SQL constructs into the arsenal of an analyst working with Spark/Scala.

The next chapter will take a completely different turn by looking at the data pipelines as a part of a data-driven enterprise and cover the data discovery process from the business perspective: what are the ultimate goals we are trying to accomplish by doing the data analysis. I will cover a few traditional topics of ML, such as supervised and unsupervised learning, after this before delving into more complex representations of the data, where Scala really shows it's advantage over SQL.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.92.107