Functional Scala for data scientists

Many data scientists use R or Python as their favorite tool for interactive data cleaning, munging, processing, and analysis. Many of them also become very attached to that favorite tool and try to solve every data analytics problem or job with it. Introducing a new tool can therefore be quite challenging, since the new tool brings new syntax and a new set of patterns to learn before it can be put to work.

Spark also provides APIs for other languages, namely PySpark for Python and SparkR for R. However, most Spark books and online examples are written in Scala. We would argue that learning to work with Spark in the same language in which Spark itself is written gives you several advantages over Java, Python, or R as a data scientist:

  • Better performance, without the extra data-processing overhead of crossing language boundaries
  • Access to the latest and greatest features of Spark
  • A transparent understanding of the Spark philosophy

Analyzing data means writing Scala code to retrieve data from the cluster using Spark and its APIs (that is, Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX). Alternatively, you can develop a Spark application in Scala to manipulate that data locally on your own machine. In both cases, Scala is your real friend and will pay you dividends in time.
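To make the functional style concrete, here is a minimal, self-contained sketch in plain Scala (no cluster or SparkSession required). The sample records and field names are hypothetical, but the map/filter/groupBy vocabulary it uses is exactly what you would apply to a Spark Dataset or RDD when cleaning and aggregating data:

```scala
// Hypothetical raw input: "sensor,temperature" lines, some of them malformed.
val raw = List("s1,21.5", "s2,NaN", "s1,19.0", "s2,23.5", "bad-row")

case class Reading(sensor: String, celsius: Double)

// Parse while dropping malformed rows: flatMap acts as a combined
// filter-and-transform step, just as it would on a Spark Dataset.
val readings: List[Reading] = raw.flatMap { line =>
  line.split(",") match {
    case Array(s, v) if v.toDoubleOption.exists(d => !d.isNaN) =>
      Some(Reading(s, v.toDouble))
    case _ => None
  }
}

// Average temperature per sensor -- analogous to groupBy + agg in Spark SQL.
val avgBySensor: Map[String, Double] =
  readings
    .groupBy(_.sensor)
    .view
    .mapValues(rs => rs.map(_.celsius).sum / rs.size)
    .toMap

println(avgBySensor) // the malformed and NaN rows have been cleaned away
```

The same pipeline translates almost line for line to Spark: `raw` becomes an RDD or Dataset, and the cleaning and aggregation logic stays functional and immutable throughout.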
