Introduction

Recent advances in cluster computing, coupled with the rise of big data, have pushed machine learning to the forefront of computing. An interactive platform that enables data science at scale was long a dream; it is now a reality.

The following three areas together have enabled and accelerated interactive data science at scale:

Apache Spark: A unified technology platform for data science that combines a fast compute engine and fault-tolerant data structures into a well-designed and integrated offering
Machine learning: A field of artificial intelligence that enables machines to mimic some of the tasks originally reserved exclusively for the human brain
Scala: A modern JVM-based language that builds on traditional languages and unites functional and object-oriented concepts without the verbosity of other languages

First, we need to set up the development environment, which will consist of the following components:

Spark
IntelliJ IDEA Community Edition
Scala

The recipes in this chapter give detailed instructions for installing and configuring the IntelliJ IDE, the Scala plugin, and Spark. Once the development environment is set up, we'll run one of the Spark ML code samples to test the setup.
