In this section, I'm going to get you set up with Apache Spark, and show you some examples of using it to solve some of the same problems that we solved on a single computer earlier in this book. The first thing we need to do is get Spark set up on your computer, so we're going to walk through how to do that in the next couple of sections. It's pretty straightforward stuff, but there are a few gotchas. So, don't just skip these sections; there are a few things you need to pay special attention to in order to get Spark running successfully, especially on a Windows system. Let's get Apache Spark set up on your system, so you can dive in and start playing around with it.
We're going to be running this just on your own desktop for now, but the same programs that we write in this chapter could be run on an actual Hadoop cluster. You can take the scripts that we write and run locally on your desktop in Spark standalone mode, run them from the master node of a real Hadoop cluster, and let them scale up to the full power of that cluster to process massive datasets. Even though we're going to set things up to run locally on your own computer, keep in mind that these same concepts will scale up to running on a cluster as well.
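To make that idea concrete, here's a minimal sketch of what a local, standalone-mode Spark script might look like in Python. The app name and the tiny computation are just placeholders for illustration; the point is that the only thing tying this script to your desktop is the master setting.

```python
from pyspark import SparkConf, SparkContext

# "local[*]" runs Spark in standalone mode on this machine, using every CPU core.
# On a real Hadoop cluster you would drop setMaster() here and instead pass
# --master yarn to spark-submit; the rest of the script stays exactly the same.
conf = SparkConf().setMaster("local[*]").setAppName("SparkSetupTest")
sc = SparkContext(conf=conf)

# A tiny sanity check: distribute a list across the cluster (here, just your
# CPU cores), square each element in parallel, and collect the results back.
squares = sc.parallelize(range(10)).map(lambda x: x * x).collect()
print(squares)

sc.stop()
```

Locally, you'd run this with something like `spark-submit your_script.py`; on a cluster's master node, the same script would be submitted with `spark-submit --master yarn your_script.py`, and Spark takes care of distributing the work.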