Installation of Spark standalone cluster

In this section, we will learn how to install a Spark standalone cluster. The Spark distribution ships with init scripts for setting up standalone clusters. We first have to download Scala and Spark. Please refer to the Getting Started with Spark section in Chapter 3, Lets Start, for a detailed explanation of downloading and setting up Scala and Spark. Let's quickly recap the steps for setting up Spark.

Users can download the Spark distribution from the following URL: https://spark.apache.org/downloads.html.

In this book, we are working with Spark 2.1.1, the latest version at the time of writing. Once downloaded, run the following commands:

tar -zxf spark-2.1.1-bin-hadoop2.7.tgz
sudo mv spark-2.1.1-bin-hadoop2.7 /usr/local/spark

The directory location can differ according to the user's requirements.

It is also suggested (though not mandatory) to set the following environment variables:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin

As we are going to start the Spark cluster, please repeat the preceding steps on all nodes. Once that is done, the Spark standalone cluster can be started.
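As a sketch of the startup sequence, assuming Spark is installed at /usr/local/spark on every node and master-host is a placeholder for the master machine's hostname, the standalone launch scripts under sbin can be used:

```shell
# On the master node: launch the standalone master
# (it listens on port 7077 by default and serves a web UI on port 8080)
$SPARK_HOME/sbin/start-master.sh

# On each worker node: launch a worker and register it with the master
# (replace master-host with the master's actual hostname or IP)
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077
```

Once the workers have registered, they appear on the master's web UI, and applications can be submitted to the cluster with the master URL spark://master-host:7077.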
