By setting SPARK_HOME

First, download the Spark distribution and extract it to a location of your choice, say /home/asif/Spark. Now let's set SPARK_HOME as follows:

echo "export SPARK_HOME=/home/asif/Spark" >> ~/.bashrc

Now let's set PYTHONPATH as follows:

echo "export PYTHONPATH=$SPARK_HOME/python/" >> ~/.bashrc
echo "export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.1-src.zip" >> ~/.bashrc

Now we need to add these two locations to the PATH environment variable:

echo "export PATH=$PATH:$SPARK_HOME" >> ~/.bashrc
echo "export PATH=$PATH:$PYTHONPATH" >> ~/.bashrc

Finally, let's reload ~/.bashrc so that the newly added environment variables take effect in the current terminal:

source ~/.bashrc
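
To confirm the setup, a quick check such as the following can be run from Python in a new terminal session. This is a minimal sketch; it only assumes the exports above have been sourced and that the Spark distribution actually lives at the path you chose:

import os

# These should print the values exported in ~/.bashrc above; None means
# the variable is not visible in this shell session.
print(os.environ.get("SPARK_HOME"))
print(os.environ.get("PYTHONPATH"))

# If PYTHONPATH is correct, the PySpark package bundled with Spark is importable.
import pyspark
print(pyspark.__file__)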

PySpark depends on the py4j Python package, which allows the Python interpreter to dynamically access Spark objects in the JVM. This package can be installed on Ubuntu as follows:

$ sudo pip install py4j

Alternatively, the py4j copy that already ships with Spark (under $SPARK_HOME/python/lib) can be used.
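
As a brief illustration of what py4j does, the snippet below starts a local SparkContext from Python and peeks at the JVM-side object behind it. This is a minimal sketch; the application name and the local master URL are arbitrary choices for the example, not part of the setup above:

from pyspark import SparkConf, SparkContext

# Start a local Spark context; the Python SparkContext is a thin wrapper
# around a JVM SparkContext reached through a py4j gateway.
conf = SparkConf().setMaster("local[*]").setAppName("py4j-check")
sc = SparkContext(conf=conf)

# sc._jsc is the underlying Java object exposed through the py4j gateway.
print(type(sc._jsc))

# The actual computation is carried out on the JVM side.
print(sc.parallelize(range(10)).sum())

sc.stop()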
