Run the examples in Apache Spark

To run on Apache Spark, we'll use the Hortonworks HDP Sandbox. Unlike previous chapters, where we simply pasted a few lines of Scala code into the spark-shell, that approach is no longer possible here. Instead, we need to create a self-contained JAR file that bundles the Deeplearning4j application together with all of its dependencies.

Sounds tedious? Luckily, Maven is our best friend. So, first, using the command line, we just cd into the dl4j-examples-scala folder within the workspace and run mvn package:
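As a sketch (the workspace path here is an assumption; adjust it to wherever your checkout lives), the build step looks like this:

```shell
# Path is an assumption -- use your own workspace location
cd ~/workspace/dl4j-examples-scala

# Build the self-contained (fat) JAR, including all dependencies
mvn package
```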

It takes quite some time, but after a while, the command ends with the following output:
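The exact log depends on your project, but a successful Maven build always finishes with a summary along these lines (the timing line is illustrative):

```
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: ...
```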

Now we need to copy this JAR to the Apache Spark cluster using scp (SSH Secure Copy). On a Mac, we use the following command line to do so:
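A sketch of the copy step, assuming the HDP Sandbox's usual SSH setup (user root on port 2222); the JAR name and host are placeholders for your actual build artifact and sandbox address:

```shell
# JAR name and host are placeholders -- substitute your build artifact
# and the address of your HDP Sandbox
scp -P 2222 target/dl4j-examples-1.0-SNAPSHOT.jar root@sandbox.hortonworks.com:/root/
```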

Then we SSH into the cluster and run the following command:
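A minimal sketch of the submission step; the main class and JAR name are hypothetical and must match your project's POM:

```shell
# Connect to the sandbox (port and host as used for scp above)
ssh -p 2222 root@sandbox.hortonworks.com

# On the cluster: make sure Spark 2 is selected (explained below),
# then submit the fat JAR to YARN
export SPARK_MAJOR_VERSION=2
spark-submit \
  --class org.deeplearning4j.examples.MyStreamingApp \
  --master yarn \
  /root/dl4j-examples-1.0-SNAPSHOT.jar
```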

We see that the system is waiting for data. Therefore, we just click on the reset button of the test data generator again, and the tumbling window waits to be filled:

The Hortonworks HDP Sandbox V2.6.0 currently supports Apache Spark V1.6.1 and V2.1. Support for Apache Spark V2.2 will be available shortly. In order to use V2.1, we always need to run this command first:

export SPARK_MAJOR_VERSION=2

To avoid having to do this on every login, just issue the following command once; from the next login onward, Apache Spark V2.1 will always be used:

echo "export SPARK_MAJOR_VERSION=2" >> /root/.bashrc

Finally, after this has happened, we see data arriving, and the same neural networks that we ran within Eclipse are now running in parallel on an Apache Spark cluster. Isn't that awesome?
