Submitting Spark jobs on a YARN cluster

Now that our YARN cluster meets the minimum requirements (enough, frankly, for executing a small Spark job), you can launch a Spark application in YARN cluster mode with the following submit command:

$ SPARK_HOME/bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

For running our KMeansDemo, the command looks like this:

$ SPARK_HOME/bin/spark-submit \
    --class "com.chapter15.Clustering.KMeansDemo" \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 16g \
    --executor-memory 4g \
    --executor-cores 4 \
    --queue the_queue \
    KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
    Saratoga_NY_Homes.txt
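
To make the submission concrete, here is a minimal sketch of what such a KMeans application might look like in Scala. The package and class name are taken from the submit command above, but the parsing logic, the choice of k = 4, and the iteration count are illustrative assumptions; the actual KMeansDemo code is developed in Chapter 15:

package com.chapter15.Clustering

import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KMeansDemo").getOrCreate()

    // args(0) is the input file passed to spark-submit (Saratoga_NY_Homes.txt above)
    // Assumed format: comma-separated numeric columns; adjust the separator to the real file
    val parsed = spark.sparkContext.textFile(args(0))
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .cache()

    // k = 4 clusters and 10 iterations are illustrative values, not from the original code
    val model = KMeans.train(parsed, 4, 10)
    println(s"Within Set Sum of Squared Errors = ${model.computeCost(parsed)}")

    spark.stop()
  }
}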

The preceding submit command runs the application in YARN cluster mode with the default ApplicationMaster, which means the KMeansDemo driver runs inside the ApplicationMaster process on the cluster. The client periodically polls the ApplicationMaster for status updates and displays them in the console, and it exits once your application (KMeansDemo in our case) finishes its execution.
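
While the application is running, you can also track it from the command line with the YARN CLI that ships with Hadoop; the application ID shown here is hypothetical and will differ on your cluster:

$ yarn application -list
$ yarn application -status application_1540539002233_0001

The first command lists the submitted applications together with their IDs; the second prints the current state and tracking URL for one of them.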

After submitting your job, you can follow its progress in the Spark web UI or the Spark history server. You should also refer to Chapter 18, Testing and Debugging Spark, to learn how to analyze driver and executor logs.
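
If log aggregation is enabled on your cluster, the driver and executor logs of a finished application can also be fetched with the YARN CLI (again, the application ID is hypothetical):

$ yarn logs -applicationId application_1540539002233_0001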

To launch a Spark application in client mode, use the earlier command, but replace cluster with client. For those who want to work with the Spark shell, use the following in client mode:

$ SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client
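
Accordingly, the earlier KMeansDemo submission becomes the following in client mode; the options are unchanged except for --deploy-mode, but keep in mind that the driver now runs on the submitting machine, which must therefore have enough memory for the --driver-memory setting:

$ SPARK_HOME/bin/spark-submit \
    --class "com.chapter15.Clustering.KMeansDemo" \
    --master yarn \
    --deploy-mode client \
    --driver-memory 16g \
    --executor-memory 4g \
    --executor-cores 4 \
    --queue the_queue \
    KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
    Saratoga_NY_Homes.txt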