Submitting a Spark job for cluster analysis

The examples shown in this chapter can be scaled to much larger datasets to serve different purposes. You can package all three clustering algorithms with all the required dependencies and submit them as a Spark job to the cluster. Now use the following lines of code to submit your Spark job for K-means clustering, for example (use similar syntax for the other classes), on the Saratoga NY Homes dataset:

# Run the application locally on 8 cores
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master local[8] \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt

# Run on a YARN cluster (use --deploy-mode client for client mode)
export HADOOP_CONF_DIR=XXX
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt

# Run on a Mesos cluster in cluster deploy mode with supervision
# (replace the master IP address with that of your Mesos master)
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt
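
For reference, the driver class named in --class only needs a main method that picks up the dataset path passed as the final spark-submit argument and runs the clustering. The following is a minimal sketch of what such a KMeansDemo object could look like; the column handling (taking the first two numeric columns of the Saratoga NY Homes file) and the parameter values (k = 4, 10 iterations) are illustrative assumptions, not the exact code from this chapter:

package org.apache.spark.examples

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object KMeansDemo {
  def main(args: Array[String]): Unit = {
    // The dataset path arrives as the argument placed after the JAR in spark-submit
    val inputPath = args(0)

    // Do not hardcode the master here; the --master flag of spark-submit controls it
    val spark = SparkSession.builder.appName("KMeansDemo").getOrCreate()

    // Assumed layout: comma-separated numeric columns with no header;
    // the first two columns are used as features for this sketch
    val raw = spark.read.option("inferSchema", "true").csv(inputPath)
    val numeric = raw.select(
      raw("_c0").cast("double").as("price"),
      raw("_c1").cast("double").as("lotSize"))

    // Assemble the numeric columns into the feature vector expected by KMeans
    val assembler = new VectorAssembler()
      .setInputCols(Array("price", "lotSize"))
      .setOutputCol("features")
    val data = assembler.transform(numeric)

    // Illustrative parameters: 4 clusters, at most 10 iterations
    val model = new KMeans().setK(4).setMaxIter(10).fit(data)
    model.clusterCenters.foreach(println)

    spark.stop()
  }
}

Because the SparkSession is created without a hardcoded master, the same JAR runs unchanged in local, YARN, and Mesos modes; only the spark-submit flags differ.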