Spark properties

As discussed previously, Spark properties control most of the application-specific parameters and can be set programmatically through a SparkConf object. Alternatively, these parameters can be set through Java system properties. SparkConf allows you to configure the most common properties as follows:

setAppName()     // Set the application name
setMaster()      // Set the master URL
setSparkHome()   // Set the location where Spark is installed on worker nodes
setExecutorEnv() // Set one or more environment variables to be used when launching executors
setJars()        // Set the JAR files to distribute to the cluster
setAll()         // Set multiple parameters at once
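
As an illustrative sketch of chaining these setters on a single SparkConf instance (the Spark home path, environment variable, and JAR name below are placeholders, not values from this chapter):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("SampleApp")                           // application name shown in the web UI
  .setMaster("local[2]")                             // master URL
  .setSparkHome("/usr/local/spark")                  // Spark installation path on worker nodes (placeholder)
  .setExecutorEnv("JAVA_HOME", "/usr/lib/jvm/java")  // an environment variable for executors (placeholder)
  .setJars(Seq("target/scala-2.11/myApp.jar"))       // JAR files to distribute to the cluster (placeholder)
  .setAll(Seq(                                       // several properties at once
    "spark.eventLog.enabled" -> "false",
    "spark.serializer"       -> "org.apache.spark.serializer.KryoSerializer"))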

An application can be configured to use a given number of the cores available on your machine. For example, we can initialize an application with two threads as follows. Note that we run with local[2], meaning two threads, which represents minimal parallelism, whereas local[*] utilizes all the cores available on your machine. Alternatively, you can specify the number of executors at submission time through the spark-submit script, discussed later in this section:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("SampleApp")
val sc = new SparkContext(conf)
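
If you instead want all available cores, a minimal variant of the same initialization (a sketch, not taken from the original example) looks like this; sc.defaultParallelism then confirms how many slots the context picked up:

import org.apache.spark.{SparkConf, SparkContext}

// Same application, but letting Spark use every core available on the machine
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("SampleApp")
val sc = new SparkContext(conf)

// Quick check of the master URL and default parallelism the context picked up
println(s"Master: ${sc.master}, default parallelism: ${sc.defaultParallelism}")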

There might be special cases where you need to load Spark properties dynamically at submission time. You can do this through the spark-submit script; more specifically, you may want to avoid hardcoding certain configurations in SparkConf.

Apache Spark precedence:
Spark applies the following precedence to the properties of a submitted job: properties set directly on the SparkConf in your code take the highest priority, followed by flags passed on the command line through the spark-submit script, while configs coming from a configuration file (spark-defaults.conf) have the lowest priority.
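
To make this ordering concrete, here is a small hedged sketch. Assume, purely for illustration, that spark-defaults.conf sets spark.executor.memory to 4g and the job was launched with --executor-memory 8g; a value set directly on the SparkConf still wins:

import org.apache.spark.SparkConf

// Assumption: spark-defaults.conf contains "spark.executor.memory 4g" and
// spark-submit was called with --executor-memory 8g.
val conf = new SparkConf()                  // picks up values injected by spark-submit
  .setAppName("PrecedenceDemo")             // hypothetical application name
  .set("spark.executor.memory", "2g")       // set in code: highest precedence

// Prints "2g": code beats spark-submit flags, which beat spark-defaults.conf
println(conf.get("spark.executor.memory"))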

For instance, if you want to run your application with different masters, different numbers of executors, or different amounts of memory, Spark allows you to simply create an empty configuration object, as follows:

val sc = new SparkContext(new SparkConf())

Then you can provide the configuration for your Spark job at runtime as follows:

# Replace the master URL below with your own cluster's address
SPARK_HOME/bin/spark-submit \
  --name "SampleApp" \
  --class org.apache.spark.examples.KMeansDemo \
  --master mesos://207.184.161.138:7077 \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails" \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  myApp.jar

SPARK_HOME/bin/spark-submit will also read configuration options from SPARK_HOME/conf/spark-defaults.conf, in which each line consists of a key and a value separated by whitespace. An example is as follows:

spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer

Values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Finally, as discussed earlier, the application web UI at http://<driver>:4040 lists all the Spark properties under the Environment tab.
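
If you prefer to check the merged result from the driver rather than the web UI, a short sketch (assuming an existing SparkContext named sc) is:

// Dump the merged Spark properties visible to the driver, sorted by key
sc.getConf.getAll
  .sortBy(_._1)
  .foreach { case (key, value) => println(s"$key = $value") }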
