In this section, we will discuss some useful cluster-level configurations for Spark standalone mode. These configurations can be set in spark-env.sh in the $SPARK_HOME/conf directory. Any change to these configurations requires a restart of the worker JVM or the cluster. Here are some useful configurations:
- SPARK_LOCAL_DIRS: This parameter specifies a comma-separated list of local directories on each node of the cluster that will be used for Spark shuffle operations and for RDD persistence on disk
- SPARK_MASTER_HOST: This parameter is used to bind the Spark master to an IP address or hostname
- SPARK_MASTER_PORT: This parameter is used to bind the Spark master to a port on the system. The default value for this parameter is 7077
- SPARK_WORKER_CORES: This parameter specifies the total number of cores that a worker can provide to the executor processes running on that node
- SPARK_WORKER_MEMORY: This parameter specifies the total amount of memory that a worker can provide to the executor processes running on that node
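
As a rough illustration, a spark-env.sh that sets all five of these parameters might look like the sketch below. The directory paths, the hostname, and the core and memory figures are hypothetical placeholders chosen for this example, not recommended values; only the variable names and the default port 7077 come from the list above.

```shell
#!/usr/bin/env bash
# Sketch of $SPARK_HOME/conf/spark-env.sh for a standalone cluster.
# All values below are illustrative assumptions; adjust them to your nodes.

# Comma-separated local directories used for shuffle data and on-disk RDD persistence
# (hypothetical mount points, ideally on different physical disks).
export SPARK_LOCAL_DIRS="/mnt/disk1/spark,/mnt/disk2/spark"

# Bind the master to a specific hostname or IP address (hypothetical hostname).
export SPARK_MASTER_HOST="spark-master.example.com"

# Port the master listens on; 7077 is the default.
export SPARK_MASTER_PORT=7077

# Total cores this worker may hand out to executor processes on the node (assumed figure).
export SPARK_WORKER_CORES=8

# Total memory this worker may hand out to executor processes on the node (assumed figure).
export SPARK_WORKER_MEMORY=16g
```

Because the file is read when the master and worker processes start, the affected daemons would need to be restarted after editing it for the new values to take effect.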