Useful cluster level configurations (Spark standalone)

In this section, we will discuss some useful cluster-level configurations in Spark standalone mode. These configurations can be set in spark-env.sh in the $SPARK_HOME/conf directory. Any change to these configurations requires a restart of the worker JVM or the cluster. Here are some useful configurations:

SPARK_LOCAL_DIRS: This parameter specifies a comma-separated list of local directories on each node of the cluster that will be used for Spark shuffle operations and RDD persistence on disk
SPARK_MASTER_HOST: This parameter is used to bind the Spark master to an IP address or hostname
SPARK_MASTER_PORT: This parameter is used to bind the Spark master to a port on the system. The default value for this parameter is 7077
SPARK_WORKER_CORES: This parameter specifies the total number of cores that a worker can provide to the executor processes running on that node
SPARK_WORKER_MEMORY: This parameter specifies the total amount of memory that a worker can provide to the executor processes running on that node
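
As a rough illustration of how the parameters above might appear together, here is a minimal sketch of a spark-env.sh file. The directory paths, the IP address, and the core and memory sizes are assumed placeholder values for illustration only, not recommendations; only the variable names and the default port 7077 come from the text above.

```shell
# Sketch of $SPARK_HOME/conf/spark-env.sh (all values are illustrative assumptions)

# Comma-separated local directories for shuffle files and RDDs persisted to disk.
# /data1/spark and /data2/spark are hypothetical paths.
export SPARK_LOCAL_DIRS=/data1/spark,/data2/spark

# Bind the Spark master to a specific IP address or hostname (hypothetical address).
export SPARK_MASTER_HOST=192.168.1.10

# Bind the Spark master to a port; 7077 is the default.
export SPARK_MASTER_PORT=7077

# Total cores this worker can offer to executor processes on the node (assumed value).
export SPARK_WORKER_CORES=4

# Total memory this worker can offer to executor processes on the node (assumed value).
export SPARK_WORKER_MEMORY=8g
```

Because these values are read when the master and worker processes start, a change only takes effect after the affected daemons are restarted, for example with the stop-all.sh and start-all.sh scripts in $SPARK_HOME/sbin if the cluster is managed that way.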
