Useful cluster level configurations (Spark standalone)

In this section, we will discuss some useful cluster level configurations in Spark standalone. These configurations are environment variables that can be set in the spark-env.sh file in the $SPARK_HOME/conf directory. Any change in these configurations requires a restart of the worker JVM or the cluster. Here are some useful configurations:

  • SPARK_LOCAL_DIRS: This parameter specifies a comma-separated list of local directories on each node of the cluster that will be used for Spark shuffle operations and for RDD persistence on disk
  • SPARK_MASTER_HOST: This parameter is used to bind the Spark master to an IP address or hostname
  • SPARK_MASTER_PORT: This parameter is used to bind the Spark master to a port on the system. The default value for this parameter is 7077
  • SPARK_WORKER_CORES: This parameter is used to specify the total number of cores that a worker can provide to the executor processes running on that node
  • SPARK_WORKER_MEMORY: This parameter is used to specify the total amount of memory that a worker can provide to the executor processes running on that node
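
As a rough illustration of how the parameters above fit together, the following is a minimal sketch of a spark-env.sh file. The directory paths, the IP address, and the core and memory sizes are hypothetical values chosen only for this example; they should be adjusted to the hardware and network of the actual cluster.

```shell
#!/usr/bin/env bash
# Sketch of $SPARK_HOME/conf/spark-env.sh (all values are illustrative assumptions)

# Comma-separated scratch directories for shuffle files and RDDs persisted to disk
# (hypothetical paths; ideally one per physical disk)
export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark

# Address and port the master binds to (hypothetical IP; 7077 is the default port)
export SPARK_MASTER_HOST=192.168.1.10
export SPARK_MASTER_PORT=7077

# Resources each worker may hand out to executors on its node (assumed sizes)
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=8g
```

Because these values are read when the master and worker daemons start, an edit to this file takes effect only after the affected daemons are restarted, for example with the stop-all.sh and start-all.sh scripts in $SPARK_HOME/sbin.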