Useful cluster-level configurations (Spark standalone)

In this section, we will discuss some useful cluster-level configurations for Spark standalone mode. These configurations can be set in spark-env.sh in the $SPARK_HOME/conf directory. Any change to these configurations requires a restart of the worker JVM or of the cluster before it takes effect. Here are some useful configurations:

  • SPARK_LOCAL_DIRS: This parameter specifies a comma-separated list of local directories on each node of the cluster that Spark uses for shuffle operations and for RDD persistence on disk
  • SPARK_MASTER_HOST: This parameter is used to bind the Spark master to an IP address or hostname
  • SPARK_MASTER_PORT: This parameter is used to bind the Spark master to a port on the system. The default value for this parameter is 7077
  • SPARK_WORKER_CORES: This parameter specifies the total number of cores that a worker can provide to the executor processes running on that node
  • SPARK_WORKER_MEMORY: This parameter specifies the total amount of memory that a worker can provide to the executor processes running on that node
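
As a rough illustration of how the parameters above fit together, here is a minimal sketch of a spark-env.sh file. Every value in it (the directory paths, the IP address, the core count, and the memory size) is a hypothetical placeholder chosen for this example, not a recommendation; only the port 7077 reflects the default mentioned above.

```shell
#!/usr/bin/env bash
# Illustrative spark-env.sh; all values are assumed placeholders for this sketch.

# Comma-separated local directories used for shuffle data and on-disk RDD persistence
# (hypothetical paths; point these at real local disks on each node)
export SPARK_LOCAL_DIRS="/mnt/disk1/spark,/mnt/disk2/spark"

# Address and port the Spark master binds to (hypothetical IP; 7077 is the default port)
export SPARK_MASTER_HOST="192.168.1.10"
export SPARK_MASTER_PORT=7077

# Total cores and memory this worker can offer to executors on the node
# (assumed figures; size them to the actual hardware)
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=16g
```

Because the file is only read when the daemons start, the new values take effect after the master and workers are restarted, for example with the sbin/stop-all.sh and sbin/start-all.sh scripts shipped with Spark.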