Deploying applications on Spark standalone cluster

In the previous section, we learned how to start and stop a Spark standalone cluster. Once the standalone cluster is running, Spark applications can be deployed on it. Before deploying an application on the cluster, let's first discuss how the components of an application run in Spark standalone mode:

Logical Representation of Spark Application on Standalone Cluster

The preceding figure shows a Spark standalone cluster of three nodes. The Spark master runs on one node and the workers run on the other two nodes. A Spark application is deployed on the cluster: the driver JVM runs on one of the worker nodes, and the application consists of two executor JVMs, one running on each worker node. Each executor JVM holds an RDD with two partitions cached in memory, and four tasks run inside each executor; these tasks execute the actual business logic of the application.

In Spark standalone mode, at most one executor per application can run on each worker node.

When a client submits a Spark application request to the Spark master, the Spark driver starts either in the client process or in a separate JVM on one of the worker nodes, depending on the deploy mode of the application. Once the driver has started, it contacts the Spark master and requests the cores needed by the application. The Spark master then starts executors for the application, and the executors register themselves with the driver. After that, the driver submits the physical execution plan to the executors for processing. The Spark master generally starts one executor on each worker node to load balance the processing of the application.
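As a rough sketch of how an application declares the cores and memory it wants from the standalone master, the following configuration can be set when the driver starts. The master host name and the resource values here are illustrative assumptions, not taken from the text:

import org.apache.spark.sql.SparkSession;

public class StandaloneResourceRequest {
    public static void main(String[] args) {
        // The driver asks the standalone master for up to 4 cores in total;
        // each executor JVM is limited to 2 cores and 1 GB of memory.
        SparkSession spark = SparkSession.builder()
                .appName("standalone-resource-request")
                .master("spark://master-host:7077")    // standalone master URL (placeholder host)
                .config("spark.cores.max", "4")        // total cores requested from the master
                .config("spark.executor.cores", "2")   // cores per executor JVM
                .config("spark.executor.memory", "1g") // memory per executor JVM
                .getOrCreate();

        // The master starts executors on the workers; once they register with this
        // driver, the driver schedules tasks on them.
        System.out.println("Default parallelism: " + spark.sparkContext().defaultParallelism());

        spark.stop();
    }
}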

The Spark distribution provides a binary, spark-submit, which is used to deploy Spark applications on any of the cluster managers.
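As a minimal sketch, a packaged application can be submitted to a standalone master as follows. The master host, main class, JAR path, and resource values are placeholder assumptions:

spark-submit \
  --class com.example.StandaloneResourceRequest \
  --master spark://master-host:7077 \
  --total-executor-cores 4 \
  --executor-memory 1g \
  /path/to/application.jar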

A Spark application can be deployed on a standalone cluster in either of the following two deploy modes:

Client mode: The driver runs in the client process that submits the application, while the executors run on the worker nodes.

Cluster mode: The driver itself runs on one of the worker nodes inside the cluster.
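As an illustrative sketch (the host, class, and JAR path are placeholders), the deploy mode is selected with the --deploy-mode flag of spark-submit; passing cluster asks the master to launch the driver on a worker node, while client (the default) keeps the driver in the submitting process:

spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.StandaloneResourceRequest \
  /path/to/application.jar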