Hadoop YARN

As already discussed, Apache Hadoop YARN has two main components: a scheduler and an applications manager, as shown in the following figure:

Figure 16: Apache Hadoop YARN architecture (blue: system components; yellow and pink: two applications running)

Using the scheduler and the applications manager, you can configure either of the following two deploy modes to launch your Spark jobs on a YARN-based cluster:

Cluster mode: In cluster mode, the Spark driver runs inside the application master process, which is managed by YARN's applications manager. The client can therefore be terminated or disconnected once the application has been initiated.
Client mode: In client mode, the Spark driver runs inside the client process. The application master is then used only to request computing resources for the computing nodes from the YARN resource manager.
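
The choice between the two deploy modes is made when the job is submitted. The following is a minimal sketch, in Python, of how a spark-submit command line could be assembled for either mode. The helper name build_yarn_submit_command, the JAR name my-app.jar, and the class com.example.MyApp are hypothetical placeholders for illustration; only the --master, --deploy-mode, and --class options are actual spark-submit options.

```python
from typing import List

# The two deploy modes that spark-submit accepts on YARN.
VALID_DEPLOY_MODES = ("cluster", "client")


def build_yarn_submit_command(app_jar: str,
                              main_class: str,
                              deploy_mode: str = "cluster") -> List[str]:
    """Return the spark-submit argument list for a YARN-based cluster.

    This is a hypothetical helper, not part of Spark. It only assembles
    the arguments; it does not launch anything.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(
            f"deploy_mode must be one of {VALID_DEPLOY_MODES}, got {deploy_mode!r}"
        )
    return [
        "spark-submit",
        "--master", "yarn",            # the resource manager address comes from the Hadoop configuration
        "--deploy-mode", deploy_mode,  # cluster: driver in the application master; client: driver in the client
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    # Placeholder JAR and class names, assumed for illustration only.
    for mode in VALID_DEPLOY_MODES:
        print(" ".join(build_yarn_submit_command("my-app.jar", "com.example.MyApp", mode)))
```

The resulting list could be passed to subprocess.run on a machine where spark-submit is on the PATH. In cluster mode the submitting process may exit once the application has been accepted, whereas in client mode it has to stay alive because it hosts the driver.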

In the Spark standalone and Mesos modes, the URL of the master (that is, its address) needs to be specified in the --master parameter. In the YARN mode, however, the address of the resource manager is read from your Hadoop configuration files. Consequently, the value of the --master parameter is simply yarn. Before submitting your Spark jobs, however, you need to set up your YARN cluster. The next subsection shows how to do so step by step.
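
To make the difference in the --master value concrete, here is a small Python sketch under stated assumptions. The function names master_url and hadoop_conf_dir are hypothetical, and the host names and ports in the usage lines are placeholders. The sketch also assumes that the Hadoop client configuration is located through the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable, which is how Spark's YARN documentation describes it.

```python
import os
from typing import Mapping, Optional

# URL schemes for the cluster managers that need an explicit master address.
_SCHEMES = {"standalone": "spark", "mesos": "mesos"}


def master_url(cluster_manager: str,
               host: Optional[str] = None,
               port: Optional[int] = None) -> str:
    """Return the value to pass to --master for the given cluster manager."""
    if cluster_manager == "yarn":
        # No address is needed: it is read from the Hadoop configuration.
        return "yarn"
    if cluster_manager not in _SCHEMES:
        raise ValueError(f"unknown cluster manager: {cluster_manager!r}")
    if host is None or port is None:
        raise ValueError(f"{cluster_manager} mode needs an explicit host and port")
    return f"{_SCHEMES[cluster_manager]}://{host}:{port}"


def hadoop_conf_dir(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured Hadoop client configuration directory, if any."""
    return env.get("HADOOP_CONF_DIR") or env.get("YARN_CONF_DIR")


if __name__ == "__main__":
    # Placeholder hosts and ports, assumed for illustration only.
    print(master_url("standalone", "master-host", 7077))
    print(master_url("mesos", "mesos-host", 5050))
    print(master_url("yarn"))
    if hadoop_conf_dir() is None:
        print("Neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set")
```

A check such as hadoop_conf_dir() can be useful before submitting, since a missing configuration directory is a common reason why a job cannot find the resource manager.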
