Executor program

Executors can be considered worker JVMs for the Driver process. These are the JVM processes where the actual application logic executes. A Spark application consists of a number of tasks, where a task is the smallest unit of work to be executed, for example, reading a block of a file from HDFS, running a map operation on an RDD partition, and so on.
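
The following is a minimal sketch of this idea (the HDFS path and application name are hypothetical): reading a file produces roughly one RDD partition per HDFS block, and a map over that RDD is scheduled as one task per partition on the executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TaskCountExample {
  def main(args: Array[String]): Unit = {
    // The master is supplied when the application is submitted (for example, via spark-submit).
    val conf = new SparkConf().setAppName("task-count-example")
    val sc   = new SparkContext(conf)

    // textFile creates roughly one partition per HDFS block; each partition
    // is processed by one task running on an executor.
    val lines   = sc.textFile("hdfs:///data/events.log")   // hypothetical path
    val lengths = lines.map(_.length)                       // map tasks, one per partition

    println(s"Partitions (and hence tasks per stage): ${lengths.getNumPartitions}")
    sc.stop()
  }
}
```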

In Hadoop's traditional MapReduce, a chain of MR jobs runs many mappers and reducers, and every mapper and reducer runs as a separate JVM, so there is significant overhead in launching a JVM for each mapper and reducer task. In Spark, however, executors are launched once at the beginning of application execution and run until the application finishes. All the application logic runs inside the executors as individual tasks, each executed in a separate thread. The following are the main responsibilities of executors (a brief code sketch follows the list):

  1. Execute the tasks as directed by the Driver.
  2. Cache, in memory, the RDD partitions on which the tasks execute.
  3. Report the results of the tasks back to the Driver.

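The sketch below illustrates all three responsibilities with made-up data (the application name and partition count are assumptions): the map runs as tasks on executors, persist keeps the resulting partitions in executor memory, and the per-task results of the reduce are reported back to the Driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ExecutorResponsibilities {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("executor-responsibilities"))

    val doubled = sc.parallelize(1L to 1000000L, 8)        // 8 partitions -> 8 tasks per stage
      .map(_ * 2)                                          // 1. tasks executed on executors
      .persist(StorageLevel.MEMORY_ONLY)                   // 2. partitions cached in executor memory

    val total = doubled.reduce(_ + _)                      // 3. per-task results reported back to the Driver
    println(s"Total: $total")
    sc.stop()
  }
}
```
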
Executors are not a single point of failure. If an executor dies, it is the responsibility of the cluster manager to restart it. We will discuss cluster manager operations in the next section.

So far in this book, we have been running Spark applications in local mode.

In local mode, the Spark Driver and the Executor both run inside a single JVM process.
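
The following is a minimal sketch of local mode (the application name is hypothetical): setting the master to "local[*]" makes the Driver and Executor logic run as threads inside this one JVM, using as many worker threads as there are cores.

```scala
import org.apache.spark.sql.SparkSession

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("local-mode-example")
      .master("local[*]")          // Driver and Executor share this single JVM
      .getOrCreate()

    val count = spark.sparkContext.parallelize(1 to 100).count()
    println(s"Count: $count")
    spark.stop()
  }
}
```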