Stages

The Stages tab in the Spark UI shows the current status of all stages of all jobs in a Spark application. It also includes two optional pages: one with the tasks and statistics for a stage, and one with pool details. Note that the pool details are available only when the application runs in fair scheduling mode. You should be able to access the Stages tab at http://localhost:4040/stages. When no jobs have been submitted, the tab shows nothing but the title. The stages in this tab are grouped as follows:

  • Active Stages
  • Pending Stages
  • Completed Stages

For example, when you submit a Spark job locally, you should be able to see the following status:

Figure 7: The stages for all jobs in the Spark web UI

In this case, there is only one stage, and it is active. In the upcoming chapters, however, we will be able to observe other stages when we submit our Spark jobs to AWS EC2 clusters.
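The same grouping by status can also be reproduced outside the browser. The following is a minimal sketch, assuming the driver's monitoring REST API is reachable at http://localhost:4040/api/v1 and that each stage record carries stageId and status fields; the helper names and the sample records are hypothetical and only illustrate the shape of the data:

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Assumed base URL of the REST API exposed by a locally running driver.
BASE_URL = "http://localhost:4040/api/v1"


def fetch_stages(app_id, base_url=BASE_URL):
    """Fetch the stage list of one application (requires a running driver)."""
    with urlopen(f"{base_url}/applications/{app_id}/stages") as response:
        return json.load(response)


def group_stages_by_status(stages):
    """Map each status (for example ACTIVE, PENDING, COMPLETE) to its stage IDs."""
    grouped = defaultdict(list)
    for stage in stages:
        grouped[stage["status"]].append(stage["stageId"])
    return {status: sorted(ids) for status, ids in grouped.items()}


# Hypothetical records, reduced to the two fields used above.
SAMPLE_STAGES = [
    {"stageId": 2, "status": "ACTIVE"},
    {"stageId": 3, "status": "PENDING"},
    {"stageId": 1, "status": "COMPLETE"},
    {"stageId": 0, "status": "COMPLETE"},
]

print(group_stages_by_status(SAMPLE_STAGES))
```

With a live application, you would pass the result of fetch_stages to group_stages_by_status instead of the sample list; the application ID is whatever your own run reports.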

To drill down into the summary of a completed job, click on any link in the Description column; you should find the related execution-time statistics presented as metrics. The approximate min, 25th percentile, median, 75th percentile, and max of each metric can also be seen in the following figure:

Figure 8: The summary for completed jobs on the Spark web UI
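To make the five summary columns concrete, here is a minimal sketch that computes min, 25th percentile, median, 75th percentile, and max for a list of task durations using only the Python standard library. The durations are made-up values, and the interpolation used by statistics.quantiles is an assumption of this sketch; the UI reports approximate values and may not match it digit for digit:

```python
from statistics import quantiles


def five_number_summary(values):
    """Return min, 25th percentile, median, 75th percentile, and max.

    Needs at least two values, because statistics.quantiles does.
    """
    ordered = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population rather than a sample of it.
    p25, median, p75 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": ordered[-1],
    }


# Hypothetical task durations in milliseconds.
durations_ms = [120, 95, 210, 130, 160]
print(five_number_summary(durations_ms))
```

The same function can be applied to any per-task metric, such as GC time, to see how widely the tasks of a stage are spread.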

Your case might be different, as I executed and submitted only two jobs for demonstration purposes while writing this book. You can see other statistics on the executors as well. In my case, I submitted these jobs in standalone mode using 8 cores and 32 GB of RAM. In addition, information related to each executor is shown, such as its ID, IP address with the associated port number, task completion time, number of tasks (including failed, killed, and succeeded tasks), and the input size of the dataset along with the number of records.

The remaining section of the image shows further details about these two tasks, for example, index, ID, attempts, status, locality level, host information, launch time, duration, Garbage Collection (GC) time, and so on.
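The per-executor task counts are essentially an aggregation of the task rows. The following minimal sketch shows that relationship, assuming each task record has executorId and status fields; the field names, status values, and sample rows are assumptions made for illustration:

```python
from collections import Counter, defaultdict


def tasks_per_executor(tasks):
    """Count tasks by status for every executor that ran at least one task."""
    counts = defaultdict(Counter)
    for task in tasks:
        counts[task["executorId"]][task["status"]] += 1
    return {executor: dict(by_status) for executor, by_status in counts.items()}


# Hypothetical task rows, reduced to the fields used above.
SAMPLE_TASKS = [
    {"index": 0, "executorId": "0", "status": "SUCCESS"},
    {"index": 1, "executorId": "0", "status": "FAILED"},
    {"index": 1, "executorId": "1", "status": "SUCCESS"},
    {"index": 2, "executorId": "1", "status": "SUCCESS"},
]

print(tasks_per_executor(SAMPLE_TASKS))
```

Note that task index 1 appears twice: a failed attempt on one executor and a successful retry on another, which is why the attempts column is worth checking alongside the status.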
