Spark web interface

The web UI (also known as the Spark UI) is the web interface of a running Spark application, used to monitor the execution of jobs in a web browser such as Firefox or Google Chrome. When a SparkContext launches, a web UI that displays useful information about the application is started on port 4040 by default. The Spark web UI is reached in different ways depending on whether the application is still running or has finished its execution.
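To see where the UI ends up, you can ask the SparkContext itself for the address it bound to. The following Scala snippet is only a minimal sketch; the application name and the local master are illustrative choices, not taken from this chapter:

import org.apache.spark.sql.SparkSession

// Minimal sketch: the web UI starts automatically together with the SparkContext.
// The application name and master below are illustrative only.
val spark = SparkSession.builder()
  .appName("UiDemo")
  .master("local[*]")
  .getOrCreate()

// uiWebUrl reports the address the UI bound to, for example Some(http://<driver-node>:4040)
println(spark.sparkContext.uiWebUrl.getOrElse("web UI disabled"))

While this application is running, opening the printed URL in a browser shows the UI described in the rest of this section.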

You can also use the web UI after the application has finished its execution by persisting all the events using EventLoggingListener. EventLoggingListener cannot work alone, however; it has to be combined with the Spark history server. With these two features together, the following information is available:

  • A list of scheduler stages and tasks
  • A summary of RDD sizes
  • Memory usage
  • Environmental information
  • Information about the running executors

You can access the UI at http://<driver-node>:4040 in a web browser. For example, a Spark job submitted and running in local mode can be accessed at http://localhost:4040.

Note that if multiple SparkContexts are running on the same host, they bind to successive ports starting at 4040 (4041, 4042, and so on). By default, this information is available only for the duration of your Spark application; once your Spark job finishes its execution, the binding is no longer valid or accessible.
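If you prefer not to rely on this fall-back behavior, you can pin the UI to a specific port through the spark.ui.port property, as in the following Scala sketch; the application name and the chosen port are purely illustrative:

import org.apache.spark.sql.SparkSession

// Sketch: pin the UI of a second application to a known port
// instead of letting it fall back to 4041, 4042, and so on.
val spark2 = SparkSession.builder()
  .appName("SecondApp")              // illustrative name
  .master("local[2]")
  .config("spark.ui.port", "4050")   // spark.ui.port defaults to 4040
  .getOrCreate()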

As long as the job is running, its stages can be observed in the Spark UI. To view the web UI after the job has finished its execution, however, set spark.eventLog.enabled to true before submitting your Spark jobs. This makes Spark log all the events needed to render the UI to persistent storage, such as the local filesystem or HDFS.
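The same settings can also be applied programmatically on the driver before the SparkContext is created. The Scala sketch below shows one way to do this; the application name and log directory are placeholders, and the log directory must already exist:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enable event logging from code instead of on the command line.
val conf = new SparkConf()
  .setAppName("KMeansDemo")                              // placeholder name
  .setMaster("local[8]")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///tmp/spark-events") // directory must exist

val sc = new SparkContext(conf)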

In the previous chapter, we saw how to submit a Spark job to a cluster. Let's reuse one of the commands for submitting the k-means clustering example, as follows:

 

# Run the application locally on 8 cores
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.KMeansDemo \
  --master local[8] \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt

If you submit the job using the preceding command, you will not be able to see the status of jobs that have already finished their execution, so to persist that information, set the following two options:

spark.eventLog.enabled=true
spark.eventLog.dir=file:///home/username/log

By setting the preceding two configuration variables, we ask the Spark driver to enable event logging and to save the events under file:///home/username/log.
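If you want to confirm that the driver has actually picked up these values, you can read them back from the running application's configuration. The following lines assume an existing SparkSession named spark, as in the earlier sketch:

// Optional check: read the effective values back from the driver's configuration.
println(spark.sparkContext.getConf.get("spark.eventLog.enabled", "false"))
println(spark.sparkContext.getConf.get("spark.eventLog.dir", "<not set>"))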

In summary, with these changes, your submit command will be as follows:

# Run the application locally on 8 cores
$SPARK_HOME/bin/spark-submit \
  --conf "spark.eventLog.enabled=true" \
  --conf "spark.eventLog.dir=file:///tmp/test" \
  --class org.apache.spark.examples.KMeansDemo \
  --master local[8] \
  KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
  Saratoga_NY_Homes.txt

Figure 1: Spark web UI

As shown in the preceding screenshot, the Spark web UI provides the following tabs:

  • Jobs
  • Stages
  • Storage
  • Environment
  • Executors
  • SQL

It should be noted that not all of these tabs may be visible at once, as some are created lazily on demand (for example, only while running a streaming job).
