Debugging Spark applications using logs

How you view information about running Spark applications depends on the cluster manager you are using. Follow these instructions while debugging your Spark application:

  • Spark Standalone: Go to the Spark master UI at http://master:8080 (8080 is the default web UI port of the standalone master). The master and each worker show cluster and job statistics. In addition, detailed log output for each job is written to the working directory of each worker. We will discuss how to configure logging manually using log4j with Spark; a minimal configuration sketch appears at the end of this section.
  • YARN: If your cluster manager is YARN and you are running your Spark jobs on Cloudera (or any other YARN-based platform), go to the YARN applications page in the Cloudera Manager Admin Console. To debug Spark applications running on YARN, view the logs for the NodeManager role: open the log event viewer, then filter the event stream to choose a time window and log level and to display the NodeManager source. You can also access the logs from the command line. The format of the command is as follows:
     yarn logs -applicationId <application ID> [OPTIONS]

For example, the following are valid commands for two different applications:

     yarn logs -applicationId application_561453090098_0005
     yarn logs -applicationId application_561453090070_0005 -appOwner userid

Note that the second command additionally specifies the application owner with -appOwner, which is required when fetching the logs of an application submitted by another user. Also, the yarn logs command works only if yarn.log-aggregation-enable is set to true in yarn-site.xml and the application has already finished executing.
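For reference, a minimal yarn-site.xml fragment enabling log aggregation might look like the following. The yarn.log-aggregation-enable property is the one named above; the remote log directory property and its value are YARN defaults, shown purely for illustration:

     <!-- yarn-site.xml: aggregate container logs to the cluster filesystem -->
     <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
     </property>
     <!-- Optional: where aggregated logs are stored (YARN's default shown) -->
     <property>
       <name>yarn.nodemanager.remote-app-log-dir</name>
       <value>/tmp/logs</value>
     </property>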
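As for the manual log4j configuration mentioned in the Spark Standalone bullet, the following is a minimal sketch based on Spark's bundled log4j.properties.template. The appender setup mirrors that template; the package name com.example.myapp is a hypothetical placeholder for your own application code:

     # Save as $SPARK_HOME/conf/log4j.properties
     # (copy log4j.properties.template as a starting point).
     # Root logger: send INFO and above to the console.
     log4j.rootCategory=INFO, console
     log4j.appender.console=org.apache.log4j.ConsoleAppender
     log4j.appender.console.target=System.err
     log4j.appender.console.layout=org.apache.log4j.PatternLayout
     log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
     # Quiet noisy Spark internals while debugging your own code.
     log4j.logger.org.apache.spark=WARN
     # Hypothetical application package; raise its level to DEBUG.
     log4j.logger.com.example.myapp=DEBUG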
