Updating task status messages to display debugging information

Along with maintaining counters, another role of the Reporter class in Hadoop is to capture task status information. The task status information is periodically sent to the Job Tracker. The Job Tracker UI is updated to reflect the current status. By default, the task status will display its state. The task state can be one of the following:

  • RUNNING
  • SUCCEEDED
  • FAILED
  • UNASSIGNED
  • KILLED
  • COMMIT_PENDING
  • FAILED_UNCLEAN
  • KILLED_UNCLEAN

When debugging a MapReduce job, it can be useful to display a custom message that gives more detailed information on how the task is running. This recipe shows how to update the task status.

Getting ready

  • Download the source code for this chapter.
  • Load the StatusMessage project.

How to do it...

Updating a task's status message can be done using the setStatus() method of the job's Context class.

context.setStatus("user custom message");

How it works...

The source code for this chapter provides an example of using a custom task status message to display the number of rows being processed per second by the task.

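// Imports required at the top of the enclosing source file
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class StatusMap extends Mapper<LongWritable, Text, LongWritable, Text> {

   private int rowCount = 0;    // rows seen in the current interval
   private long startTime = 0;  // start of the current interval; 0 until the first row arrives

   @Override
   public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

      // Display rows per second every 100,000 rows
      rowCount++;
      if (startTime == 0 || rowCount % 100000 == 0)
      {
         if (startTime > 0)
         {
            // Elapsed time is in nanoseconds; divide by 1e9 to get seconds
            long estimatedTime = System.nanoTime() - startTime;
            context.setStatus("Processing: " + (double)rowCount / ((double)estimatedTime/1000000000.0) + " rows/second");
            rowCount = 0;
         }

         startTime = System.nanoTime();
      }

      context.write(key, value);
   }
}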

Two private instance variables are declared: rowCount, which tracks the number of rows processed in the current interval, and startTime, which records when that interval began. Each time the map function has processed 100,000 rows, the task status is updated with the number of rows being processed per second.

context.setStatus("Processing: " + (double)rowCount / ((double)estimatedTime/1000000000.0) + " rows/second"); 
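The same calculation can be pulled out of the mapper so that it can be exercised without a running cluster. The following is only a minimal sketch under stated assumptions: RowRateStatus, rowsPerSecond(), and recordRow() are hypothetical names that do not appear in Hadoop or in this chapter's source code, and the status message is handed to a plain Consumer<String> instead of a real Context. It also uses a boolean flag in place of the startTime == 0 check and rounds the rate to one decimal place.

```java
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Hadoop or the chapter's code.
class RowRateStatus {
    private final int interval;           // rows between status updates
    private final Consumer<String> sink;  // receives each status message
    private int rowCount = 0;             // rows seen in the current interval
    private long startTime = 0;           // nanosecond timestamp of the interval start
    private boolean started = false;      // replaces the recipe's "startTime == 0" check

    RowRateStatus(int interval, Consumer<String> sink) {
        this.interval = interval;
        this.sink = sink;
    }

    // Same arithmetic as the recipe: rows divided by elapsed seconds.
    static double rowsPerSecond(long rows, long elapsedNanos) {
        return (double) rows / (elapsedNanos / 1_000_000_000.0);
    }

    // Call once per input row, passing a System.nanoTime() style timestamp.
    void recordRow(long nowNanos) {
        rowCount++;
        if (!started) {
            // First row: start the timer and wait for a full interval.
            started = true;
            startTime = nowNanos;
            return;
        }
        if (rowCount % interval == 0) {
            double rate = rowsPerSecond(rowCount, nowNanos - startTime);
            sink.accept(String.format(Locale.ROOT, "Processing: %.1f rows/second", rate));
            rowCount = 0;
            startTime = nowNanos;
        }
    }

    public static void main(String[] args) {
        // Simulated timestamps (in nanoseconds) stand in for System.nanoTime().
        RowRateStatus status = new RowRateStatus(4, System.out::println);
        long[] timestamps = {1_000_000_000L, 2_000_000_000L, 3_000_000_000L, 5_000_000_000L};
        for (long t : timestamps) {
            status.recordRow(t);
        }
    }
}
```

Inside a mapper, the sink would presumably be context::setStatus, with recordRow(System.nanoTime()) called once per invocation of map(). Passing the timestamp in as a parameter is a design choice made here so that the arithmetic can be checked with fixed values.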

After the message has been updated, the rowCount and startTime variables are reset and the process starts over again. The status is first stored locally, in the memory of the task's own process, and is then sent to the Task Tracker. The next time the Task Tracker pings the Job Tracker, the updated status message is sent along with it. Once the Job Tracker receives the status message, the information is made available to the UI.
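The hand-off described above can be pictured with a toy model. This is a deliberately simplified sketch and not Hadoop's implementation: ToyTaskTracker and ToyJobTracker are hypothetical classes invented for illustration, and the ping is triggered by an explicit method call where the real system uses a timer. In this model, a status message is not visible to the UI until the next ping, and only the most recent message for each task is forwarded.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and do not exist in Hadoop.
class ToyJobTracker {
    private final Map<String, String> statusByTask = new HashMap<>();

    // Called by a task tracker on each ping with the statuses it has collected.
    void receivePing(Map<String, String> updates) {
        statusByTask.putAll(updates);
    }

    // What the UI would display for a task in this model (null if nothing has arrived).
    String statusShownInUi(String taskId) {
        return statusByTask.get(taskId);
    }
}

class ToyTaskTracker {
    private final Map<String, String> pending = new HashMap<>();
    private final ToyJobTracker jobTracker;

    ToyTaskTracker(ToyJobTracker jobTracker) {
        this.jobTracker = jobTracker;
    }

    // A task reports its latest status; a newer message overwrites an older one.
    void reportStatus(String taskId, String status) {
        pending.put(taskId, status);
    }

    // The ping forwards everything collected since the previous ping.
    void ping() {
        jobTracker.receivePing(new HashMap<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        ToyJobTracker jobTracker = new ToyJobTracker();
        ToyTaskTracker taskTracker = new ToyTaskTracker(jobTracker);
        taskTracker.reportStatus("task_1", "Processing: 40000.0 rows/second");
        taskTracker.reportStatus("task_1", "Processing: 50000.0 rows/second");
        taskTracker.ping();
        System.out.println(jobTracker.statusShownInUi("task_1"));
    }
}
```

If the real system behaves like this model, the practical consequence when debugging is that the UI lags slightly behind the task, so a status message is better suited to slowly changing summaries such as a throughput figure than to a log of individual events.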
