Time for action – killing the JobTracker

We'll first kill the JobTracker process which we should expect to impact our ability to execute MapReduce jobs but not affect the underlying HDFS filesystem.

  1. Log on to the JobTracker host and kill its process.
  2. Attempt to start a test MapReduce job such as Pi or WordCount:
    $ hadoop jar wc.jar WordCount3 test.txt output
    Starting Job
    11/12/11 16:03:29 INFO ipc.Client: Retrying connect to server: /10.0.0.100:9001. Already tried 0 time(s).
    11/12/11 16:03:30 INFO ipc.Client: Retrying connect to server: /10.0.0.100:9001. Already tried 1 time(s).
    
    11/12/11 16:03:38 INFO ipc.Client: Retrying connect to server: /10.0.0.100:9001. Already tried 9 time(s).
    java.net.ConnectException: Call to /10.0.0.100:9001 failed on connection exception: java.net.ConnectException: Connection refused
      at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
      at org.apache.hadoop.ipc.Client.call(Client.java:743)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    
    
  3. Perform some HDFS operations:
    $ hadoop fs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2011-12-11 19:19 /user
    drwxr-xr-x   - hadoop supergroup          0 2011-12-04 20:38 /var
    $ hadoop fs -cat test.txt
    This is a test file
    
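Step 1 can be scripted. Below is a minimal sketch of finding and killing the JobTracker's PID; the jps listing shown is sample data (the PIDs are illustrative) so the extraction can be demonstrated, and the command you would run against a live cluster is left as a comment:

```shell
# Sample output of the JDK's jps tool on a JobTracker host (illustrative PIDs):
sample_jps='2210 NameNode
2341 JobTracker
2455 Jps'

# Pull the JobTracker PID out of the listing.
jobtracker_pid=$(printf '%s\n' "$sample_jps" | awk '/JobTracker/ {print $1}')
echo "$jobtracker_pid"    # prints 2341

# On a live cluster you would run something like:
#   kill $(jps | awk '/JobTracker/ {print $1}')
```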

What just happened?

After killing the JobTracker process, we attempted to launch a MapReduce job. From the walk-through in Chapter 2, Getting Hadoop Up and Running, we know that the client on the machine where we start the job attempts to communicate with the JobTracker process to initiate the job scheduling activities. In this case, however, there was no running JobTracker; the communication could not happen and the job failed.

We then performed a few HDFS operations to highlight the point made in the previous section: a non-functional MapReduce cluster does not directly impact HDFS, which remains available to all clients and operations.

Starting a replacement JobTracker

The recovery of the MapReduce cluster is also straightforward: once the JobTracker process is restarted, all subsequent MapReduce jobs are processed successfully.

Note that any jobs that were in flight when the JobTracker was killed are lost and need to be resubmitted. Watch out for temporary files and directories on HDFS; many MapReduce jobs write temporary data to HDFS that is usually cleaned up on job completion. Failed jobs, especially those that failed because of a JobTracker failure, are likely to leave such data behind, and this may require a manual clean-up.
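As a sketch of such a clean-up, assuming the leftover data follows the common `_temporary` naming convention (the path listing below is sample data so the filtering can be demonstrated; the commands for a live cluster are shown as comments):

```shell
# Sample 'hadoop fs -lsr' style path listing (illustrative paths only):
sample_paths='/user/hadoop/output/part-00000
/user/hadoop/output/_temporary/_attempt_201112110001_0001
/user/hadoop/output/_SUCCESS'

# Filter out the leftover temporary paths a failed job may leave behind.
printf '%s\n' "$sample_paths" | grep '_temporary'

# Against a live cluster you would locate and remove them with, for example:
#   hadoop fs -lsr / | grep '_temporary'
#   hadoop fs -rmr /user/hadoop/output/_temporary
```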

Have a go hero – moving the JobTracker to a new host

But what happens if the host on which the JobTracker process was running suffers a fatal hardware failure and cannot be recovered? In such situations you will need to start a new JobTracker process on a different host. This requires every node to have its mapred-site.xml file updated with the new location, and the cluster to be restarted. Try this! We'll talk about it more in the next chapter.
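As a starting point, the property to change on every node is mapred.job.tracker in mapred-site.xml. A sketch, where newmaster is a hypothetical replacement hostname (the 9001 port matches the one seen in the earlier connection errors):

```xml
<!-- mapred-site.xml on every node; 'newmaster' is a hypothetical hostname -->
<property>
  <name>mapred.job.tracker</name>
  <value>newmaster:9001</value>
</property>
```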
