Chapter 8. Debugging

In this chapter, we will cover:

  • Using Counters in a MapReduce job to track bad records
  • Developing and testing MapReduce jobs with MRUnit
  • Developing and testing MapReduce jobs running in local mode
  • Enabling MapReduce jobs to skip bad records
  • Using Counters in a streaming job
  • Updating task status messages to display debugging information
  • Using illustrate to debug Pig jobs

Introduction

There is an adage among those working with Hadoop that everything breaks at scale. Malformed or unexpected input is common, and it is an unfortunate downside of working with large amounts of unstructured data. Within Hadoop, individual tasks are isolated and given different sets of input. This isolation makes it easy for Hadoop to distribute work across a cluster, but it also makes it difficult to track global events and to understand the state of any individual task. Fortunately, there are several tools and techniques available to aid in the process of debugging Hadoop jobs. This chapter will focus on applying these tools and techniques to debug MapReduce jobs.
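As a taste of what follows, Counters are one of the simplest ways to surface a global view of what is happening inside isolated tasks; the first recipe covers them in detail. The following is a minimal sketch of a mapper that counts malformed records with a custom Counter. The class name, the counter group and name ("Parse", "BAD_RECORDS"), and the assumed tab-delimited, three-field input format are hypothetical and chosen only for illustration.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that tracks bad records with a custom Counter.
public class BadRecordCountingMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumption for this sketch: valid records have exactly three
        // tab-separated fields.
        String[] fields = value.toString().split("\t");
        if (fields.length != 3) {
            // Increment a job-wide counter; Hadoop aggregates counter values
            // from every task and reports the totals with the job status.
            context.getCounter("Parse", "BAD_RECORDS").increment(1);
            return;
        }
        context.write(new Text(fields[0]), NullWritable.get());
    }
}

When the job completes, the aggregated BAD_RECORDS total appears alongside the built-in counters in the job's output and in the JobTracker/ResourceManager web UI, giving a quick, cluster-wide picture of how much input was rejected.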
