Tuning MapReduce job parameters

The Hadoop framework is very flexible and can be tuned using a number of configuration parameters. In this recipe, we will discuss the function and purpose of different configuration parameters you can set for a MapReduce job.

Getting ready

Ensure that you have a MapReduce job whose job class extends the Hadoop Configured class and implements the Hadoop Tool interface, such as any of the MapReduce applications we have written so far in this book.
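Such a job class typically follows the `extends Configured implements Tool` pattern. The following is a minimal sketch of a driver in that shape; the class name MyMapReduceJob and the job name are placeholders, and a real job would also set its mapper, reducer, and key/value classes:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Extending Configured gives the class a Configuration object that
// ToolRunner has already populated with any generic options
// (-D, -conf, -fs, -jt) supplied on the command line.
public class MyMapReduceJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides from the command line
        Job job = new Job(getConf(), "my-mapreduce-job");
        job.setJarByClass(MyMapReduceJob.class);
        // set the mapper, reducer, and key/value classes here for a real job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MyMapReduceJob(), args);
        System.exit(exitCode);
    }
}
```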

How to do it...

Follow these steps to customize MapReduce job parameters:

  1. Ensure you have a MapReduce job class that extends the Hadoop Configured class and implements the Hadoop Tool interface.
  2. Use the ToolRunner.run() static method to run your MapReduce job, as shown in the following example:
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MyMapReduceJob(), args);
        System.exit(exitCode);
    }
  3. Examine the following table of Hadoop job properties and values:

     Property name: mapred.reduce.tasks
     Possible values: Integer (0 - N)
     Description: Sets the number of reducers to launch.

     Property name: mapred.child.java.opts
     Possible values: JVM options
     Description: These options are passed as arguments to every task JVM. For example, to set the maximum heap size for all tasks to 1 GB, you would set this property to '-Xmx1g'.

     Property name: mapred.map.child.java.opts
     Possible values: JVM options
     Description: These options are passed as arguments to every map task JVM, overriding mapred.child.java.opts for map tasks.

     Property name: mapred.reduce.child.java.opts
     Possible values: JVM options
     Description: These options are passed as arguments to every reduce task JVM, overriding mapred.child.java.opts for reduce tasks.

     Property name: mapred.map.tasks.speculative.execution
     Possible values: Boolean (true/false)
     Description: Tells the Hadoop framework to speculatively launch a duplicate of a map task on a different node in the cluster if the task is performing poorly compared to the other tasks in the job. This property was discussed in Chapter 1, Hadoop Distributed File System – Importing and Exporting Data.

     Property name: mapred.reduce.tasks.speculative.execution
     Possible values: Boolean (true/false)
     Description: Tells the Hadoop framework to speculatively launch a duplicate of a reduce task on a different node in the cluster if the task is performing poorly compared to the other tasks in the job.

     Property name: mapred.job.reuse.jvm.num.tasks
     Possible values: Integer (-1, 1 - N)
     Description: The number of tasks a task JVM may run before being replaced. A value of 1 indicates that one JVM will be started per task; a value of -1 indicates that a single JVM can run an unlimited number of tasks. Setting this parameter can improve the performance of small jobs because JVMs are re-used for multiple tasks (as opposed to starting a JVM for each and every task).

     Property names: mapred.compress.map.output and mapred.map.output.compression.codec
     Possible values: Boolean (true/false); String (name of a compression codec class)
     Description: These two parameters are used to compress the intermediate output of map tasks: the first enables compression and the second selects the codec.

     Property names: mapred.output.compress, mapred.output.compression.type, and mapred.output.compression.codec
     Possible values: Boolean (true/false); String (NONE, RECORD, or BLOCK); String (name of a compression codec class)
     Description: These three parameters are used to compress the final output of a MapReduce job. The compression type applies when the output is written as SequenceFiles.

  4. Execute a MapReduce job with a custom Hadoop property. For example, we will launch a job using five reducers:
    $ cd /path/to/hadoop
    $ bin/hadoop jar MyJar.jar com.packt.MyJobClass -Dmapred.reduce.tasks=5
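The properties in the preceding table can also be set programmatically inside run() rather than with -D flags. The following sketch (the class name TunedJob is a placeholder) sets a few of them; note that because run() executes after ToolRunner has parsed the command line, values set this way overwrite any -D flags that target the same keys:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch: setting job-tuning properties in code instead of on the command line.
public class TunedJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.setInt("mapred.reduce.tasks", 5);               // same effect as -Dmapred.reduce.tasks=5
        conf.set("mapred.child.java.opts", "-Xmx1g");        // 1 GB heap for every task JVM
        conf.setBoolean("mapred.compress.map.output", true); // compress intermediate map output
        conf.set("mapred.map.output.compression.codec",
                 "org.apache.hadoop.io.compress.GzipCodec");
        // ...build and submit the Job using this Configuration as usual...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TunedJob(), args));
    }
}
```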

How it works...

When a job class extends the Hadoop Configured class and implements the Hadoop Tool interface, the ToolRunner class will automatically handle the following generic Hadoop arguments:

Argument/Flag: -conf
Purpose: Takes a path to a parameter configuration file.

Argument/Flag: -D
Purpose: Specifies Hadoop key/value properties, which are added to the job configuration.

Argument/Flag: -fs
Purpose: Specifies the host:port of the NameNode.

Argument/Flag: -jt
Purpose: Specifies the host:port of the JobTracker.

In the case of this recipe, the ToolRunner class will automatically place all of the parameters specified with the -D flag into the job configuration.
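The -conf flag, in turn, points at a standard Hadoop configuration XML file. A minimal file might look like the following sketch (the property values here are illustrative); every property it contains is loaded into the job configuration before run() executes:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>5</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
</configuration>
```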
