We have seen the core-site.xml and hdfs-site.xml files in previous sections. To configure MapReduce, Hadoop primarily provides mapred-site.xml. In addition, Hadoop ships a read-only file, mapred-default.xml, that documents the default value of every property for reference. The mapred-site.xml file is located in the $HADOOP_HOME/etc/hadoop directory. Now, let's look at the other important parameters that MapReduce needs to run without any hurdles:
| Property Name | Default Value | Description |
| --- | --- | --- |
| mapreduce.cluster.local.dir | ${hadoop.tmp.dir}/mapred/local | A local directory for keeping all MapReduce-related intermediate data. Ensure that it has sufficient space. |
| mapreduce.framework.name | local | local: runs MR jobs locally in a single JVM. classic: runs MR jobs on a cluster or in pseudo-distributed mode (MRv1). yarn: runs MR jobs on YARN (MRv2). |
| mapreduce.map.memory.mb | 1024 | The memory requested from the scheduler for each map task. For jobs with an intensive map phase, set this number higher. |
| mapreduce.map.java.opts | None | JVM options (for example, -Xmx, verbose output, and the GC strategy) applied to each map task's JVM. |
| mapreduce.reduce.memory.mb | 1024 | The memory requested from the scheduler for each reduce task. For jobs with an intensive reduce phase, set this number higher. |
| mapreduce.reduce.java.opts | None | JVM options (for example, -Xmx, verbose output, and the GC strategy) applied to each reduce task's JVM. |
| mapreduce.jobhistory.address | 0.0.0.0:10020 | The Job History Server host and IPC port. |
| mapreduce.jobhistory.webapp.address | 0.0.0.0:19888 | The host and port for the Job History Server's web application. Once this is set, you can access the Job History Server UI on port 19888. |
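As a sketch, several of the properties above could be set in mapred-site.xml as follows; the memory sizes and heap values here are illustrative assumptions, not recommendations, and should be tuned for your workload:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN (MRv2) -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Illustrative container sizes; tune for your workload -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <!-- JVM heaps kept below the container sizes above
       to leave headroom for non-heap memory -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
  </property>
</configuration>
```

A common rule of thumb is to set -Xmx to roughly 80% of the corresponding *.memory.mb value, so the container has room for JVM overhead beyond the heap.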
You will find a list of all the configuration properties available for mapred-site.xml here.