Take a look in the conf directory within the Hadoop distribution. There are many configuration files, but the ones we need to modify are core-site.xml, hdfs-site.xml, and mapred-site.xml.
Modify core-site.xml to look like the following code:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
Modify hdfs-site.xml to look like the following code:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
Modify mapred-site.xml to look like the following code:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>
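If you would rather generate the three files than edit them by hand, the following minimal sketch writes the same properties using Python's standard xml.etree.ElementTree module. It is not part of Hadoop. The names SITE_PROPERTIES, build_site_xml, and write_site_files are hypothetical helpers introduced here, and the code assumes Python 3.9 or later because it uses ET.indent. Note that write_site_files overwrites any existing file of the same name, so back up the originals first.

```python
import os
import xml.etree.ElementTree as ET

# Property values taken from the three files above (pseudo-distributed mode).
# SITE_PROPERTIES is a hypothetical name used only for this sketch.
SITE_PROPERTIES = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}

# The same preamble that the distribution's site files start with.
HEADER = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>\n'
    "<!-- Put site-specific property overrides in this file. -->\n"
)


def build_site_xml(properties):
    """Return the text of a *-site.xml file with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; ET.indent needs Python 3.9 or later
    return HEADER + ET.tostring(root, encoding="unicode") + "\n"


def write_site_files(conf_dir):
    """Overwrite core-site.xml, hdfs-site.xml and mapred-site.xml in conf_dir."""
    for filename, properties in SITE_PROPERTIES.items():
        path = os.path.join(conf_dir, filename)
        with open(path, "w", encoding="utf-8") as f:
            f.write(build_site_xml(properties))


if __name__ == "__main__":
    # Show what would be written for core-site.xml without touching the disk.
    print(build_site_xml(SITE_PROPERTIES["core-site.xml"]))
```

Calling write_site_files with the path of your distribution's conf directory would then produce the three files shown above, differing only in whitespace.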
The first thing to note is the general format of these configuration files. They are XML documents that contain multiple property specifications within a single configuration element.
Each property specification always contains name and value elements, and may also include optional comments, which are not shown in the preceding code.
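To make that structure concrete, here is a minimal sketch that reads a site file back into a name-to-value mapping using Python's standard xml.etree.ElementTree. The function name read_site_properties is a hypothetical helper for illustration only. Any child elements other than name and value are simply ignored.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Map each <property>'s <name> to its <value> in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError(f"expected a <configuration> root, got <{root.tag}>")
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None:
            raise ValueError("found a <property> without a <name> element")
        # Other optional child elements are not read by this sketch.
        properties[name.strip()] = (value or "").strip()
    return properties


CORE_SITE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

print(read_site_properties(CORE_SITE))  # {'fs.default.name': 'hdfs://localhost:9000'}
```

The parser skips the XML declaration, the stylesheet processing instruction, and the comment, so only the property elements inside configuration matter.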
We set three configuration variables here:
The fs.default.name variable holds the location of the NameNode and is required by both HDFS and MapReduce components, which explains why it is in core-site.xml and not hdfs-site.xml.

The dfs.replication variable specifies how many times each HDFS block should be replicated. Recall from Chapter 1, What It's All About, that HDFS handles failures by ensuring that each block of filesystem data is replicated to a number of different hosts, usually three. Because we have only a single host and one DataNode in pseudo-distributed mode, we change this value to 1.

The mapred.job.tracker variable holds the location of the JobTracker, just as fs.default.name holds the location of the NameNode. Because only MapReduce components need to know this location, it is in mapred-site.xml.

The network addresses for the NameNode and the JobTracker specify the ports to which the actual system requests should be directed. These are not user-facing locations, so don't bother pointing your web browser at them. There are web interfaces that we will look at shortly.
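The three values above can be sanity-checked before starting the daemons. The following minimal sketch, which is not part of Hadoop, pulls the host and port out of fs.default.name and mapred.job.tracker, and checks that dfs.replication can actually be satisfied. All three function names are hypothetical. The replication check rests on the fact that each replica of a block must live on a different DataNode, so a factor larger than the DataNode count leaves blocks under-replicated.

```python
from urllib.parse import urlsplit


def namenode_address(fs_default_name):
    """Split the hdfs://host:port value of fs.default.name into (host, port)."""
    parts = urlsplit(fs_default_name)
    if parts.scheme != "hdfs" or parts.port is None:
        raise ValueError(f"expected hdfs://host:port, got {fs_default_name!r}")
    return parts.hostname, parts.port


def jobtracker_address(mapred_job_tracker):
    """Split the host:port value of mapred.job.tracker into (host, port)."""
    host, sep, port = mapred_job_tracker.rpartition(":")
    if not sep or not host:
        # A value without a host:port pair is not a network address.
        raise ValueError(f"expected host:port, got {mapred_job_tracker!r}")
    return host, int(port)


def replication_satisfiable(dfs_replication, datanode_count):
    """True if every block can get its replicas on distinct DataNodes."""
    replication = int(dfs_replication)
    return 1 <= replication <= datanode_count


if __name__ == "__main__":
    print(namenode_address("hdfs://localhost:9000"))  # ('localhost', 9000)
    print(jobtracker_address("localhost:9001"))       # ('localhost', 9001)
    print(replication_satisfiable("1", 1))            # True
    print(replication_satisfiable("3", 1))            # False
```

The last line shows why the default of three is changed to 1 here: with a single DataNode, three replicas of a block have nowhere to go.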