Time for action – changing the base HDFS directory

Let's first set the base directory on the local filesystem under which Hadoop will keep all its data. Carry out the following steps:

  1. Create a directory into which Hadoop will store its data (prefix the commands in steps 1 and 2 with sudo if your account cannot write to /var/lib):
    $ mkdir /var/lib/hadoop
    
  2. Ensure the directory is writeable by any user:
    $ chmod 777 /var/lib/hadoop
    
  3. Modify core-site.xml once again to add the following property:
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/var/lib/hadoop</value>
    </property>
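
Steps 1 and 2 can also be wrapped in a small, repeatable shell function. This is only a sketch: the function name prepare_hadoop_base is hypothetical (it is not part of Hadoop), and it assumes the same wide-open 777 permissions used above.

```shell
# Hypothetical helper, not part of Hadoop: create a base directory
# and open it to all users, mirroring steps 1 and 2 above.
prepare_hadoop_base() {
    dir="$1"
    # -p makes the call safe to repeat if the directory already exists
    mkdir -p "$dir" || return 1
    # Same permissions as step 2: readable and writable by any user
    chmod 777 "$dir" || return 1
    # Succeed only if the directory now exists and is writable by us
    [ -d "$dir" ] && [ -w "$dir" ]
}
```

Calling prepare_hadoop_base /var/lib/hadoop (with sufficient privileges) should leave the system in the same state as the two manual commands.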

What just happened?

As we will be storing data in Hadoop and all the various components are running on our local host, this data will need to be stored on our local filesystem somewhere. Regardless of the mode, Hadoop by default uses the hadoop.tmp.dir property as the base directory under which all files and data are written.

MapReduce, for example, uses a /mapred directory under this base directory; HDFS uses /dfs. The danger is that the default value of hadoop.tmp.dir is /tmp/hadoop-${user.name}, which sits under /tmp, and some Linux distributions delete the contents of /tmp on each reboot. So it's safer to explicitly state where the data is to be held.
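
To make the layout concrete, the sketch below prints the local directories that would sit under a given base. The function name hadoop_default_dirs is hypothetical, and the subdirectory names (dfs/name, dfs/data, mapred/local) are an assumption based on the Hadoop 1.x default configuration; other versions or explicit overrides may differ.

```shell
# Hypothetical helper: list the local directories assumed to be derived
# from hadoop.tmp.dir by default in Hadoop 1.x.
hadoop_default_dirs() {
    base="${1%/}"   # drop one trailing slash, if present
    printf '%s\n' \
        "$base/dfs/name" \
        "$base/dfs/data" \
        "$base/mapred/local"
}
```

Running hadoop_default_dirs /var/lib/hadoop lists the paths you would expect to appear once HDFS has been formatted and the daemons have started.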
