Installing Hadoop on Windows

We can install Hadoop on Windows to evaluate it before migrating to and configuring a full-sized production cluster on Linux machines. Bear in mind that this configuration is suitable for evaluation or testing purposes only; a full-fledged production cluster requires a Linux distribution.

  1. Download and install Cygwin from http://www.cygwin.com.

    Cygwin is a tool that provides a Linux-like environment in which native Linux programs can run on Windows.

  2. Download Hadoop, HBase, and ZooKeeper. It's better not to download the most recent versions, as Cygwin does not fully support them. A cluster running on Cygwin only gives us the feel of a real cluster; it is useful when we want to evaluate Hadoop on Windows first rather than switch directly to the Linux OS.
  3. Copy the downloaded Hadoop, HBase, and ZooKeeper TAR files to c:\cygwin\usr\local, which is the default location where Cygwin is installed.
  4. Open Cygwin and extract the Hadoop, HBase, and ZooKeeper TAR files that we copied into the Cygwin folder; inside Cygwin, they can be found at /usr/local. After extracting, make the following changes to these particular files:

    In the core-site.xml file, make the following changes:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9100</value>
      <description>the value can be either localhost or 127.0.0.1 </description>
    </property>

    In the mapred-site.xml file, make the following changes:

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9101</value>
      <description>the value can be either localhost or 127.0.0.1 </description>
    </property>

    In the hdfs-site.xml file, make the following changes:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
  5. After this, we format the NameNode and start the Hadoop daemons (see the command sketch after this list); we also install HBase.

    Extract the HBase TAR file. Create a symbolic link to the JRE, which must already be installed on Windows, using the following command:

    ln -s "/cygdrive/c/Program Files/Java/<jre present in system>" "/usr/local/Java/<jre present in system>"
    

    The preceding command creates a soft link to the JRE and makes it available to Cygwin for Hadoop and HBase.

  6. Change hbase-env.sh; add the following lines:
    export JAVA_HOME=/usr/local/Java/<jre name given>
    export HBASE_IDENT_STRING=$HOSTNAME
    export HBASE_MANAGES_ZK=true
  7. Then, change the hbase-site.xml file and add the following lines:
    <property>
      <name>hbase.rootdir</name>
      <value>file:///C:/cygwin/tmp/hbaseroot</value>
    </property>
    
    <property>
      <name>hbase.tmp.dir</name>
      <value>C:/cygwin/tmp/hbaseroot/temp</value>
    </property>
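
Referring back to step 5, the NameNode has to be formatted once and the Hadoop daemons started before HBase is brought up. The following commands are a minimal sketch, assuming a Hadoop 1.x-style layout extracted under /usr/local and run from inside the Hadoop directory in the Cygwin terminal:

bin/hadoop namenode -format    # format the HDFS metadata (first run only)
bin/start-dfs.sh               # start the NameNode and DataNode
bin/start-mapred.sh            # start the JobTracker and TaskTracker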

Now, we can start HBase using the following command:

bin/start-hbase.sh

Alternatively, you can use the following command:

./start-hbase.sh
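
To confirm that HBase has started, we can check the running Java processes and open the HBase shell. This is a quick sketch; jps ships with the JDK, and status is a built-in HBase shell command:

jps                # should list HMaster along with the Hadoop daemons
bin/hbase shell    # open the HBase shell
status             # inside the shell: show the cluster status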

Note

Bear in mind that this setup will not be suitable for production or serious code testing. If we need a real Hadoop cluster, we must have a Linux, Mac, or Oracle OS, or similar.

Follow the steps given on http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster) to install and run Hadoop on OS X.

Another way is to install Ubuntu in a virtual machine on Windows and configure Hadoop/HBase there. Use the following links to do so.

After installation and configuration, we can perform file-system-related operations using the Hadoop HDFS binary and the HBase shell, which we will see in detail in the next chapter.
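
As a quick preview, the following commands are a minimal sketch of such operations; the /user/test path, the test table name, and the cf column family are only illustrative:

bin/hadoop fs -mkdir /user/test    # create a directory in HDFS
bin/hadoop fs -ls /user            # list its parent directory
bin/hbase shell                    # open the HBase shell
create 'test', 'cf'                # inside the shell: create a table with one column family
list                               # inside the shell: list the tables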

Tip

For configuration files, it is always better to have a separate directory on a mount point common to all nodes, with a soft link inside the HBase or Hadoop directory pointing to it. The processes can then be started and stopped using the --config option on the command line.
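
For example, assuming the shared configuration directories are mounted at /shared/hadoop-conf and /shared/hbase-conf (illustrative paths), the processes could be started as follows:

bin/start-dfs.sh --config /shared/hadoop-conf
bin/start-hbase.sh --config /shared/hbase-conf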
