Step 2 - Installing and verifying Hadoop

  1. As mentioned before, we use the open source Apache Hadoop distribution, specifically version 2.7.3, for all the samples in this book.
  2. Change into ${DOWNLOAD_DIR} and download the Hadoop distribution from the following location using the wget command:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
  3. Once the download is complete, change into the directory where you want to install Hadoop and run the following command to extract the contents. The archive unpacks into a hadoop-2.7.3 subdirectory; we will refer to that directory as ${HADOOP_HOME}:
tar -xzvf ${DOWNLOAD_DIR}/hadoop-2.7.3.tar.gz
  4. Also, disable the firewall from the shell using the following commands (run them as root or with sudo), and then reboot the VM:
systemctl stop firewalld
systemctl disable firewalld
  5. We will set up Hadoop in pseudo-distributed mode for the examples, whereas in production Hadoop is set up in fully distributed mode (Chapter 9, Data Store Using Apache Hadoop, covers Hadoop deployment options in a bit more detail). Change into the ${HADOOP_HOME} directory and make the changes described here: https://goo.gl/x6TC9q
  6. If you hit a host key verification failure while setting up Hadoop in pseudo-distributed mode, run the following commands in the shell:
ssh-keyscan -H -t rsa localhost  >> ~/.ssh/known_hosts
ssh-keyscan -H -t rsa 0.0.0.0 >> ~/.ssh/known_hosts
ssh-keyscan -H -t rsa 127.0.0.1 >> ~/.ssh/known_hosts
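For reference, the pseudo-distributed changes mentioned in the earlier step usually amount to two small configuration files. The following is a minimal sketch, assuming the linked instructions follow the pseudo-distributed section of the Apache Hadoop single-node setup guide; the helper name write_pseudo_dist_conf is hypothetical and is not part of Hadoop. Note that it overwrites any existing core-site.xml and hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Sketch of the pseudo-distributed settings from the Apache single-node guide.
# write_pseudo_dist_conf is a hypothetical helper name, not a Hadoop command.

write_pseudo_dist_conf() {
  # $1: Hadoop configuration directory, normally ${HADOOP_HOME}/etc/hadoop
  local conf_dir="$1"

  # core-site.xml: point the default filesystem at a NameNode on localhost.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # hdfs-site.xml: a single DataNode can hold only one replica of each block.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Only touch the configuration when HADOOP_HOME points at an unpacked distribution.
if [ -n "${HADOOP_HOME:-}" ] && [ -d "$HADOOP_HOME/etc/hadoop" ]; then
  write_pseudo_dist_conf "$HADOOP_HOME/etc/hadoop"
fi

# Afterwards, from ${HADOOP_HOME} (the first command erases any existing HDFS metadata):
#   bin/hdfs namenode -format
#   sbin/start-dfs.sh
```

Keeping the file writes in a function makes it easy to try the sketch against a scratch directory first, for example write_pseudo_dist_conf /tmp/conf-test, before pointing it at the real installation.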
  7. Once the DFS is up and running, confirm that the installation is complete by navigating to the following URL: http://localhost:50070.
  8. This pseudo-distributed setup is reasonably complete and is what we need for running the various examples covered in this book.
  9. For now, we only need to run DFS; we will delve into other services such as YARN, Hive, HBase, and so on in later chapters of this book. The Sqoop examples need only DFS to be running.
  10. Set the environment variable HADOOP_HOME to point to the Hadoop installation directory using the export command in ~/.bashrc, and add the Java and Hadoop directories to the PATH environment variable:
export HADOOP_HOME=<Hadoop-Installation-Directory>
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
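After reloading the shell configuration (source ~/.bashrc), a quick check along the following lines can confirm that the environment matches the steps above. This is only a sketch: the function names check_hadoop_home and namenode_ui_up are hypothetical, the check assumes the standard layout of the Hadoop 2.7.3 archive (bin/hdfs and sbin/start-dfs.sh), and the web UI probe assumes that curl is installed.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the Hadoop environment set up in the steps above.
# check_hadoop_home and namenode_ui_up are hypothetical helper names.

check_hadoop_home() {
  # Succeeds only if the given directory (default: $HADOOP_HOME) contains
  # an executable bin/hdfs launcher and the sbin/start-dfs.sh script.
  local home="${1:-${HADOOP_HOME:-}}"
  if [ -z "$home" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  if [ ! -x "$home/bin/hdfs" ] || [ ! -f "$home/sbin/start-dfs.sh" ]; then
    echo "HADOOP_HOME does not look like a Hadoop installation: $home"
    return 1
  fi
  echo "HADOOP_HOME looks valid: $home"
}

namenode_ui_up() {
  # Returns 0 if the NameNode web UI answers over HTTP (port 50070 in Hadoop 2.x).
  local url="${1:-http://localhost:50070}"
  command -v curl >/dev/null 2>&1 || return 1
  curl --silent --fail --max-time 5 --output /dev/null "$url"
}

check_hadoop_home || echo "Set HADOOP_HOME in ~/.bashrc, then run: source ~/.bashrc"
namenode_ui_up || echo "NameNode web UI is not reachable; has start-dfs.sh been run?"
```

If both checks pass silently apart from the "looks valid" line, running hadoop version from any directory is a further confirmation that the PATH entries are in effect.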