Time for action – installing Hive

Let's now set up Hive so we can start using it.

  1. Download the latest stable version of Hive and move it to the directory where you want it installed:
    $ mv hive-0.8.1.tar.gz /usr/local
    
  2. Change to that directory and uncompress the package:
    $ cd /usr/local
    $ tar -xzf hive-0.8.1.tar.gz
    
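  3. Set the HIVE_HOME variable to the installation directory (the archive unpacks to hive-<version>, so either rename or symlink that directory to match, or adjust the path used here):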
    $ export HIVE_HOME=/usr/local/hive
    
  4. Add the Hive home directory to the path variable:
    $ export PATH=${HIVE_HOME}/bin:${PATH}
    
  5. Create directories required by Hive on HDFS:
    $ hadoop fs -mkdir /tmp
    $ hadoop fs -mkdir /user/hive/warehouse
    
  6. Make both of these directories group writeable:
    $ hadoop fs -chmod g+w /tmp
    $ hadoop fs -chmod g+w /user/hive/warehouse
    
  7. Try to start Hive:
    $ hive
    

    You should see output similar to the following:

    Logging initialized using configuration in jar:file:/opt/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
    Hive history file=/tmp/hadoop/hive_job_log_hadoop_201203031500_480385673.txt
    hive>
    
  8. Exit the Hive interactive shell:
    hive> quit;
    

What just happened?

After downloading the latest stable Hive release, we moved it to the desired location and uncompressed the archive file. This created a directory, hive-<version>.

Just as we previously defined HADOOP_HOME and added the bin directory within that installation to the path variable, we then defined HIVE_HOME and added its bin directory to the path.

Note

To avoid having to set these variables every time you log in, add them to your shell login script or to a separate configuration script that you source when you want to use Hive.
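
As a minimal sketch of such a configuration script, the following could be saved under a name of your choosing (hive-env.sh here is hypothetical) and sourced when needed. It assumes Hive is installed at /usr/local/hive as in step 3; the check that avoids adding the same entry to the path twice is an addition for convenience, not something Hive requires.

```shell
# hive-env.sh (hypothetical file name): source this before using Hive.
# Assumes Hive lives at /usr/local/hive, as in step 3; adjust if needed.
export HIVE_HOME=/usr/local/hive

# Prepend Hive's bin directory to PATH only if it is not already present,
# so that sourcing this script repeatedly does not keep growing PATH.
case ":${PATH}:" in
    *":${HIVE_HOME}/bin:"*) ;;
    *) export PATH="${HIVE_HOME}/bin:${PATH}" ;;
esac
```

The script would then be loaded into the current shell with something like `source ~/hive-env.sh`, or that line could be placed in your login script.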

We then created two directories on HDFS that Hive requires and made them group writeable. By default, Hive uses the /tmp directory for the transient data it creates during query execution and also places output data there. The /user/hive/warehouse directory is where Hive stores the data written into its tables.
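
If you want to confirm that both directories are in place before starting Hive, a small check along the following lines may help. This is only a sketch: the function name check_hive_dirs and the variable HIVE_HDFS_DIRS are made up for illustration, and it relies on the `-test -d` option of `hadoop fs`, which reports through its exit status whether a path is a directory. It checks existence only, not permissions.

```shell
# Directories Hive expects on HDFS, as created in step 5.
HIVE_HDFS_DIRS="/tmp /user/hive/warehouse"

# check_hive_dirs is a hypothetical helper name used for this sketch.
check_hive_dirs() {
    # Bail out early if the hadoop command cannot be found.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found on PATH" >&2
        return 2
    fi
    status=0
    for dir in $HIVE_HDFS_DIRS; do
        # 'hadoop fs -test -d' exits with 0 when the path is a directory.
        if hadoop fs -test -d "$dir"; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return $status
}
```

Calling `check_hive_dirs` prints one line per directory. To inspect the permissions as well, `hadoop fs -ls /user/hive` lists the warehouse directory along with its permission string.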

With the setup complete, we ran the hive command; a successful installation produces output similar to that shown in step 7. Running the hive command with no arguments starts an interactive shell; the hive> prompt is analogous to the sql> or mysql> prompts familiar from relational database interactive tools.

We then exited the interactive shell by typing quit;. Note the trailing semicolon. HiveQL is, as mentioned, very similar to SQL and follows the convention that every command must be terminated by a semicolon. Pressing Enter without a semicolon continues the command on the next line.
