Hive installation

  1. Download the latest stable release of Hive from the following location, using the command:
wget http://www-us.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
  2. Change to a user directory and extract the contents of the tarball using the following command:
tar -xzvf ${DOWNLOAD_DIR}/apache-hive-2.1.1-bin.tar.gz
  3. Configure and export the ${HIVE_HOME} environment variable to point to the extracted directory and append its bin directory to the PATH. Add the same lines to the ~/.bashrc file (a concrete example follows):
export HIVE_HOME=<Hive directory>
export PATH=$PATH:$HIVE_HOME/bin
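For example, if the tarball was extracted under /home/centos (an assumed path; substitute your own user directory), HIVE_HOME would be set as follows, after which ~/.bashrc can be reloaded:
export HIVE_HOME=/home/centos/apache-hive-2.1.1-bin
source ~/.bashrc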
  4. Install the latest SASL (Simple Authentication and Security Layer) packages for your operating system, as Hive depends on them. On CentOS, they can be installed with the following command (an alternative with explicit package names is shown after it):
sudo yum install *sasl*
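If you prefer explicit package names over the wildcard, the SASL libraries on CentOS come from the cyrus-sasl family; a typical (assumed) selection is:
sudo yum install cyrus-sasl cyrus-sasl-lib cyrus-sasl-devel cyrus-sasl-plain cyrus-sasl-gssapi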
  5. Hive provides services that need a metadata store for managing metadata information. Let us configure a PostgreSQL server as the metastore database, using the psql client (the interactive terminal for working with PostgreSQL) in a shell/Command Prompt:
    • Create a PostgreSQL user named hiveuser and a database named metastore with the following commands:
sudo -u postgres psql

The previous command initializes and starts the psql client (the shell prompt changes to postgres=#) for running queries. Now let us create the user and database for the Hive metastore with the following statements:

 postgres=# CREATE USER hiveuser WITH PASSWORD 'mypassword';
postgres=# CREATE DATABASE metastore;
    • Configure permissions on the metastore for hiveuser with the following commands (the generated file is illustrated after the listing):
postgres=# \c metastore
metastore=# \pset tuples_only on
metastore=# \o /tmp/grant-privs
metastore=# SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '"."' || tablename || '" TO hiveuser;'
metastore-# FROM pg_tables
metastore-# WHERE tableowner = CURRENT_USER and schemaname = 'public';
metastore=# \o
metastore=# \pset tuples_only off
metastore=# \i /tmp/grant-privs
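For reference, this sequence switches psql to bare output with \pset, redirects query results to /tmp/grant-privs with \o, generates one GRANT statement per table owned by the current user, and then replays that file with \i. Once metastore tables exist, the generated file would contain lines such as the following (table names are illustrative):
GRANT SELECT,INSERT,UPDATE,DELETE ON "public"."TBLS" TO hiveuser;
GRANT SELECT,INSERT,UPDATE,DELETE ON "public"."DBS" TO hiveuser;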
    • Copy ${HIVE_HOME}/conf/hive-default.xml.template to ${HIVE_HOME}/conf/hive-site.xml with the following command:
cp ${HIVE_HOME}/conf/hive-default.xml.template ${HIVE_HOME}/conf/hive-site.xml
    • Configure the following properties in ${HIVE_HOME}/conf/hive-site.xml (a sketch of the corresponding XML entries follows this list):
      • hive.exec.scratchdir: /tmp/hive
      • hive.exec.local.scratchdir: /tmp/hive/centos (here, centos is the user account under which the Hive queries will be executed)
      • hive.downloaded.resources.dir: /tmp/hive/${hive.session.id}_resources
      • javax.jdo.option.ConnectionPassword: hivepass (the password created for hiveuser in the psql client; it must match the one used in the CREATE USER statement, mypassword in the earlier example)
      • javax.jdo.option.ConnectionURL: jdbc:postgresql://<POSTGRESQL_SERVER_IP:PORT>/metastore
      • javax.jdo.option.ConnectionDriverName: org.postgresql.Driver
      • javax.jdo.option.ConnectionUserName: hiveuser
      • hive.server2.enable.doAs: false
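As a sketch, the JDBC-related entries would look as follows in hive-site.xml; localhost:5432 here assumes a local PostgreSQL server on its default port, so replace it with your own server and port:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>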
    • Copy the PostgreSQL JDBC driver JAR into the Hive lib directory, i.e. ${HIVE_HOME}/lib, in the same way as was done for the Sqoop setup. If it is not already present in your download folder from the Sqoop setup, it can be downloaded with the following command; a sketch of the copy step follows it:
    wget https://jdbc.postgresql.org/download/postgresql-9.4.1212.jre6.jar
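    Once the JAR is available locally, copying it into place would look like this (the source path assumes the file sits in the current download directory; adjust as needed):
    cp postgresql-9.4.1212.jre6.jar ${HIVE_HOME}/lib/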
    
    6. Configure ${HADOOP_HOME}/etc/hadoop/core-site.xml with the following entries, replacing centos in hadoop.proxyuser.centos.* with your own user account:
    <property>
      <name>hadoop.proxyuser.centos.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.centos.groups</name>
      <value>*</value>
    </property>
    7. Restart the DFS service using the following commands:
    stop-dfs.sh
    start-dfs.sh
    8. Configure ${HUE_HOME}/desktop/conf/hue.ini to make Hue work with the Hive service. Search for the hive_conf_dir property in hue.ini and change centos to your own user account name:
    hive_conf_dir=/home/centos/apache-hive-2.1.1-bin/conf
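    In most Hue versions this property sits in the [beeswax] section of hue.ini, which also tells Hue where HiveServer2 runs; a sketch with assumed host and port (10000 is the HiveServer2 default) would be:
    [beeswax]
      hive_server_host=localhost
      hive_server_port=10000
      hive_conf_dir=/home/centos/apache-hive-2.1.1-bin/conf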
    
    9. Restart the Hue service by gracefully stopping it (find the supervisor process and kill it; this part is sketched after the command) and then starting it with the following command:
    ${HUE_HOME}/build/env/bin/supervisor -d
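    For reference, the stopping part of this step would look something like the following before re-running the command above (the grep pattern is an assumption; match it to how the process appears on your system):
    ps -ef | grep '[s]upervisor'
    kill <PID of the Hue supervisor process>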
    
    10. Use the schematool to generate the schema with the following command:
    ${HIVE_HOME}/bin/schematool -dbType postgres -initSchema --verbose

    Go to pgAdmin and you should see that the metastore database now contains the various tables generated by the schematool.
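    The schema version can also be checked from the command line using schematool's -info option:
    ${HIVE_HOME}/bin/schematool -dbType postgres -info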

    11. Verify the install by checking that the Hive shell starts, using the given command:
    ${HIVE_HOME}/bin/hive
    

    If things go well you should see the Hive shell prompt. You can run Hive queries from this shell if needed.
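    As a quick smoke test from the shell, you could run something like the following (the table name is illustrative):
    hive> CREATE TABLE smoke_test (id INT, name STRING);
    hive> SHOW TABLES;
    hive> DROP TABLE smoke_test;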

    12. Launch the hiveserver2 service with the following command. HiveServer2 is the remote service that enables Hue to integrate with Hive and run queries from the Hue UI:
    ${HIVE_HOME}/bin/hive --service hiveserver2 --hiveconf hive.root.logger=INFO,console
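    Once HiveServer2 is up, the connection can also be verified from the command line with Beeline, which ships with Hive; the host, port (10000 is the HiveServer2 default) and user name below are assumptions to adjust for your setup:
    ${HIVE_HOME}/bin/beeline -u jdbc:hive2://localhost:10000 -n centos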
    
    13. After hiveserver2 starts successfully, open Hue and navigate to Query Editor|HIVE, which should open without any errors being reported.

    Now that we have all the required components installed and working, we will look at a few examples. The initial example covers the data loading aspects, and as we proceed through the examples we will also look at the processing aspects of the Hadoop layer.
