Hive installation

  1. Download the latest stable release of Hive from the following location, using the command:
wget http://www-us.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
  2. Change to a user directory and extract the contents of the tarball using the following command:
tar -xzvf ${DOWNLOAD_DIR}/apache-hive-2.1.1-bin.tar.gz
  3. Configure and export the ${HIVE_HOME} environment variable to point to the extracted directory and append its bin directory to the PATH. Add the same lines to the ~/.bashrc file (a concrete example follows):
export HIVE_HOME=<Hive directory>
export PATH=$PATH:$HIVE_HOME/bin
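For example, if the tarball was extracted under /home/centos (an assumed path; substitute your own user directory), HIVE_HOME would be set as follows, after which ~/.bashrc can be reloaded:
export HIVE_HOME=/home/centos/apache-hive-2.1.1-bin
source ~/.bashrc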
  4. Install the latest SASL (Simple Authentication and Security Layer) packages for your operating system, as Hive depends on them. On CentOS, they can be installed with the following command (an alternative with explicit package names is shown after it):
sudo yum install *sasl*
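If you prefer explicit package names over the wildcard, the SASL libraries on CentOS come from the cyrus-sasl family; a typical (assumed) selection is:
sudo yum install cyrus-sasl cyrus-sasl-lib cyrus-sasl-devel cyrus-sasl-plain cyrus-sasl-gssapi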
  5. Hive provides services that need a metadata store for managing metadata information. Let us configure a PostgreSQL server as the metastore database, using the psql client (the interactive terminal for working with PostgreSQL) in a shell/Command Prompt:
    • Create a PostgreSQL user named hiveuser and a database named metastore with the following commands:
sudo -u postgres psql

The previous command initializes and starts the psql client (the shell prompt changes to postgres=#) for running queries. Now let us create the user and database for the Hive metastore with the following statements:

 postgres=# CREATE USER hiveuser WITH PASSWORD 'mypassword';
postgres=# CREATE DATABASE metastore;
    • Configure permissions on the metastore for hiveuser with the following commands (the generated file is illustrated after the listing):
postgres=# \c metastore
metastore=# \pset tuples_only on
metastore=# \o /tmp/grant-privs
metastore=# SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '"."' || tablename || '" TO hiveuser;'
metastore-# FROM pg_tables
metastore-# WHERE tableowner = CURRENT_USER and schemaname = 'public';
metastore=# \o
metastore=# \pset tuples_only off
metastore=# \i /tmp/grant-privs
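For reference, this sequence switches psql to bare output with \pset, redirects query results to /tmp/grant-privs with \o, generates one GRANT statement per table owned by the current user, and then replays that file with \i. Once metastore tables exist, the generated file would contain lines such as the following (table names are illustrative):
GRANT SELECT,INSERT,UPDATE,DELETE ON "public"."TBLS" TO hiveuser;
GRANT SELECT,INSERT,UPDATE,DELETE ON "public"."DBS" TO hiveuser;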
    • Copy ${HIVE_HOME}/conf/hive-default.xml.template to ${HIVE_HOME}/conf/hive-site.xml with the following command:
cp ${HIVE_HOME}/conf/hive-default.xml.template ${HIVE_HOME}/conf/hive-site.xml
    • Configure the following properties in ${HIVE_HOME}/conf/hive-site.xml (a sketch of the corresponding XML entries follows this list):
      • hive.exec.scratchdir: /tmp/hive
      • hive.exec.local.scratchdir: /tmp/hive/centos (here, centos is the user account under which the Hive queries will be executed)
      • hive.downloaded.resources.dir: /tmp/hive/${hive.session.id}_resources
      • javax.jdo.option.ConnectionPassword: hivepass (the password created for hiveuser in the psql client; it must match the one used in the CREATE USER statement, mypassword in the earlier example)
      • javax.jdo.option.ConnectionURL: jdbc:postgresql://<POSTGRESQL_SERVER_IP:PORT>/metastore
      • javax.jdo.option.ConnectionDriverName: org.postgresql.Driver
      • javax.jdo.option.ConnectionUserName: hiveuser
      • hive.server2.enable.doAs: false
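As a sketch, the JDBC-related entries would look as follows in hive-site.xml; localhost:5432 here assumes a local PostgreSQL server on its default port, so replace it with your own server and port:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>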
    • Copy the PostgreSQL JDBC driver JAR into the Hive lib directory, i.e. ${HIVE_HOME}/lib, in the same way as was done for the Sqoop setup. If it is not already present in your download folder from the Sqoop setup, it can be downloaded with the following command; a sketch of the copy step follows it:
    wget https://jdbc.postgresql.org/download/postgresql-9.4.1212.jre6.jar
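    Once the JAR is available locally, copying it into place would look like this (the source path assumes the file sits in the current download directory; adjust as needed):
    cp postgresql-9.4.1212.jre6.jar ${HIVE_HOME}/lib/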
    
    6. Configure ${HADOOP_HOME}/etc/hadoop/core-site.xml with the following entries, replacing centos in hadoop.proxyuser.centos.* with your own user account:
    <property>
      <name>hadoop.proxyuser.centos.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.centos.groups</name>
      <value>*</value>
    </property>
    7. Restart the DFS service using the following commands:
    stop-dfs.sh
    start-dfs.sh
    8. Configure ${HUE_HOME}/desktop/conf/hue.ini to make Hue work with the Hive service. Search for the hive_conf_dir property in hue.ini and change centos to your own user account name:
    hive_conf_dir=/home/centos/apache-hive-2.1.1-bin/conf
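    In most Hue versions this property sits in the [beeswax] section of hue.ini, which also tells Hue where HiveServer2 runs; a sketch with assumed host and port (10000 is the HiveServer2 default) would be:
    [beeswax]
      hive_server_host=localhost
      hive_server_port=10000
      hive_conf_dir=/home/centos/apache-hive-2.1.1-bin/conf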
    
    9. Restart the Hue service by gracefully stopping it (find the supervisor process and kill it; this part is sketched after the command) and then starting it with the following command:
    ${HUE_HOME}/build/env/bin/supervisor -d
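    For reference, the stopping part of this step would look something like the following before re-running the command above (the grep pattern is an assumption; match it to how the process appears on your system):
    ps -ef | grep '[s]upervisor'
    kill <PID of the Hue supervisor process>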
    
    10. Use the schematool to generate the schema with the following command:
    ${HIVE_HOME}/bin/schematool -dbType postgres -initSchema --verbose

    Go to pgAdmin and you should see that the metastore database now contains the various tables generated by the schematool.
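    The schema version can also be checked from the command line using schematool's -info option:
    ${HIVE_HOME}/bin/schematool -dbType postgres -info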

    11. Verify the install by checking that the Hive shell starts, using the given command:
    ${HIVE_HOME}/bin/hive
    

    If things go well you should see the Hive shell prompt. You can run Hive queries from this shell if needed.
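    As a quick smoke test from the shell, you could run something like the following (the table name is illustrative):
    hive> CREATE TABLE smoke_test (id INT, name STRING);
    hive> SHOW TABLES;
    hive> DROP TABLE smoke_test;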

    12. Launch the hiveserver2 service with the following command. HiveServer2 is the remote service that enables Hue to integrate with Hive and run queries from the Hue UI:
    ${HIVE_HOME}/bin/hive --service hiveserver2 --hiveconf hive.root.logger=INFO,console
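    Once HiveServer2 is up, the connection can also be verified from the command line with Beeline, which ships with Hive; the host, port (10000 is the HiveServer2 default) and user name below are assumptions to adjust for your setup:
    ${HIVE_HOME}/bin/beeline -u jdbc:hive2://localhost:10000 -n centos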
    
    13. After hiveserver2 starts successfully, open Hue and navigate to Query Editor|HIVE, which should open without any errors being reported.

    Now that we have all the required components installed and working, we will look at a few examples. The initial example covers the data loading aspects, and as we proceed through the examples we will also look at the processing aspects of the Hadoop layer.
