Installing Hive from Apache

This section uses Hive version 2.3.3 as an example. The installation has the following prerequisites:

JDK 1.8
Hadoop 2.x.y
Ubuntu 16.04/CentOS 7

Since we focus on Hive in this book, the installation steps for Java and Hadoop are not provided here. For steps on installing them, please refer to https://www.java.com/en/download/help/download_options.xml and http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.

The following steps describe how to install Apache Hive in the command-line environment:

Download Hive from Apache Hive and unpack it:

      $cd /opt
      $wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
      $tar -zxvf apache-hive-2.3.3-bin.tar.gz
      $ln -sfn /opt/apache-hive-2.3.3-bin /opt/hive
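
The version number appears in the download URL, the archive name, and the unpacked directory, so it can help to derive these from a single value. The following is a minimal sketch, not part of Hive itself: the function names hive_archive_name and hive_download_url are hypothetical, and the URL layout is assumed to match the one used for 2.3.3 above.

```shell
# Hypothetical helpers: derive the archive name and download URL from a version.

# Print the binary tarball name for a given Hive version, for example 2.3.3.
hive_archive_name() {
  printf 'apache-hive-%s-bin.tar.gz\n' "$1"
}

# Print the archive.apache.org URL for that tarball, assuming the same
# directory layout as the 2.3.3 release used in this section.
hive_download_url() {
  printf 'https://archive.apache.org/dist/hive/hive-%s/%s\n' \
    "$1" "$(hive_archive_name "$1")"
}

# Show what would be downloaded for the version used in this section.
hive_download_url 2.3.3
```

With these helpers, the download step could be written as wget "$(hive_download_url 2.3.3)".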

Add the necessary system path variables in the ~/.profile or ~/.bashrc file:

      export HADOOP_HOME=/opt/hadoop
      export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
      export HIVE_HOME=/opt/hive
      export HIVE_CONF_DIR=/opt/hive/conf
      export PATH=$PATH:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Enable the settings immediately:

      $source ~/.profile
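
Before continuing, it may be worth confirming that the variables are actually visible in the current shell. The sketch below is one way to do that; missing_vars is a hypothetical helper name, and the check only tests that each variable is non-empty, not that the directory it names exists.

```shell
# Hypothetical helper: print the name of every listed environment variable
# that is unset or empty. Returns non-zero if at least one is missing.
missing_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return $status
}

# Check the variables exported in the profile.
if missing_vars HADOOP_HOME HADOOP_CONF_DIR HIVE_HOME HIVE_CONF_DIR; then
  echo "all variables set"
fi
```

If any names are printed, the profile was probably not sourced in this shell, or the export lines were added to a different file.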

Create the configuration files:

      $cd /opt/hive/conf
      $cp hive-default.xml.template hive-site.xml
      $cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
      $cp hive-log4j2.properties.template hive-log4j2.properties

Modify $HIVE_HOME/conf/hive-site.xml, which has some important parameters to set:

hive.metastore.warehouse.dir: This is the path to the Hive warehouse location. By default, it is at /user/hive/warehouse.
hive.exec.scratchdir: This is the temporary data file location. By default, it is at /tmp/hive.
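
One alternative to editing the full copy of the template is a small hive-site.xml that contains only the properties being overridden; Hive falls back to its built-in defaults for anything not listed. The following sketch writes such a file. The helper name write_minimal_hive_site is hypothetical, the values are examples rather than recommendations, and the demonstration writes to a scratch directory instead of $HIVE_HOME/conf.

```shell
# Hypothetical helper: write a hive-site.xml that sets only the warehouse
# and scratch directory properties. The values are illustrative.
write_minimal_hive_site() {
  target_dir=$1
  cat > "$target_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
</configuration>
EOF
}

# Demonstrate against a scratch directory and list the property names written.
demo_dir=$(mktemp -d)
write_minimal_hive_site "$demo_dir"
grep '<name>' "$demo_dir/hive-site.xml"
```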

By default, Hive uses the Derby (http://db.apache.org/derby/) database as the metadata store. It can also use other relational databases, such as Oracle, PostgreSQL, or MySQL, as the metastore. To configure the metastore on other databases, the following parameters should be configured in hive-site.xml:

javax.jdo.option.ConnectionURL: This is the JDBC URL of the database
javax.jdo.option.ConnectionDriverName: This is the JDBC driver class name
javax.jdo.option.ConnectionUserName: This is the username used to access the database
javax.jdo.option.ConnectionPassword: This is the password used to access the database

The following is a sample setting using MySQL as the metastore database:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>Specifying this makes Hive connect to a remote metastore instead of using the local mode</description>
</property>

Make sure that the MySQL JDBC driver is available at $HIVE_HOME/lib:

      $ln -sfn /usr/share/java/mysql-connector-java.jar /opt/hive/lib/mysql-connector-java.jar
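
Note that ln -s succeeds even when the link target does not exist, so a dangling link can go unnoticed until Hive fails to load the driver. The sketch below checks that a connector jar in the lib directory resolves to a real file. The helper name has_mysql_driver is hypothetical, and the jar name pattern is an assumption based on the file name used above.

```shell
# Hypothetical helper: succeed if the given directory contains a MySQL
# connector jar that resolves to an existing file. The -e test follows
# symbolic links, so a dangling link counts as missing.
has_mysql_driver() {
  for jar in "$1"/mysql-connector-java*.jar; do
    if [ -e "$jar" ]; then
      return 0
    fi
  done
  return 1
}

# Check the lib directory, defaulting to the location used in this section.
lib_dir="${HIVE_HOME:-/opt/hive}/lib"
if has_mysql_driver "$lib_dir"; then
  echo "MySQL JDBC driver found in $lib_dir"
else
  echo "MySQL JDBC driver missing or link is broken in $lib_dir"
fi
```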

The difference between the default Derby metastore and a configured relational database is that the relational database offers a shared service, so all Hive users see the same set of metadata. The default embedded Derby metastore, by contrast, is created locally for the user who starts Hive, so it is visible only to that user. In a real production environment, an external relational database is almost always configured as the Hive metastore.

Create the Hive metastore database in MySQL with the proper permissions, and initialize the schema with schematool:

      $mysql -u root --password="mypassword" -f \
      -e "DROP DATABASE IF EXISTS metastore; CREATE DATABASE IF NOT EXISTS metastore;"
      $mysql -u root --password="mypassword" \
      -e "GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost' IDENTIFIED BY 'mypassword'; FLUSH PRIVILEGES;"
      $schematool -dbType mysql -initSchema
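
After initialization, schematool's -info option can be used to confirm that Hive is able to reach the database and read the schema version. The wrapper below is a small sketch around that option; the function name verify_schema is hypothetical, and the guard for a missing schematool is only there so the script fails with a clear message when $HIVE_HOME/bin is not on the PATH.

```shell
# Hypothetical wrapper around "schematool -info", which reports the
# metastore connection settings and the schema version it finds.
verify_schema() {
  db_type=${1:-mysql}
  if ! command -v schematool >/dev/null 2>&1; then
    echo "schematool not found on PATH" >&2
    return 127
  fi
  schematool -dbType "$db_type" -info
}

if verify_schema mysql; then
  echo "metastore schema is readable"
else
  echo "schema check failed; see the messages above"
fi
```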

Since Hive runs on Hadoop, first start the HDFS and YARN services, then the metastore and HiveServer2 services:

      $start-dfs.sh
      $start-yarn.sh 
      $hive --service metastore 1>> /tmp/meta.log 2>> /tmp/meta.log & 
      $hive --service hiveserver2 1>> /tmp/hs2.log 2>> /tmp/hs2.log &
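
Because both services are started in the background, they may need some time before they accept connections. A generic retry loop is one way to wait for them. The sketch below is an illustration only: retry is a hypothetical helper, and the probe command passed to it is up to the reader.

```shell
# Hypothetical helper: run a command until it succeeds, giving up after a
# maximum number of attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Trivial demonstration with a command that always succeeds.
if retry 3 0 true; then
  echo "ready"
fi
```

On a machine where nc is installed and supports the -z option (an assumption), retry 30 2 nc -z localhost 9083 would wait up to about a minute for the metastore port used in the sample configuration, and the same call with port 10000 would wait for HiveServer2.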

Connect to Hive with either the hive or beeline command to verify that the installation is successful:

      $hive 
      $beeline -u "jdbc:hive2://localhost:10000"
