Installing Hive from Apache

We will use Hive version 2.3.3 as the example for this installation. The prerequisites are as follows:

  • JDK 1.8
  • Hadoop 2.x.y
  • Ubuntu 16.04/CentOS 7

Since we focus on Hive in this book, the installation steps for Java and Hadoop are not provided here. For steps on installing them, please refer to https://www.java.com/en/download/help/download_options.xml and http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.

The following steps describe how to install Apache Hive in the command-line environment:

  1. Download Hive from the Apache archive and unpack it:
      $ cd /opt
      $ wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
      $ tar -zxvf apache-hive-2.3.3-bin.tar.gz
      $ ln -sfn /opt/apache-hive-2.3.3-bin /opt/hive
  2. Add the necessary system path variables to the ~/.profile or ~/.bashrc file:
      export HADOOP_HOME=/opt/hadoop
      export HADOOP_CONF_DIR=/opt/hadoop/conf
      export HIVE_HOME=/opt/hive
      export HIVE_CONF_DIR=/opt/hive/conf
      export PATH=$PATH:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  3. Enable the settings immediately (source ~/.bashrc instead if that is the file you edited):
      $ source ~/.profile
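
Before moving on, it can help to confirm that the variables took effect. The following is a minimal sketch, assuming a bash-compatible shell; check_hive_env is a hypothetical helper name introduced here for illustration and is not part of Hive or Hadoop.

```shell
# Sketch: report any Hive/Hadoop variable that is unset or points at a
# directory that does not exist. check_hive_env is a hypothetical helper.
check_hive_env() {
  local status=0 name value
  for name in HADOOP_HOME HADOOP_CONF_DIR HIVE_HOME HIVE_CONF_DIR; do
    # Read the variable whose name is stored in $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value"
      status=1
    fi
  done
  # The hive launcher should be reachable through PATH.
  if ! command -v hive >/dev/null 2>&1; then
    echo "hive is not on PATH"
    status=1
  fi
  return "$status"
}
```

Running check_hive_env after sourcing the profile prints nothing and returns 0 when everything is in place; otherwise it prints one line per problem and returns 1.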
  4. Create the configuration files (Hive 2.x ships Log4j 2 templates, so the logging templates carry the log4j2 name):
      $ cd /opt/hive/conf
      $ cp hive-default.xml.template hive-site.xml
      $ cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
      $ cp hive-log4j2.properties.template hive-log4j2.properties
  5. Modify $HIVE_HOME/conf/hive-site.xml, which has some important parameters to set:
      • hive.metastore.warehouse.dir: This is the path to the Hive warehouse location. By default, it is /user/hive/warehouse.
      • hive.exec.scratchdir: This is the temporary data file location. By default, it is /tmp/hive in Hive 0.14.0 and later (earlier releases used /tmp/hive-${user.name}).
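
These locations live in HDFS, so they need to exist and be group-writable before Hive creates tables. The following is a minimal sketch, assuming HDFS is already running and the hdfs command is on PATH; init_hive_dirs is a hypothetical helper name, and using /tmp as the scratch parent follows the Hive Getting Started guide rather than this text.

```shell
# Sketch: create the warehouse and scratch parent directories in HDFS and
# make them group-writable. init_hive_dirs is a hypothetical helper.
init_hive_dirs() {
  local warehouse=${1:-/user/hive/warehouse}  # default from hive.metastore.warehouse.dir
  local scratch=${2:-/tmp}                    # assumed parent of hive.exec.scratchdir
  hdfs dfs -mkdir -p "$warehouse" "$scratch" &&
    hdfs dfs -chmod g+w "$warehouse" "$scratch"
}
```

Calling init_hive_dirs with no arguments uses the defaults; pass two paths if hive-site.xml overrides them.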

By default, Hive uses the Derby (http://db.apache.org/derby/) database as the metadata store. It can also use other relational databases, such as Oracle, PostgreSQL, or MySQL, as the metastore. To configure the metastore on other databases, the following parameters should be configured in hive-site.xml:

  • javax.jdo.option.ConnectionURL: This is the JDBC URL of the database
  • javax.jdo.option.ConnectionDriverName: This is the JDBC driver class name
  • javax.jdo.option.ConnectionUserName: This is the username used to access the database
  • javax.jdo.option.ConnectionPassword: This is the password used to access the database

The following is a sample setting using MySQL as the metastore database:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against the metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
  <description>Password to use against the metastore database</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>Setting this URI makes Hive connect to a remote metastore service instead of running the metastore in local mode</description>
</property>
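
Because hive-site.xml is copied from a large template, it is easy to edit the wrong occurrence of a property. The following is a rough sketch for reading back a single value from the shell; get_hive_property is a hypothetical helper, and the text matching is an assumption that only holds for simple name/value pairs like the ones above (it is not a real XML parser, and it returns the last occurrence of the name).

```shell
# Sketch: print the value of one property from a hive-site.xml style file.
# get_hive_property is a hypothetical helper; it flattens the file to one
# line and matches <name>...</name><value>...</value> textually.
get_hive_property() {
  # $1: path to the XML file, $2: property name
  { tr '\n' ' ' < "$1"; echo; } |
    sed -n "s#.*<name>$2</name>[[:space:]]*<value>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</value>.*#\1#p"
}
```

For example, get_hive_property "$HIVE_CONF_DIR/hive-site.xml" hive.metastore.uris should print the Thrift URI configured above; no output means the property was not found in that form.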
  6. Make sure that the MySQL JDBC driver is available in $HIVE_HOME/lib:
      $ ln -sfn /usr/share/java/mysql-connector-java.jar /opt/hive/lib/mysql-connector-java.jar
The difference between the default Derby metastore and a configured relational database is that the relational database offers a shared service, so all Hive users see the same set of metadata. The default setting, by contrast, creates the metastore in the current user's working directory, so it is visible only to that user. In a real production environment, an external relational database is almost always configured as the Hive metastore.
  7. Create the Hive metastore database in MySQL with the proper permissions, and initialize the schema with schematool:
      $ mysql -u root --password="mypassword" -f \
          -e "DROP DATABASE IF EXISTS metastore; CREATE DATABASE IF NOT EXISTS metastore;"
      $ mysql -u root --password="mypassword" \
          -e "GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost' IDENTIFIED BY 'mypassword'; FLUSH PRIVILEGES;"
      $ schematool -dbType mysql -initSchema
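
After initialization, schematool can also report what it finds in the metastore database through its -info option. The following is a minimal sketch, assuming schematool is on PATH and exits with a non-zero status when it cannot read the schema; verify_metastore_schema is a hypothetical helper name.

```shell
# Sketch: ask schematool for the schema details and turn its exit status
# into a short message. verify_metastore_schema is a hypothetical helper.
verify_metastore_schema() {
  local db_type=${1:-mysql}
  if schematool -dbType "$db_type" -info; then
    echo "metastore schema looks initialized"
  else
    echo "schematool could not read the schema; check hive-site.xml and the JDBC driver" >&2
    return 1
  fi
}
```

A failure here usually points back to the connection properties in hive-site.xml or to a missing driver JAR in $HIVE_HOME/lib.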
  8. Since Hive runs on Hadoop, first start the HDFS and YARN services, then the metastore and HiveServer2 services:
      $ start-dfs.sh
      $ start-yarn.sh
      $ hive --service metastore 1>> /tmp/meta.log 2>> /tmp/meta.log &
      $ hive --service hiveserver2 1>> /tmp/hs2.log 2>> /tmp/hs2.log &
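
Both services start in the background and can take a while to begin listening, so a client that connects immediately may be refused. The following is a minimal sketch that polls a TCP port until it accepts connections; it assumes bash (it relies on bash's /dev/tcp redirection), and wait_for_port is a hypothetical helper name. Port 9083 comes from the hive.metastore.uris sample, and 10000 is the HiveServer2 port used in the beeline URL.

```shell
# Sketch (bash only): poll host:port once per second until a TCP connection
# succeeds or the timeout expires. wait_for_port is a hypothetical helper.
wait_for_port() {
  # $1: host, $2: port, $3: timeout in seconds (default 60)
  local host=$1 port=$2 timeout=${3:-60} waited=0
  # The subshell opens and immediately closes the connection.
  while ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$host:$port is accepting connections"
}
```

For example, wait_for_port localhost 9083 && wait_for_port localhost 10000 blocks until both the metastore and HiveServer2 are reachable, or returns 1 after the timeout.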
  9. Connect to Hive with either the hive or beeline command to verify that the installation is successful:
      $ hive
      $ beeline -u "jdbc:hive2://localhost:10000"
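
For a non-interactive check, beeline can run a single statement with its -e option and then exit. The following is a minimal sketch, assuming HiveServer2 is listening at the URL used above; hive_smoke_test is a hypothetical helper name.

```shell
# Sketch: run one statement through beeline and exit.
# hive_smoke_test is a hypothetical helper.
hive_smoke_test() {
  local url=${1:-jdbc:hive2://localhost:10000}
  beeline -u "$url" -e "SHOW DATABASES;"
}
```

On a fresh installation the result typically includes at least the default database; a connection error here suggests that HiveServer2 or the metastore service is not running yet.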