Connecting HBase with Hive

We can map an HBase table to Hive (see https://hive.apache.org if you are not yet familiar with Hive) and query it with Hive Query Language (HQL), which closely resembles SQL. This is convenient for developers and users who already know SQL.

For this, we need to create a table in HBase. Let's start the process:

Create a table in the HBase shell as follows:

create 'hivehbasetable', 'name'

Put some data into it:

put 'hivehbasetable', 'row1', 'name:firstname', 'shashwat'
put 'hivehbasetable', 'row1', 'name:lastname', 'shriparv'
put 'hivehbasetable', 'row1', 'name:title', 'mr'

Hive needs the following JAR files for this mapping, and it must be told where they are:

  • guava-<version>.jar
  • hive-hbase-handler-<version>.jar
  • hbase-<version>.jar
  • zookeeper-<version>.jar

Start Hive with the following command, which passes these JARs through --auxpath to prevent library-related errors. The list must be comma-separated with no spaces:

hive --auxpath /usr/lib/hive/lib/hbase.jar,/usr/lib/hive/lib/hive-hbase-handler-<version>.jar,/usr/lib/hive/lib/zookeeper.jar,/usr/lib/hive/lib/guava-<version>.jar

Then, create an external table in Hive that maps the HBase table to Hive:
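A stray space in the JAR list is easy to introduce when typing it by hand. The following is a minimal shell sketch that assembles the value for --auxpath from individual paths. The helper name join_with_commas and the variable names are illustrative assumptions, the directory is the one used in the command above, and the <version> placeholders still have to be replaced with the real file names on your system.

```shell
#!/bin/sh
# Join all arguments with commas and no surrounding whitespace.
# (Hypothetical helper; runs in a subshell so IFS is not changed globally.)
join_with_commas() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Library directory taken from the command above; adjust for your installation.
HIVE_LIB=/usr/lib/hive/lib

# The <version> placeholders are kept as-is and must be filled in.
AUX_PATH=$(join_with_commas \
    "$HIVE_LIB/hbase.jar" \
    "$HIVE_LIB/hive-hbase-handler-<version>.jar" \
    "$HIVE_LIB/zookeeper.jar" \
    "$HIVE_LIB/guava-<version>.jar")

# Print the command rather than running it, so it can be inspected first.
echo "hive --auxpath $AUX_PATH"
```

Once the printed command looks right, it can be copied to the prompt and run.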

CREATE EXTERNAL TABLE hivehbasetablemapped (key string, firstname string, lastname string, title string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:firstname,name:lastname,name:title") TBLPROPERTIES ("hbase.table.name" = "hivehbasetable");

Alternatively, you can omit EXTERNAL so that Hive creates and manages the HBase table itself. This form generally expects that the HBase table named in hbase.table.name does not exist yet:

CREATE TABLE hivehbasetablemapped (key string, firstname string, lastname string, title string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:firstname,name:lastname,name:title") TBLPROPERTIES ("hbase.table.name" = "hivehbasetable");

Note

Difference between external and internal tables in Hive

When we drop an internal (managed) table, Hive drops both the data and the metadata.

When we drop an external table, Hive drops only the metadata. Hive no longer knows about the data, but the data itself is left untouched.

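In the CREATE statements, the first entry in the mapping (:key) is the key column, which is taken as the HBase row key. Each remaining entry maps a Hive column, in order, to an HBase column written as family:qualifier. For a numeric column whose value is stored in binary form, we can append a storage-type suffix to its entry, for example age:age#b.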

Once we finish the preceding process successfully, we can execute an HQL query in Hive as follows:

hive> select * from hivehbasetablemapped;

We can also perform other operations on Hive.
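As one example of such an operation, a query can also be run non-interactively with the -e option of the hive command. The sketch below looks up a single row by its key. The variable name QUERY is an illustrative assumption, and the snippet assumes that the required JARs are already available to Hive (for instance through --auxpath, as shown earlier). It only calls hive if the command is found on the PATH.

```shell
#!/bin/sh
# Query the mapped table for the row whose HBase row key is 'row1'.
QUERY="SELECT * FROM hivehbasetablemapped WHERE key = 'row1';"

if command -v hive >/dev/null 2>&1; then
    # -e runs the quoted query string and exits.
    hive -e "$QUERY"
else
    # Fall back to showing the query when Hive is not installed here.
    echo "hive not found on PATH; query was: $QUERY"
fi
```

Filtering on the key column corresponds to a lookup on the HBase row key, which is usually cheaper than filtering on other columns.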
