HBase is a very popular nonrelational database on Hadoop that stores data in a column-oriented model. Like Hive, HBase uses HDFS as its data storage layer, and it integrates with MapReduce for batch processing. The key difference between Hive and HBase is that HBase is a complete nonrelational database running on Hadoop, while Hive is a data warehouse layer that uses SQL-like statements to process data. HBase organizes data into tables, rows, and columns grouped into column families, but it is accessed through its own shell and client API rather than through SQL statements, so a mapping layer is needed to query HBase data with SQL.
Impala provides great flexibility to query data in HBase tables. Regular Impala tables process datafiles stored on HDFS, which is great for bulk loads and full-table-scan queries; HBase, in contrast, is efficient at individual row or range lookups. Impala treats HBase as a key-value store in which the row key is mapped to one column in the Impala table and value fields are mapped to other columns.
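The key-value view described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the in-memory dictionary stands in for an HBase table, and the names hbase_rows, column_mapping, and to_table_row are hypothetical and not part of Impala, Hive, or HBase.

```python
# Hypothetical in-memory stand-in for an HBase table:
# row key -> {"family:qualifier": value}
hbase_rows = {
    "10": {"strings:UserName": "alice", "ints:UserAge": "31"},
    "20": {"strings:UserName": "bob", "ints:UserAge": "45"},
}

# Illustrative mapping of table columns to HBase sources.
# ":key" marks the row key; every other entry names one HBase cell.
column_mapping = [
    ("UserId", ":key"),
    ("UserName", "strings:UserName"),
    ("UserAge", "ints:UserAge"),
]


def to_table_row(row_key, cells, mapping):
    """Flatten one HBase row into a dict keyed by table column name."""
    row = {}
    for column, source in mapping:
        # A cell that is absent from the row becomes None (like a NULL).
        row[column] = row_key if source == ":key" else cells.get(source)
    return row


# One flat row per HBase row key, in row-key order.
table = [to_table_row(k, v, column_mapping) for k, v in sorted(hbase_rows.items())]
```

Each row key becomes the value of one column, and each mapped cell becomes the value of another, which is the shape of table that SQL queries then operate on.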
HBase internals are out of the scope of this book. If you are working on HBase tables with Impala, I would suggest visiting the Apache HBase website for the latest documentation: http://hbase.apache.org/.
Here are the steps to work with HBase and Impala together:
1. Create the table in HBase first, using the HBase shell.
2. In the Hive shell, use the CREATE EXTERNAL TABLE statement and specific keywords to map Hive tables with HBase tables. We are using the Hive shell only because certain keywords used in SQL statements are not supported in Impala.
3. For the HBase row key, either use the #string keyword or map it to the STRING column.
4. Make sure the user has the required privileges on the HBase table; the GRANT command in HBase shell can do this.

While querying HBase tables, Impala uses the HBase client API to query data stored in HBase. You can create external tables in Hive with or without the string key. Here is an example of creating a table first in HBase and then in Hive for mapping, and finally, querying it in Impala:
create 'hbasetable', 'ints', 'strings'
enable 'hbasetable'
-- Row key is set as string
CREATE EXTERNAL TABLE hivetableforhbase_useragg (
  UserId string,
  UserName string,
  UserAge int,
  UserDob timestamp)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,strings:UserName,ints:UserAge,strings:UserDob"
)
TBLPROPERTIES("hbase.table.name" = "hbasetable");
-- Row key is not set as string
CREATE EXTERNAL TABLE hivetableforhbase (
  UserId int,
  UserName string,
  UserAge int,
  UserDob timestamp)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,strings:UserName,ints:UserAge,strings:UserDob"
)
TBLPROPERTIES("hbase.table.name" = "hbasetable");
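A mapping string of this kind is easy to get wrong, so it can help to sanity-check it before running the DDL. The following Python sketch is a rough helper, not Hive's own validation logic; the function name check_mapping is hypothetical. It assumes three commonly stated rules: one mapping entry per table column, exactly one :key entry, and no whitespace around entries.

```python
def check_mapping(columns, mapping):
    """Return a list of problems found in an hbase.columns.mapping string."""
    entries = mapping.split(",")
    problems = []

    # Assumed rule: one mapping entry for each table column.
    if len(entries) != len(columns):
        problems.append(
            f"{len(columns)} columns but {len(entries)} mapping entries"
        )

    # Assumed rule: exactly one entry maps the row key.
    # A type suffix such as "#string" after ":key" is ignored here.
    key_entries = [e for e in entries if e.strip().split("#")[0] == ":key"]
    if len(key_entries) != 1:
        problems.append("expected exactly one :key entry")

    # Assumed rule: stray whitespace would become part of the column name.
    for entry in entries:
        if entry != entry.strip():
            problems.append(f"whitespace around entry {entry!r}")

    return problems


columns = ["UserId", "UserName", "UserAge", "UserDob"]
good = check_mapping(columns, ":key,strings:UserName,ints:UserAge,strings:UserDob")
bad = check_mapping(columns, ":key,strings:UserID, strings:UserName,ints:UserAge,strings:UserDob")
```

Here good is an empty list, while bad reports both the extra entry and the entry with a leading space.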
-- When row key is mapped as string column, range predicates are applied in the scan
SELECT * FROM hivetableforhbase_useragg WHERE UserId = '10';
-- When row key is not mapped as string, the predicate is not transformed into a scan parameter
SELECT * FROM hivetableforhbase WHERE UserId = 10;
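Why the string mapping matters can be sketched in a few lines of Python. This is a simplified model, not Impala's actual planner: it assumes only that HBase keeps row keys sorted in byte (lexicographic) order, so a comparison on a string key can be turned into start and stop positions, while a numeric comparison does not line up with that order. The function names range_lookup and numeric_filter are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Row keys stored as strings, kept in lexicographic order
# (a stand-in for how HBase orders rows by key bytes).
row_keys = sorted(["1", "10", "100", "2", "20", "9"])


def range_lookup(keys, start, stop):
    """Return keys with start <= key <= stop using binary search."""
    return keys[bisect_left(keys, start):bisect_right(keys, stop)]


def numeric_filter(keys, low, high):
    """Numeric comparison must inspect every key, like a full scan."""
    return [k for k in keys if low <= int(k) <= high]


exact = range_lookup(row_keys, "10", "10")        # single-row lookup
string_range = range_lookup(row_keys, "1", "2")   # lexicographic range
numeric_range = numeric_filter(row_keys, 1, 2)    # numeric range
```

The string range from "1" to "2" contains "10" and "100" as well, whereas the numeric range from 1 to 2 does not. Because the stored order follows the string form of the key, only predicates on a string-mapped key translate cleanly into scan bounds; other predicates have to be evaluated row by row.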