Time for action – creating a table from an existing file

So far we have loaded data into Hive directly from files over which Hive effectively takes control. It is also possible, however, to create tables that model data held in files external to Hive. This can be useful when we want the ability to perform Hive processing over data written and managed by external applications or otherwise required to be held in directories outside the Hive warehouse directory. Such files are not moved into the Hive warehouse directory or deleted when the table is dropped.

  1. Save the following to a file called states.hql:
    CREATE EXTERNAL TABLE states(abbreviation string, full_name string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LOCATION '/tmp/states';
  2. Copy the data file onto HDFS and confirm its presence afterwards:
    $ hadoop fs -put states.txt /tmp/states/states.txt
    $ hadoop fs -ls /tmp/states
    

    You will receive the following response:

    Found 1 items
    -rw-r--r--   3 hadoop supergroup        654 2012-03-03 16:54 /tmp/states/states.txt
    
  3. Execute the HiveQL script:
    $ hive -f states.hql
    

    You will receive the following response:

    Logging initialized using configuration in jar:file:/opt/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
    Hive history file=/tmp/hadoop/hive_job_log_hadoop_201203031655_1132553792.txt
    OK
    Time taken: 3.954 seconds
    OK
    Time taken: 0.594 seconds
    
  4. Check the source data file:
    $ hadoop fs -ls /tmp/states
    

    You will receive the following response:

    Found 1 items
    -rw-r--r--   3 hadoop supergroup        654 … /tmp/states/states.txt
    
  5. Execute a sample query against the table:
    $ hive -e "select full_name from states where abbreviation like 'CA'"
    

    You will receive the following response:

    Logging initialized using configuration in jar:file:/opt/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
    Hive history file=/tmp/hadoop/hive_job_log_hadoop_201203031655_410945775.txt
    Total MapReduce jobs = 1
    ...
    OK
    California
    Time taken: 15.75 seconds
    
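  6. Optionally, ask Hive to describe the table; a quick check (the exact output is not reproduced here, but the table metadata should include a tableType attribute):
    $ hive -e "DESCRIBE EXTENDED states"
    

    The detailed metadata at the end of the output should report tableType:EXTERNAL_TABLE rather than MANAGED_TABLE, confirming that Hive does not consider itself the owner of the underlying data.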

What just happened?

The HiveQL statement to create an external table differs only slightly from the forms of CREATE TABLE we used previously. The EXTERNAL keyword specifies that the table's data exists in resources Hive does not control, and the LOCATION clause specifies where the source file or directory is to be found.
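
To see this non-destructive behavior for ourselves, we could drop the table and list the directory again; a minimal sketch, assuming the table and file created in the preceding steps (re-running states.hql afterwards restores the table definition):

    $ hive -e "DROP TABLE states"
    $ hadoop fs -ls /tmp/states

For a managed table, dropping it would also delete its data from the warehouse directory; here the listing should still show states.txt untouched.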

After creating the HiveQL script, we copied the source file onto HDFS. For this table, we used the data file from Chapter 4, Developing MapReduce Programs, which maps U.S. states to their common two-letter abbreviations.

After confirming the file was in the expected location on HDFS, we executed the query to create the table and checked the source file again. Unlike previous table creations that moved the source file into the /user/hive/warehouse directory, the states.txt file is still in the HDFS location into which it was copied.

Finally, we executed a query against the table, and the expected result confirmed that it had been populated with the source data. This highlights an additional difference in this form of CREATE TABLE: for our previous non-external tables, the creation statement did not ingest any data; a subsequent LOAD DATA or (as we'll see later) INSERT statement performs the actual population. With table definitions that include a LOCATION clause, we can create the table and make existing data available through it in a single statement.
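
For comparison, the equivalent managed (non-external) flow needs two statements; a hedged sketch, where the table name managed_states is hypothetical:

    CREATE TABLE managed_states(abbreviation string, full_name string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';

    LOAD DATA INPATH '/tmp/states/states.txt' INTO TABLE managed_states;

Note that LOAD DATA INPATH moves the HDFS file into the warehouse directory, which is precisely the relocation the EXTERNAL form avoids.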

We now have two tables in Hive: the larger table with UFO sighting data and a smaller one mapping U.S. state abbreviations to their full names. Wouldn't it be useful to combine them, using the second table to enrich the location column in the first?
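
We will build exactly this sort of query in due course; as a taste, here is a sketch of roughly how it could look, assuming (purely for illustration) that the UFO table is named ufodata, that it has a location column, and that each location string ends in a two-letter state abbreviation:

    -- Hypothetical join: count sightings per full state name.
    -- substr(..., length(...) - 1, 2) extracts the final two characters.
    SELECT s.full_name, COUNT(*) AS sightings
    FROM ufodata u JOIN states s
    ON (s.abbreviation = upper(substr(u.location, length(u.location) - 1, 2)))
    GROUP BY s.full_name;

If the assumptions hold, this would report sightings per full state name instead of per raw abbreviation.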
