Now that we have created a table, let us load the UFO data into it.
$ hadoop fs -put ufo.tsv /tmp/ufo.tsv
$ hadoop fs -ls /tmp
You will receive the following response:
Found 2 items drwxrwxr-x - hadoop supergroup 0 … 14:52 /tmp/hive-hadoop -rw-r--r-- 3 hadoop supergroup 75342464 2012-03-03 16:01 /tmp/ufo.tsv
$ hive
ufodata
table:hive> LOAD DATA INPATH '/tmp/ufo.tsv' OVERWRITE INTO TABLE ufodata;
You will receive the following response:
Loading data to table default.ufodata Deleted hdfs://head:9000/user/hive/warehouse/ufodata OK Time taken: 5.494 seconds
hive> quit;
$ hadoop fs -ls /tmp
You will receive the following response:
Found 1 items drwxrwxr-x - hadoop supergroup 0 … 16:10 /tmp/hive-hadoop
We first copied onto HDFS the tab-separated file of UFO sightings used previously in Chapter 4, Developing MapReduce Programs. After validating the file's presence on HDFS, we started the Hive interactive shell and used the LOAD DATA
command to load the file into the ufodata
table.
Because we are using a file already on HDFS, the path was specified by INPATH
alone. We could have loaded directly from a file on the local filesystem (obviating the need for the prior explicit HDFS copy) by using LOCAL INPATH
.
We specified the OVERWRITE
statement which will delete any existing data in the table before loading the new data. This obviously should be used with care, as can be seen from the output of the command, the directory holding the table data is removed by use of OVERWRITE
.
Note the command took only a little over five seconds to execute, significantly longer than it would have taken to copy the UFO data file onto HDFS.
After exiting the Hive shell, we look again at the directory into which we copied the data file and find it is no longer there. If a LOAD
statement is given a path to data on HDFS, it will not simply copy this into /user/hive/datawarehouse
, but will move it there instead. If you want to analyze data on HDFS that is used by other applications, then either create a copy or use the EXTERNAL
mechanism that will be described later.
18.227.134.154