Time for action – inserting the UFO data

Now that we have created a table, let us load the UFO data into it.

  1. Copy the UFO data file onto HDFS:
    $ hadoop fs -put ufo.tsv /tmp/ufo.tsv
    
  2. Confirm that the file was copied:
    $ hadoop fs -ls /tmp
    

    You will receive the following response:

    Found 2 items
    drwxrwxr-x   - hadoop supergroup          0 … 14:52 /tmp/hive-hadoop
    -rw-r--r--   3 hadoop supergroup   75342464 2012-03-03 16:01 /tmp/ufo.tsv
    
  3. Enter the Hive interactive shell:
    $ hive
    
  4. Load the data from the previously copied file into the ufodata table:
    hive> LOAD DATA INPATH '/tmp/ufo.tsv' OVERWRITE INTO TABLE ufodata;
    

    You will receive the following response:

    Loading data to table default.ufodata
    Deleted hdfs://head:9000/user/hive/warehouse/ufodata
    OK
    Time taken: 5.494 seconds
    
  5. Exit the Hive shell:
    hive> quit;
    
  6. Check the location from which we copied the data file:
    $ hadoop fs -ls /tmp
    

    You will receive the following response:

    Found 1 items
    drwxrwxr-x   - hadoop supergroup          0 … 16:10 /tmp/hive-hadoop
    

What just happened?

We first copied onto HDFS the tab-separated file of UFO sightings used previously in Chapter 4, Developing MapReduce Programs. After validating the file's presence on HDFS, we started the Hive interactive shell and used the LOAD DATA command to load the file into the ufodata table.

Because we are using a file already on HDFS, the path was specified by INPATH alone. We could have loaded directly from a file on the local filesystem (obviating the need for the prior explicit HDFS copy) by using LOCAL INPATH.
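For example, had the file still been on the local filesystem, a statement along the following lines would have loaded it directly (the local path here is illustrative):

    hive> LOAD DATA LOCAL INPATH '/home/hadoop/ufo.tsv' OVERWRITE INTO TABLE ufodata;

Note that with LOCAL, Hive copies the file into the table's directory rather than moving it, so the original on the local filesystem is left untouched.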

We specified the OVERWRITE clause, which deletes any existing data in the table before loading the new data. As can be seen from the output of the command, the directory holding the table data is removed, so OVERWRITE should obviously be used with care.
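Had we omitted OVERWRITE, the new data would instead have been added alongside any rows already in the table, as in:

    hive> LOAD DATA INPATH '/tmp/ufo.tsv' INTO TABLE ufodata;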

Note that the command took only a little over five seconds to execute, significantly faster than it would have taken to copy the UFO data file onto HDFS; as we will see, this is because the file is moved rather than copied.

Note

Though we specified an explicit file in this example, it is possible to load multiple files with a single statement by specifying a directory as the INPATH location; in such a case, all files within the directory will be loaded into the table.
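For instance, if several data files had been placed under a single HDFS directory (the directory name here is illustrative), a statement of the following form would load them all in one go:

    hive> LOAD DATA INPATH '/tmp/ufo-files' OVERWRITE INTO TABLE ufodata;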

After exiting the Hive shell, we look again at the directory into which we copied the data file and find it is no longer there. If a LOAD statement is given a path to data on HDFS, it will not simply copy the file into /user/hive/warehouse; it will move it there instead. If you want to analyze data on HDFS that is also used by other applications, then either create a copy or use the EXTERNAL mechanism that will be described later.
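As a brief preview of that mechanism, an external table is declared with a LOCATION clause so that Hive reads the data in place and never moves or deletes it; the table and column names below are purely illustrative, and the column list would need to match the actual data:

    hive> CREATE EXTERNAL TABLE ufodata_ext (sighted STRING, reported STRING)
        > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        > LOCATION '/tmp/ufo-files';

Dropping an external table removes only its metadata; the underlying files remain on HDFS for other applications.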

Validating the data

Now that we have loaded the data into our table, it is good practice to do some quick validating queries to confirm all is as expected. Sometimes our initial table definition turns out to be incorrect.
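A couple of simple queries of the following kind make a quick sanity check, confirming that the table contains roughly the expected number of rows and that the columns were parsed sensibly:

    hive> SELECT COUNT(*) FROM ufodata;
    hive> SELECT * FROM ufodata LIMIT 5;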
