Commands for loading data into Impala tables

This section covers activity to load data into Impala tables in two steps. First step moves the file, which we have created previously, from the local filesystem to HDFS. In the second step, the data is transferred into Impala tables, from the source file located on HDFS.

HDFS specific commands

Now we will make sure that the preceding files are located in our Linux files system as follows:

[cloudera@localhost ~]$ ls -l *.txt
-rw-rw-r-- 1 cloudera cloudera 985 Oct 15 18:48 automobiles.txt
-rw-rw-r-- 1 cloudera cloudera 932 Oct 15 18:49 motorcycles.txt

Our next step is to move these files from the local filesystem to HDFS in separate folders. Depending on your Hadoop version, you can use either Hadoop fs or hdfs dfs; however, I have used hdfs dfs as follows:

[cloudera@localhost ~]$ hdfs dfs -mkdir /user/cloudera/automobiles
[cloudera@localhost ~]$ hdfs dfs -mkdir /user/cloudera/motorcycles
[cloudera@localhost ~]$ hdfs dfs -ls /user/cloudera/
Found 2 items
drwxr-xr-x   - cloudera cloudera          0 2013-10-15 19:16 /user/cloudera/automobiles
drwxr-xr-x   - cloudera cloudera          0 2013-10-15 19:16 /user/cloudera/motorcycles
[cloudera@localhost ~]$ hdfs dfs -moveFromLocal automobiles.txt /user/cloudera/automobiles/automobiles.txt
[cloudera@localhost ~]$ hdfs dfs -moveFromLocal motorcycles.txt /user/cloudera/motorcycles/motorcycles.txt
[cloudera@localhost ~]$ hdfs dfs -ls /user/cloudera/motorcycles/
Found 1 items
-rw-r--r--   3 cloudera cloudera        932 2013-10-15 19:19 /user/cloudera/motorcycles/motorcycles.txt
[cloudera@localhost ~]$ hdfs dfs -ls /user/cloudera/automobiles/
Found 1 items
-rw-r--r--   3 cloudera cloudera        985 2013-10-15 19:17 /user/cloudera/automobiles/automobiles.txt

Now, we will load the preceding data into two separate tables in two different steps, to learn various ways of loading data. The tables we are using here are external tables instead of internal. For automobile data, I will load them directly from a script into the automobiles table; and then I will load motorcycle data in the motorcycles table inside the Impala shell. In the script, I will add another empty table, automakers. Later, we will join a list of automakers from both tables. All of this processing will be done in a database named autos.

Loading data into the Impala table from HDFS

Here is the SQL script to create a database autos first, create the automobiles table, and then load the whole dataset from HDFS. I am also creating an empty table automakers in the autos_script.sql script as follows:

USE default;
DROP DATABASE IF EXISTS autos;
CREATE DATABASE autos;
USE autos;
DROP TABLE IF EXISTS automobiles;
CREATE EXTERNAL TABLE automobiles
(
  make STRING,
  model STRING,
  year INTEGER,
  fuelType STRING,
  numOfDoors INTEGER,
  design STRING,
  type STRING,
  cylinders INTEGER,
  horsePower INTEGER,
  city_hwy_mpg STRING,
  price FLOAT
)
ROW FORMAT DELIMITED FIELD TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/cloudera/automobiles/automobiles.txt';
CREATE EXTERNAL TABLE IF NOT EXISTS automakers
(
  autoMaker STRING
)
ROW FORMAT DELIMITED FIELD TERMINATED BY ',';

Let's understand the preceding script. In the first line, we are setting the default database and then removing the autos database if it exists. Next, we are creating a totally new autos database and then using the auto database. After that, we are dropping the automobiles table if it exists. Then, we are creating a new automobiles table using the schema as defined earlier. Then, we are passing the data source text files so the table can be populated with appropriate content. At last, we will also create another table named automaker and use the IF NOT EXISTS syntax. This syntax means that this particular table is created only if it does not exist.

Now, we are going to execute the preceding SQL script with the Impala shell using the following command syntax:

$impala-shell -i Impala-Server-Name:PORT -f SQL_Script_Name.sql

Here is the execution of the preceding SQL script:

[cloudera@localhost ~]$ impala-shell -i localhost.localdomain:21000 -f autos_setup.sql 
Connected to localhost.localdomain:21000
Server version: impalad version 1.0.1 RELEASE (build df844fb967cec8740f08dfb8b21962bc053527ef)
Query: use default
Query: drop DATABASE IF EXISTS autos
Query: create DATABASE autos
Query: use autos
Query: drop TABLE IF EXISTS automobiles
Query: create EXTERNAL TABLE automobiles ( make STRING, model STRING, autoyear INTEGER, fuelType STRING, numOfDoors INTEGER, design STRING, autoType STRING, cylinders INTEGER, horsePower INTEGER, city_hwy_mpg STRING, price FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/cloudera/automobiles'
Query: drop TABLE IF EXISTS automakers
Query: create EXTERNAL TABLE automakers (autoMaker STRING)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.169.109