Now, we will launch the Impala shell and verify if the autos
database is created and both automobiles
and automakers
are created or not. Let's launch the Impala shell first as follows:
$impala-shell
Once the Impala shell is started, I have used the autos
database first and verified our steps from the preceding script.
Now, we will use some database and table commands to verify the execution of the script in the previous sections. Once we see that the database is created and data is loaded, we can push queries from the Impala shell against our example database as follows:
[Hadoop.testdomain:21000] > USE autos; [Hadoop.testdomain:21000] > SHOW tables; Query finished, fetching results ... +-------------+ | name | +-------------+ | automakers | | automobiles | +-------------+ Returned 2 row(s) in 0.11s
In the preceding code, we are first using the autos
databases and then listing all the tables within this selected database.
In the next step, we will understand each table's schema using the describe
command as shown in the following two examples. The first example is as follows:
[Hadoop.testdomain:21000] > describe automakers; Query finished, fetching results ... +------------+--------+---------+ | name | type | comment | +------------+--------+---------+ | automaker | string | | +------------+--------+---------+ Returned 2 row(s) in 0.46s
The following is the second example:
[Hadoop.testdomain:21000] > describe automobiles; Query finished, fetching results ... +--------------+--------+---------+ | name | type | comment | +--------------+--------+---------+ | make | string | | | model | string | | | autoyear | int | | | fueltype | string | | | numofdoors | int | | | design | string | | | autotype | string | | | cylinders | int | | | horsepower | int | | | city_hwy_mpg | string | | | price | float | | +--------------+--------+---------+ Returned 11 row(s) in 0.54s
In the preceding code snippets, you can see how the describe
command shows each field's name and field-type information.
Now, I will create another table named motorcycles
and load data from the /user/cloudera/motorcycles/motorcycles.txt
file located in HDFS into the table as follows:
[Hadoop.testdomain:21000] > CREATE EXTERNAL TABLE IF NOT EXIST motorcycles ( make STRING, model STRING, year INTEGER, fuelType STRING, wheels INTEGER, body STRING, style STRING, cc_rpm INTEGER, highSpeed INTEGER, automatic BOOLEAN, price FLOAT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/cloudera/motorcycles' ; Query: create EXTERNAL TABLE motorcycles ( make STRING, model STRING, year INTEGER, fuelType STRING, wheels INTEGER, body STRING, style STRING, cc_rpm INTEGER, highSpeed INTEGER, automatic BOOLEAN, price FLOAT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/cloudera/motorcycles'
In the preceding command, we are creating an external table only if it does not exist. And then we will pass the motorcycles
text content from the text file to populate the table.
Now, we can check the schema of the newly created motorcycles
table using the describe
command as follows:
[Hadoop.testdomain:21000] > describe motorcycles; Query finished, fetching results ... +-----------+---------+---------+ | name | type | comment | +-----------+---------+---------+ | make | string | | | model | string | | | year | int | | | fueltype | string | | | wheels | int | | | body | string | | | style | string | | | cc_rpm | int | | | highspeed | int | | | automatic | boolean | | | price | float | | +-----------+---------+---------+ Returned 11 row(s) in 1.22s
3.22.71.28