The following are major DDLs for the internal and external table creation:
- Show the data file content of employee.txt:
$ vi /home/hadoop/employee.txt
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer^DLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
- Create an internal table and load the data:
> CREATE TABLE IF NOT EXISTS employee_internal (
> name STRING COMMENT 'this is optinal column comments',
> work_place ARRAY<STRING>,-- table column names are NOT case
sensitive
> gender_age STRUCT<gender:STRING,age:INT>,
> skills_score MAP<STRING,INT>, -- columns names are lower case
> depart_title MAP<STRING,ARRAY<STRING>>-- No "," for the last
column
> )
> COMMENT 'This is an internal table'-- This is optional table
comments
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|' -- Symbol to seperate columns
> COLLECTION ITEMS TERMINATED BY ','-- Seperate collection elements
> MAP KEYS TERMINATED BY ':' -- Symbol to seperate keys and values
> STORED as TEXTFILE; -- Table file format
No rows affected (0.149 seconds)
> LOAD DATA INPATH '/tmp/hivedemo/data/employee.txt'
> OVERWRITE INTO TABLE employee_internal;
- Create an external table and load the data:
> CREATE EXTERNAL TABLE employee_external ( -- Use EXTERNAL keywords
> name string,
> work_place ARRAY<string>,
> gender_age STRUCT<gender:string,age:int>,
> skills_score MAP<string,int>,
> depart_title MAP<STRING,ARRAY<STRING>>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED as TEXTFILE
> LOCATION '/user/dayongd/employee'; -- Specify data folder location
No rows affected (1.332 seconds)
> LOAD DATA INPATH '/tmp/hivedemo/data/employee.txt'
> OVERWRITE INTO TABLE employee_external;
Hive also supports creating temporary tables. A temporary table is only visible to the current user session. It's automatically deleted at the end of the session. The data of the temporary table is stored in the user's scratch directory, such as /tmp/hive-<username>. Therefore, make sure the folder is properly configured or secured when you have sensitive data in temporary tables. Whenever a temporary table has the same name as a permanent table, the temporary table will be chosen rather than the permanent table. A temporary table does not support partitions and indexes. The following are three ways to create temporary tables:
> CREATE TEMPORARY TABLE IF NOT EXISTS tmp_emp1 (
> name string,
> work_place ARRAY<string>,
> gender_age STRUCT<gender:string,age:int>,
> skills_score MAP<string,int>,
> depart_title MAP<STRING,ARRAY<STRING>>
> );
No rows affected (0.122 seconds)
> CREATE TEMPORARY TABLE tmp_emp2 as SELECT * FROM tmp_emp1;
> CREATE TEMPORARY TABLE tmp_emp3 like tmp_emp1;
Tables can also be created and populated by the results of a query in one statement, called Create-Table-As-Select (CTAS). The table created by CTAS is not visible by other users until all the query results are populated. CTAS has the following restrictions:
- The table created cannot be a partitioned table
- The table created cannot be an external table
- The table created cannot be a list-bucketing table
A CTAS statement always triggers a yarn job to populate the data, although the SELECT * statement itself does not trigger any yarn job.
CTAS can also be used with CTE, which stands for Common Table Expression. CTE is a temporary result set derived from a simple select query specified in a WITH clause, followed by the SELECT or INSERT statement to build the result set. The CTE is defined only within the execution scope of a single statement. One or more CTEs can be used in a nested or chained way with keywords, such as the SELECT, INSERT, CREATE TABLE AS SELECT, or CREATE VIEW AS SELECT statements. Using CTE of HQL makes the query more concise and clear than writing complex nested queries.
The following are examples using CTAS and CTE for table creation:
- Create a table with CTAS:
> CREATE TABLE ctas_employee as SELECT * FROM employee_external;
No rows affected (1.562 seconds)
- Create a table with both CTAS and CTE:
> CREATE TABLE cte_employee as
> WITH r1 as (
> SELECT name FROM r2 WHERE name = 'Michael'
> ),
> r2 as (
> SELECT name FROM employee WHERE gender_age.gender= 'Male'
> ),
> r3 as (
> SELECT name FROM employee WHERE gender_age.gender= 'Female'
> )
> SELECT * FROM r1
> UNION ALL
> SELECT * FROM r3;
No rows affected (61.852 seconds)
> SELECT * FROM cte_employee;
+----------------------------+
| cte_employee.name |
+----------------------------+
| Michael |
| Shelley |
| Lucy |
+----------------------------+
3 rows selected (0.091 seconds)
- Use CTAS to create an empty table by copying the schema from another table. It is empty because the where condition is false:
> CREATE TABLE empty_ctas_employee as
> SELECT * FROM employee_internal WHERE 1=2;
No rows affected (213.356 seconds)
- In another way, we can also use CREATE TABLE LIKE to create an empty table. This is a faster way to copy the table schema since it does not trigger any jobs but only copies metadata:
> CREATE TABLE empty_like_employee LIKE employee_internal;
No rows affected (0.115 seconds)