Loading data from HDFS using the LOAD DATA statement

Impala processes data that is stored in HDFS. When an Extract, Transform, Load (ETL) job needs to bring data files that are already in HDFS into an Impala table, you can use the LOAD DATA statement. The key properties of the LOAD DATA statement are as follows:

  • The loaded data files are moved, not copied, from their original HDFS location to the data directory of the Impala table
  • You can give either the name of a single HDFS file or the name of a directory, in which case all the files in that directory are loaded into the Impala table; however, wildcard patterns are not supported in the HDFS path

The syntax of the LOAD DATA statement is as follows:

LOAD DATA INPATH 'hdfs_file_or_directory_path' [OVERWRITE] 
              INTO TABLE tablename
              [PARTITION (partcol1=val1, partcol2=val2 ...)]

Examples:

CREATE TABLE students (id int, name string);
LOAD DATA INPATH '/user/avkash/students.txt' INTO TABLE students;

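In the previous example, you have to make sure that the students.txt file exists in the /user/avkash directory in HDFS before you run the statement. Because the file is moved rather than copied, it is no longer at that path after the load completes.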
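
The optional OVERWRITE and PARTITION clauses, and loading from a directory rather than a single file, can be sketched in the same way. The table students_by_year, its partition column enroll_year, and the two directory paths below are hypothetical names chosen for illustration only; the sketch assumes both directories already exist in HDFS and contain data files that match the table layout.

```sql
-- Hypothetical partitioned table, used only for this illustration.
CREATE TABLE students_by_year (id int, name string)
  PARTITIONED BY (enroll_year int);

-- Create the target partition before loading into it.
ALTER TABLE students_by_year ADD PARTITION (enroll_year=2013);

-- Load every file found in an HDFS directory (assumed path) into that partition.
LOAD DATA INPATH '/user/avkash/students_2013'
  INTO TABLE students_by_year PARTITION (enroll_year=2013);

-- OVERWRITE replaces the data already in the partition instead of adding to it.
LOAD DATA INPATH '/user/avkash/students_2013_corrected' OVERWRITE
  INTO TABLE students_by_year PARTITION (enroll_year=2013);
```

Without OVERWRITE, the newly loaded files are added to whatever data the table or partition already holds. In both cases the source files are moved out of the directory named in INPATH.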
