Time for action – using a type mapping

Let's use a type mapping to improve our data import.

  1. Delete any existing output directory:
    $ hadoop fs -rmr employees
    
  2. Execute Sqoop with an explicit type mapping:
    $ sqoop import --connect jdbc:mysql://10.0.0.100/hadooptest \
    --username hadoopuser -P --table employees \
    --hive-import --hive-table employees \
    --map-column-hive start_date=timestamp
    

    You will receive the following response:

    12/05/23 14:53:38 INFO hive.HiveImport: Hive import complete.
    
  3. Examine the created table definition:
    $ hive -e "describe employees"
    

    You will receive the following response:

    OK
    first_name  string  
    dept  string  
    salary  int  
    start_date  timestamp  
    Time taken: 2.547 seconds
    
  4. Examine the imported data:
    $ hive -e "select * from employees";
    

    You will receive the following response:

    OK
    Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
    Time taken: 2.73 seconds
    

What just happened?

Our Sqoop command line here is similar to our original Hive import, except for the addition of the --map-column-hive column mapping specification. We specified that the start_date column should be of type TIMESTAMP; the option takes a comma-separated list of such column=type mappings, so we could have remapped other columns in the same way.
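
For instance, if we also wanted to force the salary column to BIGINT (a purely illustrative mapping, not one used in this example), the two mappings could be chained in a single option:

    $ sqoop import --connect jdbc:mysql://10.0.0.100/hadooptest \
    --username hadoopuser -P --table employees \
    --hive-import --hive-table employees \
    --map-column-hive start_date=timestamp,salary=bigint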

After confirming that Sqoop executed successfully, we examined the created Hive table and verified that the mapping was indeed applied: the start_date column now has the TIMESTAMP type.

We then tried to retrieve the data from the table and could not do so; instead, we received an error complaining that the values did not match the required timestamp format.

On reflection, this should not be a surprise. Though we specified that the desired column type was TIMESTAMP, the actual data being imported from MySQL was of type DATE, which lacks the time component a timestamp requires. This is an important lesson: ensuring that the type mappings are correct is only one part of the puzzle; we must also ensure the data is valid for the specified column type.
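
One possible way around the problem, sketched here as an illustration rather than a prescribed fix, is to map the column to STRING instead. The DATE values then arrive as plain yyyy-MM-dd text that Hive can read, and they can be converted later (for example, with Hive's unix_timestamp() function) if true timestamps are needed:

    $ hive -e "drop table employees"
    $ hadoop fs -rmr employees
    $ sqoop import --connect jdbc:mysql://10.0.0.100/hadooptest \
    --username hadoopuser -P --table employees \
    --hive-import --hive-table employees \
    --map-column-hive start_date=string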
