Let's use a type mapping to improve our data import.
$ hadoop fs -rmr employees
$ sqoop import --connect jdbc:mysql://10.0.0.100/hadooptest --username hadoopuser -P --table employees --hive-import --hive-table employees --map-column-hive start_date=timestamp
You will receive the following response:
12/05/23 14:53:38 INFO hive.HiveImport: Hive import complete.
$ hive -e "describe employees"
You will receive the following response:
OK
first_name	string
dept	string
salary	int
start_date	timestamp
Time taken: 2.547 seconds
$ hive -e "select * from employees"
You will receive the following response:
OK
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
Time taken: 2.73 seconds
Our Sqoop command line here is similar to our original Hive import, except for the addition of the column mapping specification. We specified that the start_date column should be of type TIMESTAMP, and we could have added other mappings as well; the option takes a comma-separated list of them, as sketched below.
After confirming that Sqoop executed successfully, we examined the created Hive table and verified that the mapping was indeed applied and that the start_date column has type TIMESTAMP.
We then tried to retrieve the data from the table and could not do so, receiving an error about a type format mismatch.
On reflection, this should not be a surprise. Though we specified that the desired column type was TIMESTAMP, the actual data being imported from MySQL was of type DATE, which does not contain the time component required by a timestamp. This is an important lesson: ensuring that the type mappings are correct is only one part of the puzzle; we must also ensure the data is valid for the specified column type.