Regardless of these limitations, let's demonstrate that, in the right situations, we can use Sqoop to directly export data stored in Hive.
$ echo "truncate employees" | mysql -u hadoopuser -p hadooptest
You will receive the following response:
Query OK, 0 rows affected (0.01 sec)
$ hadoop fs -ls /user/hive/warehouse/employees
You will receive the following response:
Found 1 items … /user/hive/warehouse/employees/part-m-00000
$ sqoop export --connect jdbc:mysql://10.0.0.100/hadooptest --username hadoopuser -P --table employees --export-dir /user/hive/warehouse/employees --input-fields-terminated-by '\001' --input-lines-terminated-by '\n'
Firstly, we truncated the employees table in MySQL to remove any existing data, and then confirmed the employee table data was where we expected it to be. The Sqoop export command is like before; the only changes are the different source location for the data and the addition of explicit field and line terminators. Recall that Hive, by default, uses ASCII code 001 (\001) and the newline character (\n) for its field and line terminators, respectively (also recall, though, that we have previously imported files into Hive with other separators, so this is something that always needs to be checked).
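One quick way to perform that check is to dump the raw bytes of a data file and look at the actual delimiters. On a cluster you would pipe from HDFS (for example, `hadoop fs -cat /user/hive/warehouse/employees/part-m-00000 | head -n 1 | od -c`); the sketch below simulates a Hive-written record locally with illustrative field values:

```shell
# Write a sample record using Hive's default delimiters:
# fields separated by \001, record terminated by \n.
printf 'Alice\001Engineering\0012012-03-15 00:00:00\n' > sample-record.txt

# od -c renders each byte; \001 appears as 001 and the
# line terminator as \n, confirming what Sqoop must be told.
od -c sample-record.txt
```

If `od` shows a different separator (a tab, a comma), the `--input-fields-terminated-by` argument must be adjusted to match.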
We execute the Sqoop command and watch it fail due to Java IllegalArgumentExceptions when trying to create instances of java.sql.Date.
We are now hitting the reverse of the problem we encountered earlier; the source MySQL table had a datatype not supported by Hive, and we converted the data to match the available TIMESTAMP type. When exporting the data back again, however, we are now trying to create a DATE using a TIMESTAMP value, which is not possible without some conversion.
The lesson here is that our earlier approach of doing a one-way conversion only worked for as long as we only had data flowing in one direction. As soon as we need bi-directional data transfer, mismatched types between Hive and the relational store add complexity and require the insertion of conversion routines.
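One such conversion routine, sketched here with an illustrative value, simply reduces each TIMESTAMP string to its date portion before export; in practice this could equally be done when staging the export data in Hive (for example, with its to_date() function):

```shell
# Hypothetical conversion: a Hive TIMESTAMP string such as
# '2012-03-15 00:00:00' cannot populate a MySQL DATE column
# directly, but stripping the time component yields a value
# that DATE will accept.
ts='2012-03-15 00:00:00'
d=${ts%% *}    # drop everything from the first space onwards
echo "$d"
```

Whichever direction the conversion runs, the key point is that it must be applied explicitly on every transfer, as neither Sqoop nor Hive will perform it for us.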