Configuring Sqoop for Microsoft SQL Server

This recipe shows how to configure Sqoop to connect with Microsoft SQL Server databases. This will allow data to be efficiently loaded from a Microsoft SQL Server database into HDFS.

Getting ready

This example uses Sqoop v1.3.0.

If you are using CDH3, you already have Sqoop installed. If you are not running CDH3, you can find instructions for your distro at https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation.

This recipe assumes that you have an instance of SQL Server up and running that can connect to your Hadoop cluster.

How to do it...

Complete the following steps to configure Sqoop to connect with Microsoft SQL Server:

  1. Download the Microsoft SQL Server JDBC Driver 3.0 from the following site http://download.microsoft.com/download/D/6/A/D6A241AC-433E-4CD2-A1CE-50177E8428F0/1033/sqljdbc_3.0.1301.101_enu.tar.gz.

    This download contains the SQL Server JDBC driver (sqljdbc4.jar). Sqoop connects to relational databases using JDBC drivers.

  2. Uncompress and extract the TAR file:
    gzip -d sqljdbc_3.0.1301.101_enu.tar.gz
    tar -xvf sqljdbc_3.0.1301.101_enu.tar

    This will result in a new folder being created, sqljdbc_3.0.

  3. Copy sqljdbc4.jar to $SQOOP_HOME/lib:
    cp sqljdbc_3.0/enu/sqljdbc4.jar $SQOOP_HOME/lib

    Sqoop now has access to the sqljdbc4.jar file and will be able to use it to connect to a SQL Server instance.

  4. Download the Microsoft SQL Server Connector for Apache Hadoop from the site http://download.microsoft.com/download/B/E/5/BE5EC4FD-9EDA-4C3F-8B36-1C8AC4CE2CEF/sqoop-sqlserver-1.0.tar.gz.
  5. Uncompress and extract the TAR file:
    gzip -d sqoop-sqlserver-1.0.tar.gz
    tar -xvf sqoop-sqlserver-1.0.tar

    This will result in a new folder being created, sqoop-sqlserver-1.0.

  6. Set the MSSQL_CONNECTOR_HOME environment variable:
    export MSSQL_CONNECTOR_HOME=/path/to/sqoop-sqlserver-1.0
  7. Run the installation script:
    ./install.sh
  8. For importing and exporting data, see the Importing data from MySQL into HDFS using Sqoop and Exporting data from HDFS into MySQL using Sqoop recipes of this chapter. These recipes apply to SQL Server as well. The --connect argument must be changed to --connect jdbc:sqlserver://<HOST>:<PORT>.

How it works...

Sqoop communicates with databases using JDBC. After adding the sqljdbc4.jar file to the $SQOOP_HOME/lib folder, Sqoop will be able to connect to SQL Server instances using --connect jdbc:sqlserver://<HOST>:<PORT>. In order for SQL Server to have full compatibility with Sqoop, some configuration changes are necessary. The configurations are updated by running the install.sh script.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.194.106