Time for action – downloading and configuring Sqoop

Let's download and get Sqoop installed and configured.

  1. Go to the Sqoop homepage and choose the most recent stable release, which must be 1.4.1 or later, built for the version of Hadoop you are using. Download the file.
  2. Move the downloaded file to where you want it installed on your system, then uncompress it:
    $ mv sqoop-1.4.1-incubating__hadoop-1.0.0.tar.gz /usr/local
    $ cd /usr/local
    $ tar -xzf sqoop-1.4.1-incubating__hadoop-1.0.0.tar.gz
    
  3. Make a symlink:
    $ ln -s sqoop-1.4.1-incubating__hadoop-1.0.0 sqoop
    
  4. Update your environment:
    $ export SQOOP_HOME=/usr/local/sqoop
    $ export PATH=${SQOOP_HOME}/bin:${PATH}
    
  5. Download the JDBC driver for your database; for MySQL, we find it at http://dev.mysql.com/downloads/connector/j/5.0.html.
  6. Copy the downloaded JAR file into the Sqoop lib directory:
    $ cp mysql-connector-java-5.0.8-bin.jar /usr/local/sqoop/lib
    
  7. Test Sqoop:
    $ sqoop help
    

    You will see the following output:

    usage: sqoop COMMAND [ARGS]
    Available commands:
      codegen            Generate code to interact with database records
    
      version            Display version information
    See 'sqoop help COMMAND' for information on a specific command.
    

What just happened?

Sqoop is a pretty straightforward tool to install. After downloading the required version from the Sqoop homepage, taking care to pick the one that matches our Hadoop version, we moved the file into place and unpacked it.

Once again, we needed to set an environment variable and add the Sqoop bin directory to our path. We can either set these directly in our shell or, as before, add these steps to a configuration file that we source prior to a development session.
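As a minimal sketch of such a configuration file, the following could be saved and sourced at the start of a session. The file name `~/hadoop-dev-env.sh` is hypothetical, and the install location assumes the `/usr/local/sqoop` symlink created in the steps above; adjust both to your own setup.

```shell
# Hypothetical file: ~/hadoop-dev-env.sh (the name is an assumption).
# Load it with:  . ~/hadoop-dev-env.sh

# Keep an already-set SQOOP_HOME, otherwise use the symlink from the install steps.
export SQOOP_HOME="${SQOOP_HOME:-/usr/local/sqoop}"

# Prepend the Sqoop bin directory only if it is not already on PATH,
# so sourcing the file repeatedly does not keep growing PATH.
case ":${PATH}:" in
  *":${SQOOP_HOME}/bin:"*) ;;
  *) export PATH="${SQOOP_HOME}/bin:${PATH}" ;;
esac
```

The `case` guard is a design choice rather than a requirement: it simply makes the file safe to source more than once in the same shell.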

Sqoop needs access to the JDBC driver for your database; in our case, we downloaded the MySQL Connector and copied it into the Sqoop lib directory. For the most popular databases, this is as much configuration as Sqoop requires; if you want to use something more exotic, consult the Sqoop documentation.
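A quick way to confirm that the driver ended up where Sqoop will look for it is to list the matching JAR files in the lib directory. The sketch below is only illustrative: the function name `find_mysql_driver` is made up for this example, and the file name pattern assumes the MySQL Connector/J naming seen in the download above.

```shell
# List MySQL Connector/J JARs in a Sqoop lib directory.
# Prints one path per line; returns non-zero if none is found.
# find_mysql_driver is a hypothetical helper, not part of Sqoop.
find_mysql_driver() {
  lib_dir="${1:-${SQOOP_HOME:-/usr/local/sqoop}/lib}"
  found=1
  for jar in "${lib_dir}"/mysql-connector-java-*.jar; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -f "${jar}" ] || continue
    echo "${jar}"
    found=0
  done
  return "${found}"
}
```

Called without arguments, it checks `${SQOOP_HOME}/lib`; passing a directory, as in `find_mysql_driver /usr/local/sqoop/lib`, checks that location instead.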

After this minimal install, we executed the sqoop command-line utility to validate that it is working properly.

Note

You may see warning messages from Sqoop telling you that additional variables such as HBASE_HOME have not been defined. As we are not talking about HBase in this book, we do not need this setting and will be omitting such warnings from our screenshots.

Sqoop and Hadoop versions

We were much more specific about the version of Sqoop to retrieve than we were for previous software downloads. Sqoop versions prior to 1.4.1 depend on an additional method on one of the core Hadoop classes that was only available in the Cloudera Hadoop distribution or in Hadoop 0.21 and later.

Unfortunately, the fact that Hadoop 1.0 is effectively a continuation of the 0.20 branch meant that Sqoop 1.3, for example, would work with Hadoop 0.21 but not 0.20 or 1.0. To avoid this version confusion, we recommend using version 1.4.1 or later, which removes the dependency.
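If you script your setup, the "1.4.1 or later" rule can be checked mechanically. The following is a rough sketch only: `version_at_least` is a hypothetical helper, it compares at most three numeric fields, and it ignores any suffix such as `-incubating`.

```shell
# Return 0 if dotted version $1 is >= dotted version $2.
# version_at_least is a hypothetical helper; it compares up to three
# numeric fields and drops any suffix after the first hyphen.
version_at_least() {
  have="${1%%-*}"
  want="${2%%-*}"
  old_ifs="${IFS}"
  IFS=.
  set -- ${have}; h1="${1:-0}"; h2="${2:-0}"; h3="${3:-0}"
  set -- ${want}; w1="${1:-0}"; w2="${2:-0}"; w3="${3:-0}"
  IFS="${old_ifs}"
  [ "${h1}" -gt "${w1}" ] && return 0
  [ "${h1}" -lt "${w1}" ] && return 1
  [ "${h2}" -gt "${w2}" ] && return 0
  [ "${h2}" -lt "${w2}" ] && return 1
  [ "${h3}" -ge "${w3}" ]
}
```

For example, `version_at_least 1.4.1-incubating 1.4.1` succeeds, while `version_at_least 1.3.0 1.4.1` fails, which matches the recommendation above to avoid the 1.3 releases with Hadoop 1.0.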

No additional MySQL configuration is required. If the server has not been configured to allow remote clients, as described earlier, we will find out as soon as we try to use Sqoop.

Sqoop and HDFS

The simplest import we can perform is to dump data from a database table onto structured files on HDFS. Let's do that.
