Adding the layer between the client and Hadoop accounts for much of the ease of use that Sqoop 2 offers compared with Sqoop 1. The following table compares the two versions on this design thought.
Sqoop 1 | Sqoop 2 |
Command line is the only client option | Command line and a browser interface (via Hue) are both available as client options |
Client-only architecture | Client-server architecture |
Client works only on the same machine where Sqoop is installed | Server setup allows access to Sqoop from different machines |
Integration with other tools (Apache Oozie) is tightly coupled. | Integration is easier through the exposed REST APIs |
Connectors and drivers must be configured separately for each client installation. Each client needs its own connection details to connect and execute. | Because of the server component, connectors and JDBC drivers are configured in one place |
No well-defined role-based access is possible. | Role-based access and execution are possible because all access goes through the central server component. |
More error-prone, since the user must fill in many options manually by consulting the available documentation. | The browser-based interface advises users when they make mistakes and checks that all necessary options are filled in before Sqoop is actually used. |
Table 01: Sqoop 1 and Sqoop 2 - Comparison based on Ease of Use
Hue (Hadoop User Experience) is an open-source Web interface that supports Apache Hadoop and its ecosystem, licensed under the Apache v2 license.
- Wikipedia
Clearly, Sqoop 2 scores better than Sqoop 1 on this design thought, and most of its advantages follow directly from the central, one-time installation of the server component.
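To make the client-server and REST rows of the table more concrete, here is a minimal sketch of how a script on any machine might address the central Sqoop 2 server over HTTP. The port `12000`, the `sqoop` web application path, and the `version` resource are assumptions for illustration and should be checked against your installation and the Sqoop 2 REST documentation. The helper names `server_url` and `fetch_json` are hypothetical and not part of Sqoop.

```python
import json
from urllib.parse import urlunsplit
from urllib.request import urlopen

# Assumptions (not taken from the article): the server listens on port
# 12000 and serves its REST resources under the "sqoop" web application.
DEFAULT_PORT = 12000
WEBAPP = "sqoop"


def server_url(host, resource, port=DEFAULT_PORT, webapp=WEBAPP):
    """Build the URL of a REST resource on the central Sqoop 2 server."""
    # Strip stray slashes so "version" and "/version/" give the same URL.
    path = "/" + "/".join(part.strip("/") for part in (webapp, resource))
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body.

    Requires a reachable server; no Sqoop client installation is needed
    on the calling machine, only plain HTTP.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a hypothetical host name, `fetch_json(server_url("sqoop-server.example.com", "version"))` would query the server from any machine on the network. Connectors and JDBC drivers stay on the server, so the calling machine needs nothing beyond an HTTP client.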