Data data everywhere...

In discussions about integrating Hadoop with other systems, it is easy to think of the integration as a one-to-one pattern: data comes out of one system, gets processed in Hadoop, and is then passed on to a third.

Things may be like that on day one, but the reality is more often a series of collaborating components with data flows passing back and forth between them. How we build this complex network in a maintainable fashion is the focus of this chapter.

Types of data

For the sake of the discussion, we will divide data into two broad categories:

Network traffic, where data is generated by a system and sent across a network connection
File data, where data is generated by a system and written to files on a filesystem somewhere

We don't assume these data categories are different in any way other than how the data is retrieved.

Getting network traffic into Hadoop

When we say network data, we mean things like information retrieved from a web server via an HTTP connection, database contents pulled by a client application, or messages sent across a data bus. In each case, the data is retrieved by a client application that either pulls the data across the network or listens for its arrival.
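To make the "pull" case concrete, here is a minimal sketch of a client that retrieves a resource over the network and streams it to a local file. It is only an illustration under our own assumptions: the function name pull_to_file and its parameters are hypothetical and do not come from any Hadoop API, and the final step of copying the file into HDFS is left as a comment.

```python
import shutil
import urllib.request
from pathlib import Path


def pull_to_file(url: str, dest: Path, timeout: float = 10.0) -> int:
    """Pull the resource at `url`, stream it to `dest`, and return the bytes written.

    `pull_to_file` is a hypothetical helper for illustration only.
    """
    # urlopen returns a file-like response; copyfileobj streams it in chunks,
    # so large responses are never held in memory all at once.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    # Assumption: once the data is on the local filesystem, a later step
    # copies it into HDFS (for example with `hadoop fs -put`).
    return dest.stat().st_size
```

A call such as pull_to_file("http://example.com/", Path("page.html")) would save the response body locally; the URL and filename here are placeholders. A client that listens for incoming data instead would follow the same pattern, with the read loop driven by the sender rather than by our request.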

Note

In several of the following examples, we will use the curl utility to either retrieve or send network data. Check that it is installed on your system, and install it if it is not.
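One quick way to check for curl from a script is sketched below. The helper name curl_available is our own invention; it relies only on the standard shutil.which call, which searches the directories on the PATH for an executable.

```python
import shutil


def curl_available() -> bool:
    """Return True if a `curl` executable can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which("curl") is not None
```

If curl_available() returns False, install curl with your system's package manager before continuing.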
