In discussions of integrating Hadoop with other systems, it is easy to think of the integration as a one-to-one pattern: data comes out of one system, gets processed in Hadoop, and is then passed on to a third.
Things may be like that on day one, but the reality is more often a series of collaborating components with data flows passing back and forth between them. How we build this complex network in a maintainable fashion is the focus of this chapter.
For the sake of discussion, we will divide data into two broad categories:
We don't assume these data categories are different in any way other than how the data is retrieved.
When we say network data, we mean things like information retrieved from a web server via an HTTP connection, database contents pulled by a client application, or messages sent across a data bus. In each case, the data is retrieved by a client application that either pulls the data across the network or listens for its arrival.
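The two retrieval styles described above, pulling data across the network and listening for its arrival, can be sketched as follows. This is only an illustrative sketch, not a description of any Hadoop tooling: the local HTTP server, the in-process queue standing in for a data bus, and the names `SampleHandler`, `pull`, `listen`, and `demo` are all assumptions made for the example.

```python
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SampleHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a remote web server holding one record."""

    def do_GET(self):
        body = b"sample record"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def pull(url):
    """Pull model: the client asks for the data when it wants it."""
    # Bypass any configured proxy, since the demo server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read()


def listen(bus, count):
    """Listen model: the client blocks until each message arrives."""
    return [bus.get(timeout=5) for _ in range(count)]


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), SampleHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        pulled = pull(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()

    bus = queue.Queue()  # in-process stand-in for a data bus
    producer = threading.Thread(
        target=lambda: [bus.put(m) for m in (b"msg-1", b"msg-2")]
    )
    producer.start()
    received = listen(bus, 2)
    producer.join()
    return pulled, received
```

Calling `demo()` returns the bytes pulled over HTTP together with the list of messages received from the queue. In both cases the client application owns the retrieval step, which is the property the text uses to define network data.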