Chapter 10. Data Collection with Flume

In the previous two chapters, we've seen how Hive gives Hadoop a relational-style, SQL-like interface and how Sqoop allows it to exchange data with "real" relational databases. Although this is a very common use case, there are, of course, many other types of data sources that we may want to get into Hadoop.

In this chapter, we will cover:

  • An overview of data commonly processed in Hadoop
  • Simple approaches to pull this data into Hadoop
  • How Apache Flume can make this task a lot easier
  • Common Flume setup patterns, from simple through sophisticated (a minimal setup is sketched after this list)
  • Common issues, such as the data lifecycle, that need to be considered regardless of technology
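
To preview what such a setup looks like before we dig into the details, the following is a minimal sketch of a single-agent configuration. The agent name (agent1), the component names, and the netcat-to-logger pipeline listening on port 44444 are illustrative assumptions for this sketch rather than anything the chapter prescribes.

    # A single Flume agent named 'agent1' (hypothetical name) with one
    # source, one channel, and one sink
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1

    # Netcat source: listens for lines of text on a local TCP port
    agent1.sources.src1.type = netcat
    agent1.sources.src1.bind = localhost
    agent1.sources.src1.port = 44444
    agent1.sources.src1.channels = ch1

    # In-memory channel buffering events between the source and the sink
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 1000

    # Logger sink: writes received events to Flume's log output
    agent1.sinks.sink1.type = logger
    agent1.sinks.sink1.channel = ch1

An agent of this shape would typically be launched with something like flume-ng agent --conf-file agent1.conf --name agent1; the setups later in this chapter follow the same source-channel-sink pattern, just with more interesting components.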

A note about AWS

This chapter will discuss AWS less than any other in the book; in fact, we won't even mention it after this section. There is no Amazon service akin to Flume, so there is no AWS-specific product for us to explore. On the other hand, Flume works exactly the same way whether it runs on a local host or on an EC2 virtual instance. The rest of this chapter therefore assumes nothing about the environment in which the examples are executed; they will perform identically in either.
