Data acquisition of stream data - technology mapping

The following figure adds the technology aspect to the conceptual architecture that we will be following throughout this book.

We have chosen Apache Flume as the real-time data transfer technology, and it sits in the data acquisition layer of our Data Lake implementation.

Figure 02: Technology mapping for Acquisition Layer

In line with our SCV use case, real-time data from various business applications will flow into Flume and then be transferred to the Hadoop file system for storage and later analysis. The real-time data that we are going to handle is the customer's behavioural data, captured as the customer interacts with the enterprise's website. Data such as page visits, link clicks, location details, browser details, and so on will flow into Flume and then be stored in HDFS.
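To make the shape of this behavioural data more concrete, the following is a minimal sketch (not the implementation used in this book) of how a website backend might hand click events to Flume. It assumes a Flume agent configured with an HTTP source using its default JSON handler, which accepts a JSON array of events, each with string headers and a string body. The field names (event_type, page, browser, location), the function names, and the URL and port are hypothetical and chosen only for illustration.

```python
import json
import urllib.request


def build_flume_events(clicks):
    """Wrap click records in the event format expected by Flume's HTTP
    source with the default JSON handler: a list of {"headers", "body"}."""
    events = []
    for click in clicks:
        events.append({
            # Headers are string key/value pairs; a Flume agent could use
            # them later for routing (assumed usage, not required).
            "headers": {"eventType": click["event_type"]},
            # The body is the serialized click record itself.
            "body": json.dumps(click, sort_keys=True),
        })
    return events


def send_to_flume(events, url="http://localhost:44444"):
    """POST the events to a Flume HTTP source. The URL is an assumption;
    use the host and port configured for your own agent."""
    request = urllib.request.Request(
        url,
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical behavioural records: a page visit and a link click.
sample_clicks = [
    {"event_type": "page_visit", "page": "/home",
     "browser": "Firefox", "location": "London"},
    {"event_type": "link_click", "page": "/offers",
     "browser": "Chrome", "location": "Paris"},
]

events = build_flume_events(sample_clicks)
```

With a running agent, calling send_to_flume(events) would deliver both records to the source. From there the agent's channel and an HDFS sink would be responsible for landing them in the Hadoop file system.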

The following figure (Figure 03) shows only the aspects that we will be delving into in this chapter; the remaining layers and other aspects of the Data Lake have been intentionally left out of the diagram. It does, however, still show Sqoop, so that we keep building towards our full-fledged Data Lake architecture as we move through the chapters one by one.

Figure 03: Working of Flume in the Data Lake