Data Ingestion Layer

Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. To ingest something is to "take something in or absorb something."

whatis.com

In Chapter 2, Comprehensive Concepts of a Data Lake, you got a glimpse of the Data Ingestion Layer. This layer’s responsibility is to gather both stream and batch data and then apply any processing logic demanded by your chosen use case. The following figure should refresh your memory and give you a good pictorial view of this layer:

Figure 01: Data Lake - Data Ingestion Layer

In our Data Lake implementation, the Data Ingestion Layer is responsible for consuming messages from the messaging layer and performing the transformations required to ingest them into the Lambda Layer (batch and speed layer), so that the transformed output conforms to the expected storage or processing formats. The Data Ingestion Layer must ensure that its rate of message consumption is always greater than or equal to the rate at which messages arrive, so that no backlog builds up and messages/events are processed without added delay.
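The rate requirement above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions (fixed rates per time tick, an in-memory queue standing in for the messaging layer); the function name `simulate_backlog` and its parameters are hypothetical and not part of any framework used in this book.

```python
from collections import deque


def simulate_backlog(arrival_per_tick, consume_per_tick, ticks):
    """Return the queue depth after each tick for fixed arrival and consumption rates."""
    queue = deque()  # stands in for the messaging layer (assumption)
    depths = []
    next_id = 0
    for _ in range(ticks):
        # The messaging layer delivers new messages during this tick.
        for _ in range(arrival_per_tick):
            queue.append(next_id)
            next_id += 1
        # The ingestion layer drains as many messages as its rate allows.
        for _ in range(min(consume_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths


if __name__ == "__main__":
    # Consumption keeps up with arrival: the backlog stays at zero.
    print(simulate_backlog(arrival_per_tick=5, consume_per_tick=5, ticks=4))
    # Consumption is slower than arrival: the backlog grows every tick.
    print(simulate_backlog(arrival_per_tick=5, consume_per_tick=3, ticks=4))
```

When consumption is slower than arrival, the queue depth grows without bound, which is exactly the delay the ingestion layer is expected to avoid.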

Some of the characteristics of the Data Ingestion Layer can be summarized as follows:

  • Low in complexity and fast enough to keep up with data input (in our case, output from the messaging layer)
  • Capable of handling different data flows (real-time or batch, continuous or asynchronous)
  • Capable of handling various data types (structured, unstructured, and semi-structured)
  • Integration with various persistence store mechanisms
  • Multiple transport protocol support
  • Capable of handling the four V's of big data (volume, velocity, variety, and veracity)
  • Capable of connecting with disparate systems and technologies

Figure 02: Working of Data Ingestion layer in our Data Lake implementation

As shown in the preceding figure, we will take data from the messaging layer, enrich and transform it, and then pass it to the Lambda Layer (both the Speed Layer and the Batch Layer).
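The flow described above (consume, enrich, transform, hand off to both layers) can be sketched in a few lines. This is a minimal illustration, not the implementation used in this book: it assumes messages are JSON strings with an `id` field, uses plain Python lists as stand-ins for the Batch and Speed Layer sinks, and every name in it (`enrich`, `to_batch_record`, `to_speed_record`, `ingest`, the `source` and `ingested_at` fields) is hypothetical.

```python
import json
from datetime import datetime, timezone


def enrich(event, source):
    """Return a copy of the event with hypothetical metadata fields added."""
    enriched = dict(event)
    enriched["source"] = source
    enriched["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


def to_batch_record(event):
    """Transform for the Batch Layer: one serialized JSON line per event (assumed format)."""
    return json.dumps(event, sort_keys=True)


def to_speed_record(event):
    """Transform for the Speed Layer: a keyed record (assumed format)."""
    return {"key": event["id"], "value": event}


def ingest(raw_messages, source, batch_sink, speed_sink):
    """Consume raw messages, enrich them, and pass them to both sinks."""
    for raw in raw_messages:
        event = json.loads(raw)            # consume from the messaging layer
        enriched = enrich(event, source)   # enrich
        batch_sink.append(to_batch_record(enriched))  # transform for batch
        speed_sink.append(to_speed_record(enriched))  # transform for speed


if __name__ == "__main__":
    batch, speed = [], []
    messages = ['{"id": 1, "amount": 10.5}', '{"id": 2, "amount": 7.0}']
    ingest(messages, source="messaging-layer", batch_sink=batch, speed_sink=speed)
    print(batch)
    print(speed)
```

Keeping the two transform functions separate reflects the point made earlier: each target layer may expect its own storage or processing format, even though both receive the same enriched event.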
