Data publishing

The serving layer may also need to publish processed data. This is most common for the output of near-real-time processing, in which other downstream systems may be interested. Here the serving layer plays the role of an event hub. It is usually best to expose such events over a topic so that multiple consumers can consume them. However, a slow or unavailable consumer can cause messages to pile up on the data lake side. This component should therefore be built with failure scenarios in mind, so that it can recover smoothly while the serving layer stays healthy and functional.
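To make the slow-consumer concern concrete, the following is a minimal in-memory sketch, not how JMS or Kafka work internally. It assumes one bounded buffer per subscriber, so that a consumer that stops reading can only fill its own buffer and never blocks the publisher or the other consumers. The class `BoundedTopic`, its method names, and the policy of dropping and counting overflow events are all hypothetical choices made for illustration. A real deployment would more likely rely on broker-side retention and let the consumer replay from its last position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory topic: every subscriber gets its own bounded buffer. */
class BoundedTopic<E> {
    private final int capacityPerSubscriber;
    private final Map<String, BlockingQueue<E>> buffers = new ConcurrentHashMap<>();
    private final Map<String, Integer> dropped = new ConcurrentHashMap<>();

    BoundedTopic(int capacityPerSubscriber) {
        this.capacityPerSubscriber = capacityPerSubscriber;
    }

    /** Registers a subscriber; subscribing twice with the same name is a no-op. */
    void subscribe(String name) {
        buffers.putIfAbsent(name, new ArrayBlockingQueue<>(capacityPerSubscriber));
        dropped.putIfAbsent(name, 0);
    }

    /**
     * Fans the event out to every subscriber without blocking.
     * If a subscriber's buffer is full, the event is dropped for that
     * subscriber only and counted, so the gap can be detected later.
     */
    void publish(E event) {
        for (Map.Entry<String, BlockingQueue<E>> entry : buffers.entrySet()) {
            boolean accepted = entry.getValue().offer(event); // offer() never blocks
            if (!accepted) {
                dropped.merge(entry.getKey(), 1, Integer::sum);
            }
        }
    }

    /** Removes and returns up to max buffered events for one subscriber, oldest first. */
    List<E> poll(String name, int max) {
        List<E> out = new ArrayList<>();
        BlockingQueue<E> queue = buffers.get(name);
        if (queue != null) {
            queue.drainTo(out, max);
        }
        return out;
    }

    /** Number of events this subscriber missed because its buffer was full. */
    int droppedFor(String name) {
        return dropped.getOrDefault(name, 0);
    }
}
```

With a capacity of two, a subscriber that never polls misses the third event while a subscriber that keeps up receives all of them. The dropped count is what a recovering consumer could use, under these assumptions, to decide that it needs to reload the missing data from the lake rather than trust its buffer.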

As discussed earlier, if we need to classify how data is served as either Push or Pull, the components can be grouped as follows:

Push            Pull
Data Exports    Relational Database Access
Data Publish    Tables/Views
                NoSQL and Indexes via Data Service

For the purposes of this book, we will consider the following technologies for building a Data Serving Layer:

Data Serving Layer Component    Technology
Relational Database             PostgreSQL
Tables and Views                Hive, Impala
Indexes                         Elasticsearch
NoSQL Database                  HBase, Couchbase
Data Services                   Spring Boot Service
Data Exports                    Hadoop MapReduce, Sqoop, Pig Scripts
Data Publish                    JMS, Kafka