The serving layer may also need to publish the processed data. This is most often the case with output from near-real-time processing, where downstream systems may be interested in consuming it. Here the serving layer plays the role of an event hub. It is usually best to expose such events over a topic so that multiple consumers can subscribe to them. However, a slow or unavailable consumer could cause messages to pile up on the data lake side. This component should therefore be built with failure scenarios in mind, so that recovery is smooth while the serving layer remains healthy and functional.
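One common way to keep a publisher healthy in the face of a slow or unavailable consumer is to retry delivery with exponential backoff and, after a bounded number of attempts, park the event in a dead-letter queue for later recovery. The sketch below illustrates that pattern only; the names (`publish_with_retry`, `DeadLetterQueue`) and the in-memory stubs are illustrative assumptions, not part of any specific messaging library:

```python
import time

class DeadLetterQueue:
    """Holds events that could not be delivered, for later recovery."""
    def __init__(self):
        self.events = []

    def add(self, event):
        self.events.append(event)

def publish_with_retry(event, send, dlq, max_retries=3, base_delay=0.01):
    """Try to deliver an event, backing off exponentially between attempts.

    After max_retries failures the event is parked in the dead-letter
    queue, so the serving layer stays responsive instead of letting
    undelivered messages pile up.
    """
    for attempt in range(max_retries):
        try:
            send(event)
            return True
        except ConnectionError:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    dlq.add(event)  # park the event for smooth recovery later
    return False

# Demo: a consumer that fails twice, then recovers on the third attempt.
attempts = {"n": 0}
def flaky_send(event):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("consumer unavailable")

dlq = DeadLetterQueue()
delivered = publish_with_retry({"id": 1}, flaky_send, dlq)
```

In a real deployment the `send` callable would wrap the broker client (for example a Kafka producer), and the dead-letter queue would be a durable topic or table rather than an in-memory list.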
As discussed earlier, if we classify these serving mechanisms by push versus pull, they fall out as follows:
| Push | Pull |
|------|------|
| Data Exports | Relational Database Access |
| Data Publish | |
For the purpose of this book, the various technologies that we will be considering to build a Data Serving Layer are the following:
| Data Serving Layer Component | Technology |
|------------------------------|------------|
| Relational Database | PostgreSQL |
| Tables and Views | Hive, Impala |
| Indexes | Elasticsearch |
| NoSQL Database | HBase, Couchbase |
| Data Services | Spring Boot Service |
| Data Exports | Hadoop MapReduce, Sqoop, Pig Scripts |
| Data Publish | JMS, Kafka |