The serving layer may also need to publish the processed data. This is most often the case with output from near-real-time processing, where downstream systems may be interested in consuming it. Here the serving layer plays the role of an event hub. It is usually best to expose such events over a topic so that multiple consumers can subscribe to them. However, a slow or unavailable consumer could cause messages to pile up on the data lake side. This component should therefore be built with failure scenarios in mind, so that recovery is smooth while the serving layer remains healthy and functional.
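One common way to keep a publisher healthy in the face of a slow or unavailable consumer is to retry delivery with exponential backoff and, after a bounded number of attempts, park the event in a dead-letter queue for later recovery. The sketch below illustrates that pattern only; the names (`publish_with_retry`, `DeadLetterQueue`) and the in-memory stubs are illustrative assumptions, not part of any specific messaging library:

```python
import time

class DeadLetterQueue:
    """Holds events that could not be delivered, for later recovery."""
    def __init__(self):
        self.events = []

    def add(self, event):
        self.events.append(event)

def publish_with_retry(event, send, dlq, max_retries=3, base_delay=0.01):
    """Try to deliver an event, backing off exponentially between attempts.

    After max_retries failures the event is parked in the dead-letter
    queue, so the serving layer stays responsive instead of letting
    undelivered messages pile up.
    """
    for attempt in range(max_retries):
        try:
            send(event)
            return True
        except ConnectionError:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    dlq.add(event)  # park the event for smooth recovery later
    return False

# Demo: a consumer that fails twice, then recovers on the third attempt.
attempts = {"n": 0}
def flaky_send(event):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("consumer unavailable")

dlq = DeadLetterQueue()
delivered = publish_with_retry({"id": 1}, flaky_send, dlq)
```

In a real deployment the `send` callable would wrap the broker client (for example a Kafka producer), and the dead-letter queue would be a durable topic or table rather than an in-memory list.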
As discussed earlier, if we classify these serving mechanisms by push versus pull, they fall out as follows:
| Push | Pull |
|------|------|
| Data Exports | Relational Database Access |
| Data Publish | |
For the purpose of this book, the various technologies that we will be considering to build a Data Serving Layer are the following:
| Data Serving Layer Component | Technology |
|------------------------------|------------|
| Relational Database | PostgreSQL |
| Tables and Views | Hive, Impala |
| Indexes | Elasticsearch |
| NoSQL Database | HBase, Couchbase |
| Data Services | Spring Boot Service |
| Data Exports | Hadoop MapReduce, Sqoop, Pig Scripts |
| Data Publish | JMS, Kafka |