Data ingestion via Flink to HDFS and Elasticsearch

As events are queued into their respective Kafka topics, the Flink processing pipeline kicks in and starts consuming the events from these topics.
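
Before we dive into the project code, the following minimal sketch shows roughly what such a consuming pipeline looks like. It assumes Flink's standard Kafka connector; the broker address, group id, topic name, and class name are placeholders rather than the chapter's actual configuration, and the exact connector class names vary across Flink versions.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class AddressIngestionJob {

    public static void main(String[] args) throws Exception {
        // Returns a local environment when run from the IDE and the cluster
        // environment when the job is submitted to a Flink cluster.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        kafkaProps.setProperty("group.id", "address-flink-ingestor");  // placeholder

        // Consume the raw address events from the Kafka topic as strings.
        DataStream<String> addressEvents = env.addSource(
                new FlinkKafkaConsumer<>("address-topic", // placeholder topic
                        new SimpleStringSchema(), kafkaProps));

        addressEvents.print(); // the real job attaches the HDFS and Elasticsearch sinks here

        env.execute("speed-address-flink-ingestor");
    }
}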

Taking the Flink example covered in an earlier chapter as a reference, we build two pipelines here: one for addresses and one for contacts. Each pipeline streams its events into two sinks, HDFS and Elasticsearch, so that both ingestions are part of the same transaction.
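
To illustrate this fan-out, here is a sketch of attaching both sinks to the same stream. It assumes Flink's StreamingFileSink and the Elasticsearch connector (older Flink releases used BucketingSink for HDFS instead); the HDFS path, Elasticsearch host, and index name are placeholders, not the chapter's actual configuration.

import java.util.Collections;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.http.HttpHost;
import org.elasticsearch.client.Requests;

public class AddressSinks {

    static void attachSinks(DataStream<String> addressEvents) {
        // Sink 1: write the raw events to HDFS as rolling text files.
        StreamingFileSink<String> hdfsSink = StreamingFileSink
                .forRowFormat(new Path("hdfs://namenode:8020/data/address"), // placeholder
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();
        addressEvents.addSink(hdfsSink);

        // Sink 2: index the same events into Elasticsearch.
        ElasticsearchSink.Builder<String> esBuilder = new ElasticsearchSink.Builder<>(
                Collections.singletonList(new HttpHost("localhost", 9200, "http")), // placeholder
                new ElasticsearchSinkFunction<String>() {
                    @Override
                    public void process(String event, RuntimeContext ctx, RequestIndexer indexer) {
                        indexer.add(Requests.indexRequest()
                                .index("address") // placeholder index
                                .source(Collections.singletonMap("raw", event)));
                    }
                });
        addressEvents.addSink(esBuilder.build());
    }
}

Because both sinks hang off the same DataStream, the two writes travel through a single Flink job and share its checkpointing, which is what makes the two ingestions behave as one unit.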

Building on the earlier Flink example, which we ran from the IDE, we will now package the code so that it can also be deployed in the Flink container. This aspect of Flink deployment is new and was not covered in earlier chapters.
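
As a rough illustration of that deployment step, the packaged fat JAR would be submitted to the cluster through Flink's command-line client. The entry class and JAR path below refer to the sketch above and are placeholders for the chapter's actual artifacts.

# Submit the packaged job to the Flink cluster running in the container.
# Class and JAR names are placeholders for the chapter's actual projects.
flink run -c AddressIngestionJob \
    speed-address-flink-ingestor/target/speed-address-flink-ingestor.jar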

To achieve this, let's look at the key elements of the Flink pipeline projects. The complete source code can be found in chapter10/speed-address-flink-ingestor (the address Flink pipeline) and chapter10/speed-contacts-flink-ingestor (the contacts Flink pipeline). To keep things simple, we will discuss the code section by section.
