Kappa architecture

Kappa architecture is based on the notion that the entire dataset is a stream that the underlying system can read any number of times to perform computations on it. The data store in a Kappa architecture is an append-only, immutable log. Various computational systems read data from this immutable store and process it in a directed flow, and the computed results end up in a serving store, where queries are executed.
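To make the idea of a replayable, append-only log concrete, here is a minimal in-memory sketch. The AppendOnlyLog class, its append and read methods, and the sample events are hypothetical names invented for illustration; a real deployment would use a durable log system rather than a Python list.

```python
from typing import Any, Dict, Iterator, List, Tuple


class AppendOnlyLog:
    """Hypothetical in-memory stand-in for an immutable, append-only log."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, from_offset: int = 0) -> Iterator[Tuple[int, Any]]:
        """Yield (offset, record) pairs, starting at from_offset."""
        for offset in range(from_offset, len(self._records)):
            yield offset, self._records[offset]


log = AppendOnlyLog()
for event in [
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 7},
    {"user": "a", "amount": 3},
]:
    log.append(event)

# Two independent computations each read the same log from the start.
total = sum(record["amount"] for _, record in log.read())

per_user: Dict[str, int] = {}
for _, record in log.read():
    per_user[record["user"]] = per_user.get(record["user"], 0) + record["amount"]
```

The log exposes no update or delete operation, so every reader that starts at offset 0 sees the same sequence of records. That property is what allows results to be recomputed at any time.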

You may have noticed that Kappa architecture is a simplification of Lambda architecture: the entire batch layer is removed, and all processing is handled by a streaming layer.

At the center of Kappa architecture is the immutable data log. This is similar in concept to the immutable master dataset in Lambda architecture, but instead of using technologies such as Hadoop/HDFS, Kappa architecture's immutable data log is usually Kafka. Apache Kafka is a system originally developed at LinkedIn that lets you retain the full log of the data you may need to reprocess.

Stream-processing jobs read the data from Kafka and process it. When reprocessing is required, a second instance of the streaming job is started; it processes the retained data from the beginning and writes its output to a separate table. When the second job has caught up with the entire dataset, switch the application to read from the new data view, stop the first job, and delete the first job's data view.
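The reprocessing steps above can be sketched as follows. This is a simplified, single-process illustration, not a Kafka client: the log is a plain list, the serving store is a dictionary of views, and run_job, count_v1, count_v2, and the view names are all hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
View = Dict[str, int]


def run_job(log: List[Event], logic: Callable[[View, Event], None]) -> View:
    """Replay the retained log from the beginning and build a fresh view."""
    view: View = {}
    for event in log:
        logic(view, event)
    return view


def count_v1(view: View, event: Event) -> None:
    """Original logic: count every page hit."""
    page = str(event["page"])
    view[page] = view.get(page, 0) + 1


def count_v2(view: View, event: Event) -> None:
    """Revised logic (assumed for illustration): ignore bot traffic."""
    if not event["bot"]:
        page = str(event["page"])
        view[page] = view.get(page, 0) + 1


log: List[Event] = [
    {"page": "home", "bot": False},
    {"page": "home", "bot": True},
    {"page": "docs", "bot": False},
]

# The first job populates the view the application currently reads.
serving_store: Dict[str, View] = {"view_v1": run_job(log, count_v1)}
active_view = "view_v1"

# Reprocessing: a second job replays the log into a separate view.
serving_store["view_v2"] = run_job(log, count_v2)

# Once the second job has caught up, switch readers and drop the old view.
active_view = "view_v2"
del serving_store["view_v1"]
```

The old view keeps serving queries until the switch, so the application never reads a half-built result.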

This is the diagrammatic representation of Kappa architecture, representing Kafka as the immutable log system:

In Kappa architecture, reprocessing is required only when your processing logic changes and you want to recompute your results. To speed up the overall processing, you can spin up multiple consumers in parallel, each consuming only part of the data.
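As a rough sketch of parallel consumption, the example below assumes the data is already split into partitions and assigns one worker to each. The partitions list, the consume_partition function, and the use of a thread pool are illustrative assumptions; in Kafka, parallelism of this kind is usually obtained by running several consumers in one consumer group over a partitioned topic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def consume_partition(records: List[Record]) -> Dict[str, int]:
    """Process one partition independently and return its partial totals."""
    totals: Dict[str, int] = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical data, already partitioned by key.
partitions: List[List[Record]] = [
    [("a", 1), ("a", 2)],
    [("b", 10)],
    [("c", 5), ("c", 5)],
]

# One worker per partition; each worker sees only its share of the data.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partial_results = list(pool.map(consume_partition, partitions))

# Merge the partial results into a single view.
merged: Dict[str, int] = {}
for partial in partial_results:
    for key, value in partial.items():
        merged[key] = merged.get(key, 0) + value
```

Because each key lives in exactly one partition here, the merge step only combines disjoint partial results. If keys were spread across partitions, the same merge would still sum them correctly.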
