Discussing Data-Centric Architectures

So far, we've discussed what a Data Ecosystem looks like, the guiding principles of a Data-Centric architecture, and the application styles and architectural patterns most relevant to Data-Intensive Systems. One thing we have mentioned, mostly implicitly, is that Data-Centric architectures are distributed in nature.

Put simply, a Distributed System is a collection of interrelated components, each with a specific responsibility, that run on multiple machines (virtual or physical), communicate with each other via a well-defined protocol (such as HTTP or a message-queuing protocol), and, from the outside, look like a single system.

From this definition, we can pick out a few characteristics of a Distributed System:

Each component in a Distributed System has a single, well-defined functional responsibility; it performs that responsibility well and then forwards its output to the next component in line.
The components in a Distributed System can, and usually do, work concurrently. That is, two different components within the system can perform work at the same time.
Each component can go down independently of the others. If Component A forwards a message to Component B and Component B is not running, this should not affect Component A's ability to perform its work and serve the next incoming request. This is typically achieved by decoupling the components through a durable messaging system.
The preceding point also tells us that computation within a Distributed System can be synchronous or asynchronous; in many cases it is asynchronous, although at different levels of granularity. For example, when you work with HBase and the HBase client makes a request to one of the HBase region servers, that call could be either synchronous or asynchronous (depending on whether or not you allow stale reads). The incoming call from the end user to the HBase client library could, likewise, be either synchronous or asynchronous.
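To make the decoupling and synchronous/asynchronous points concrete, here is a minimal Python sketch. `ComponentA`, `ComponentB`, `send_sync`, and `drain` are hypothetical names, and an in-memory `queue.Queue` stands in for the durable messaging system; it is not durable, so a real deployment would use a broker that persists messages. The sketch shows that a synchronous call fails while Component B is down, whereas Component A keeps accepting requests by handing its output to the queue, and Component B catches up once it is running again.

```python
import queue


class ComponentB:
    """Hypothetical downstream component that may be offline."""

    def __init__(self):
        self.running = False
        self.processed = []

    def handle(self, message):
        if not self.running:
            raise ConnectionError("Component B is down")
        self.processed.append(message.upper())


class ComponentA:
    """Hypothetical upstream component that hands its output to a queue."""

    def __init__(self, outbox):
        self.outbox = outbox
        self.handled = 0

    def handle_request(self, request):
        result = f"{request}-enriched"  # A's single responsibility
        self.outbox.put(result)         # asynchronous hand-off: A does not wait for B
        self.handled += 1


def send_sync(b, message):
    """Synchronous call: the caller blocks on B and fails if B is down."""
    b.handle(message)


def drain(outbox, b):
    """Deliver every queued message to B once B is back up."""
    while not outbox.empty():
        b.handle(outbox.get())


outbox = queue.Queue()  # in-memory stand-in for a durable message broker
a = ComponentA(outbox)
b = ComponentB()        # starts out down

# A direct synchronous call fails while B is down...
try:
    send_sync(b, "direct")
except ConnectionError as err:
    print("sync call failed:", err)  # sync call failed: Component B is down

# ...but A keeps serving requests through the queue.
for req in ["order1", "order2"]:
    a.handle_request(req)
print(a.handled, outbox.qsize())  # 2 2

# B comes back and works through the backlog.
b.running = True
drain(outbox, b)
print(b.processed)  # ['ORDER1-ENRICHED', 'ORDER2-ENRICHED']
```

The design choice here is that A never waits on B: the queue absorbs the backlog, which is what lets the two components fail independently.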

In this chapter, we will cover the following:

Functional components that form the foundation of a Distributed System
Insight into the distributed nature of a Data-Intensive System, covering distributed and reliable messaging, distributed processing, and distributed storage
Lambda architecture and why it is so popular with Distributed Systems
Kappa architecture, which is essentially a scaled-down (or simplified) version of Lambda architecture
The various NoSQL data stores on the market today and how they differ from each other

When designing a Distributed System, we have to think about and apply distributed principles across all of the architecture layers. Our storage solution needs to be distributed to tolerate failures and to provide high availability and fault tolerance. Our computation (or business) layer needs to be distributed so that it can scale up or down based on load and, again, so that the failure of a single component does not bring down the entire system.
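As a minimal illustration of why distributing storage helps with failures, the following Python sketch replicates each write to every live node and serves reads from any live replica that holds the key. `Node`, `ReplicatedStore`, and the `min_acks` parameter are hypothetical; the sketch ignores networking, write ordering, and repairing replicas that missed writes.

```python
class Node:
    """Hypothetical storage node that can be taken offline."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Writes to every live replica; reads from the first live replica holding the key."""

    def __init__(self, nodes, min_acks):
        self.nodes = nodes
        self.min_acks = min_acks  # replicas that must accept a write

    def put(self, key, value):
        acks = 0
        for node in self.nodes:
            if node.up:
                node.data[key] = value
                acks += 1
        if acks < self.min_acks:
            # Simplification: the partial write is not rolled back.
            raise RuntimeError(f"only {acks} replica(s) acknowledged the write")
        return acks

    def get(self, key):
        for node in self.nodes:
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("n1"), Node("n2"), Node("n3")]
store = ReplicatedStore(nodes, min_acks=2)

store.put("user:42", "Alice")       # stored on n1, n2, and n3
nodes[0].up = False                 # n1 fails
print(store.get("user:42"))         # Alice (served by n2)
print(store.put("user:43", "Bob"))  # 2 (acknowledged by n2 and n3)

nodes[1].up = False                 # n2 fails too
try:
    store.put("user:44", "Carol")
except RuntimeError as err:
    print(err)                      # only 1 replica(s) acknowledged the write
print(store.get("user:43"))         # Bob (served by n3)
```

The rejected write in the last step still lands on `n3`; a real store would need to roll it back or reconcile it later, which this sketch deliberately leaves out.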

In such a distributed architecture, you need someone or something to make sure that coordination happens properly among the nodes of the Distributed System.
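One common coordination task is keeping track of which nodes are alive and agreeing on a leader. The Python sketch below assumes a hypothetical `Coordinator` that records heartbeats with explicit timestamps, treats a node as dead once it has been silent for longer than `session_timeout`, and picks the live node with the lowest ID as leader. In practice this job is usually delegated to a dedicated coordination service such as Apache ZooKeeper; the sketch is not that service's API.

```python
class Coordinator:
    """Hypothetical coordinator that tracks node heartbeats and picks a leader."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout  # seconds of silence before a node counts as dead
        self.last_seen = {}                     # node_id -> time of last heartbeat

    def heartbeat(self, node_id, now):
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen <= self.session_timeout
        )

    def leader(self, now):
        live = self.live_nodes(now)
        return live[0] if live else None  # lowest live ID wins


c = Coordinator(session_timeout=5.0)
c.heartbeat("node-1", now=0.0)
c.heartbeat("node-2", now=1.0)
c.heartbeat("node-3", now=1.0)
print(c.leader(now=2.0))      # node-1

# node-1 stops sending heartbeats; the others keep going.
c.heartbeat("node-2", now=6.0)
c.heartbeat("node-3", now=6.0)
print(c.live_nodes(now=7.0))  # ['node-2', 'node-3']
print(c.leader(now=7.0))      # node-2
```

Passing `now` explicitly keeps the sketch deterministic and easy to test; a production coordinator must itself be replicated so that it does not become a single point of failure.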
