Fault-tolerance and redundancy

In the previous section, you learned that fault tolerance and high availability are closely related. High availability means your application remains available to users, though possibly with degraded performance. Suppose you need four servers to handle your users' traffic, and you place two servers in each of two physically isolated data centers. If there is an outage in one data center, user traffic can be served from the other. But now you have only two servers, which leaves you with 50% of the original capacity, and users may experience performance issues. In this scenario, your application has 100% high availability but is only 50% fault tolerant.

Fault tolerance is about handling the full workload during an outage without compromising system performance. A fully fault-tolerant architecture involves high costs due to increased redundancy. Whether you need it depends on your application's criticality, that is, on whether your user base can live with degraded performance while the application recovers:

Fault-tolerance architecture

As shown in the preceding diagram, your application needs four servers to handle the full workload, and those servers are distributed across two different zones. In both scenarios, you maintain 100% high availability.

To achieve 100% fault tolerance, you need full redundancy: you have to maintain double the number of servers, so that each zone can carry the full workload on its own and users don't encounter any performance issues during the outage of one zone. If you keep the same number of servers, you achieve only 50% fault tolerance.
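The arithmetic behind these percentages can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: servers are identical, they are spread evenly across zones, and exactly one zone fails at a time. The function names are hypothetical and are not taken from any library.

```python
import math


def capacity_after_zone_loss(servers_per_zone: int, zones: int, required: int) -> float:
    """Fraction of the required capacity still served after one zone fails.

    Assumes identical servers; the result is capped at 1.0 because spare
    capacity beyond the requirement does not improve on "full performance".
    """
    surviving = servers_per_zone * (zones - 1)
    return min(1.0, surviving / required)


def servers_per_zone_for_full_tolerance(required: int, zones: int) -> int:
    """Servers needed in each zone so the remaining zones carry the full load."""
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone outage")
    return math.ceil(required / (zones - 1))


# Four servers required, two per zone: one zone down leaves half the capacity.
print(capacity_after_zone_loss(servers_per_zone=2, zones=2, required=4))  # 0.5

# Full fault tolerance with two zones means four servers per zone (eight total).
print(servers_per_zone_for_full_tolerance(required=4, zones=2))  # 4
print(capacity_after_zone_loss(servers_per_zone=4, zones=2, required=4))  # 1.0
```

Under the same assumptions, spreading the workload over three zones would need only two servers per zone (six in total) to survive the loss of one zone, which is one way the redundancy cost can be reduced.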

While designing the application architecture, a solution architect needs to determine the nature of the application's user base, design for 100% fault tolerance only where it is required, and offset any redundancy costs involved.
