Fault tolerance

In an enterprise distributed system, errors are likely to arise from mechanical (hardware) or algorithmic (software) problems. Such a defect is called a system fault, and it can cause the applications running on the system to fail.

In general, faults are classified into three major categories:

Transient fault
Intermittent fault
Permanent fault

Transient faults occur once and then disappear, which makes them very difficult to reproduce and resolve. For example, a network messaging process may momentarily lose data connectivity, and the exact situation is hard to recreate. This one-time occurrence is the key characteristic of a transient fault.
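
Because a transient fault disappears on its own, repeating the failed operation is often enough to get past it. The following is a minimal Python sketch of that idea, not a definitive implementation. The names TransientNetworkError, send_with_retry, and make_flaky_sender are hypothetical and exist only for illustration; the simulated sender is assumed to fail on its first call and succeed afterwards.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a momentary loss of connectivity."""


def send_with_retry(send, message, attempts=3, delay_seconds=0.0):
    """Call send(message), retrying when a transient error is raised.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(message)
        except TransientNetworkError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


def make_flaky_sender():
    """Build a simulated sender that fails only on its first call."""
    calls = {"count": 0}

    def send(message):
        calls["count"] += 1
        if calls["count"] == 1:
            raise TransientNetworkError("connection dropped")
        return f"delivered: {message}"

    return send, calls


if __name__ == "__main__":
    send, calls = make_flaky_sender()
    print(send_with_retry(send, "hello"))
    print("attempts used:", calls["count"])
```

A bounded number of attempts is used on purpose: if the fault turns out not to be transient, the caller still receives the error instead of retrying forever.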

An intermittent fault occurs, disappears for a while, reoccurs, disappears again, and so on. Because of this unpredictable pattern, intermittent faults are often considered the most annoying kind of component fault in the underlying system. In the network example, a loose connection is the classic illustration of an intermittent fault.

A permanent fault is persistent: it continues to exist until the faulty component is repaired or replaced. Continuing with the network example, physical damage to a network cable is a suitable illustration. Until the damaged cable is repaired or replaced, the enterprise application cannot proceed. Further examples of permanent faults are disk head crashes, software bugs, and burned-out power supplies.
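
The three categories can be told apart by how a fault behaves over time. The following is a minimal Python sketch of that distinction, assuming a component is probed at regular intervals and each probe records whether a fault was observed. The names FaultType and classify_fault are hypothetical, and the rule that a fault still present at the end of the observation window is treated as permanent is a simplifying assumption, since a real system cannot know whether the fault will clear later.

```python
from enum import Enum


class FaultType(Enum):
    NONE = "none"
    TRANSIENT = "transient"
    INTERMITTENT = "intermittent"
    PERMANENT = "permanent"


def classify_fault(observations):
    """Classify a fault from chronological probe results.

    observations is a list of booleans; True means the fault was
    observed during that probe.
    """
    if not any(observations):
        return FaultType.NONE

    # Count separate fault episodes: an episode starts whenever a
    # faulty probe follows a healthy one (or opens the window).
    episodes = 0
    previous = False
    for failed in observations:
        if failed and not previous:
            episodes += 1
        previous = failed

    if episodes > 1:
        # The fault went away and came back: intermittent.
        return FaultType.INTERMITTENT
    if observations[-1]:
        # One episode that never cleared (assumption: treat as permanent).
        return FaultType.PERMANENT
    # One episode that cleared and did not return: transient.
    return FaultType.TRANSIENT


if __name__ == "__main__":
    print(classify_fault([False, True, False, False]))
    print(classify_fault([True, False, True, False, True]))
    print(classify_fault([False, True, True, True]))
```

In practice the length of the observation window matters: a window that is too short can make an intermittent fault look transient, which is part of what makes intermittent faults hard to diagnose.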
