Fault tolerance

In an enterprise distributed system, errors are likely to arise from mechanical or algorithmic problems. Such an error is defined as a system fault: a defect that can cause the underlying applications to fail during execution.

In general, faults are classified into three major categories:

  • Transient fault
  • Intermittent fault
  • Permanent fault

Transient faults occur once and then disappear, which makes them very difficult to reproduce and resolve. As a simple example, a network messaging process may briefly lose data connectivity, and it is hard to reproduce the exact situation afterward. This one-off nature is the key characteristic of a transient fault.
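
Because a transient fault disappears by itself, simply repeating the failed operation is often enough to get past it. The following is a minimal sketch of that idea, not a prescribed implementation. The names TransientNetworkError and send_with_retry, and the send callable passed in, are hypothetical and chosen only for illustration.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical error type standing in for a brief loss of connectivity."""


def send_with_retry(send, message, attempts=3, delay_seconds=0.0):
    """Call send(message), retrying when a transient error is raised.

    `send` is any callable supplied by the caller (an assumption of this
    sketch). The last error is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(message)
        except TransientNetworkError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

If the fault is still present after the last attempt, the error is passed on to the caller, since the fault may not be transient after all.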

An intermittent fault occurs, disappears for a while, reoccurs, disappears again, and so on. Because of this unpredictable pattern, intermittent faults are considered the most annoying of the component faults in the underlying system. In the same network example, a loose cable connection is the classic illustration of an intermittent fault.

A permanent fault is persistent: it continues to exist until the faulty component is repaired or replaced. Continuing the network example, a physically damaged network cable is a suitable illustration. Until the damaged cable is repaired or replaced, the enterprise application cannot proceed. Further examples of permanent faults are disk head crashes, software bugs, and burned-out power supplies.
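
The three categories differ in how failures are spread over repeated observations of the same component. As a rough sketch, assume a hypothetical history of probe results (True for success, False for failure); the FaultType enum and the classify_fault function are illustrative names, and the rule is a simple heuristic rather than a definitive diagnostic.

```python
from enum import Enum
from typing import Sequence


class FaultType(Enum):
    NONE = "none"
    TRANSIENT = "transient"
    INTERMITTENT = "intermittent"
    PERMANENT = "permanent"


def classify_fault(probes: Sequence[bool]) -> FaultType:
    """Classify a probe history (True = success, False = failure)."""
    if all(probes):
        return FaultType.NONE

    # Count failure episodes, where an episode is a run of consecutive failures.
    episodes = 0
    previous_ok = True
    for ok in probes:
        if not ok and previous_ok:
            episodes += 1
        previous_ok = ok

    if episodes == 1 and not probes[-1]:
        # Failed once and never recovered within the observed window.
        return FaultType.PERMANENT
    if episodes == 1:
        # Failed once, then recovered and stayed healthy.
        return FaultType.TRANSIENT
    # Failed, recovered, and failed again.
    return FaultType.INTERMITTENT
```

The result depends on the length of the observation window: a fault that has only just appeared looks permanent until the component is seen to recover.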
