Building for failure

No software can be 100% error-free, but we always try to make it as stable as possible and to ensure that any error that does occur is handled gracefully. You have probably heard the phrase "build for failure". The idea is that you cannot assume that your applications or services will never fail. Instead, you should assume that they will fail, no matter how carefully you have built and tested them.

There are different types of failures. A common case is a software fault: the developer missed an edge condition such as an invalid input value, the application leaks memory because objects that are no longer needed are never released, or the application faces more load than expected. Then there are hardware failures, where a server or a cloud node goes down. We cannot anticipate every issue beforehand, so we need to plan for failure and design the architecture so that the overall service stays stable even when an issue occurs.

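A single failure can also spread: in a cascading failure, a fault in one service causes the services that depend on it to fail in turn. A cascading failure would look as follows: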

There are techniques, such as the circuit breaker pattern, the fail fast pattern, the fan out pattern, and so on, that can help us manage this kind of failure. We will discuss these later in this chapter in the Handling the failure section.
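
To make the circuit breaker idea concrete before the detailed discussion, here is a minimal sketch in Python. It is an illustration only, not a production implementation: the names CircuitBreaker, CircuitOpenError, max_failures, reset_timeout, and clock are all hypothetical and chosen for this example. The breaker counts consecutive failures of a dependency, and once a threshold is reached it rejects further calls immediately (failing fast) until a timeout has passed, at which point it lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (illustrative names and defaults)."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial call
        self.clock = clock                  # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Circuit is open: fail fast instead of calling the dependency.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open the circuit on reaching the threshold, or reopen it
            # if the half-open trial call failed.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each call to a remote dependency, for example breaker.call(fetch_profile, user_id), where fetch_profile is a placeholder for whatever function talks to the other service. Catching CircuitOpenError lets the caller return a fallback response instead of waiting on a dependency that is known to be failing, which is what stops one failure from cascading to its callers.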

So far, we have discussed why it is important to plan for failure. There are various best practices and patterns that we can (and should) follow to make our services robust and ready for failure, and we will discuss the important ones in the rest of this chapter.
