Making systems self-healing

System failure needs to be predicted in advance, and in the case of failure incidence, you should have an automated response for system recovery, which is called system self-healing. Self-healing is the ability of the solution to automatically recover from failure. A self-sealing system detects failure proactively and responds to it gracefully with minimal customer impact. Failure can happen in any layer of your entire system, which includes hardware failure, network failure, or software failure. Usually, data center failure is not an everyday event, and more granular monitoring is required for frequent failures such as database connection and network connection failures. The system needs to monitor the failure and act to recover.

To handle failure response, first, you need to identify Key Performance Indicators (KPIs) for your application and business. At the user level, these KPIs may include the number of requests served per second or page load latency for your website. At the infrastructure level, you can define maximum CPU utilization, such as it should not go above 60%. Memory utilization should not go beyond 50% of the total available Random-Access Memory (RAM) and so on.

As you defined your KPIs, you should put the monitoring system in place to track failures and notify you as your KPIs reach the threshold. You should apply automation around monitoring so that the system can self-heal in the event of any incidents. For example, add more servers when CPU utilization reaches near 50%—proactive monitoring helps to prevent failures.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.216.34.146