Summary

In this chapter, you learned about various principles for making your system reliable. These principles include making your system self-healing by applying automation, and reducing the impact of a failure by designing a distributed system in which the workload spans multiple resources.

Overall system reliability depends heavily on your system's availability and its ability to recover from disaster events. You learned about synchronous and asynchronous data replication and how each affects your system's reliability. You learned about various data replication methods, including array-based, network-based, host-based, and hypervisor-based replication. Each method has its pros and cons, and products from multiple vendors are available to achieve the desired data replication.
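The difference between the two replication types can be illustrated with a toy model. The following is a minimal sketch, not a real replication product: the `ReplicatedStore` class and its methods are hypothetical names introduced only for illustration. It assumes the usual definitions, where a synchronous write is acknowledged only after the replica holds the record, while an asynchronous write is acknowledged immediately and shipped to the replica later.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/replica pair (hypothetical, for illustration only)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.replica = []
        self.pending = deque()  # writes not yet shipped to the replica

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            # Synchronous: the replica receives the record before the
            # write is acknowledged.
            self.replica.append(record)
        else:
            # Asynchronous: acknowledge now and ship the record later.
            self.pending.append(record)

    def flush(self):
        """Ship queued writes to the replica."""
        while self.pending:
            self.replica.append(self.pending.popleft())

    def lost_on_primary_failure(self):
        """Number of records lost if the primary failed right now."""
        return len(self.primary) - len(self.replica)


sync_store = ReplicatedStore(synchronous=True)
async_store = ReplicatedStore(synchronous=False)
for i in range(3):
    sync_store.write(i)
    async_store.write(i)

print(sync_store.lost_on_primary_failure())   # 0
print(async_store.lost_on_primary_failure())  # 3
async_store.flush()
print(async_store.lost_on_primary_failure())  # 0
```

In this model, the synchronous store never has unreplicated data, while the asynchronous store can lose whatever is still queued when the primary fails. The model leaves out the other side of the trade-off: a real synchronous write must wait for the replica, which adds latency to every write.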

You learned about various disaster recovery planning methods, chosen according to the organization's needs and its RTO and RPO. The backup and restore method has a high RTO and RPO but is easy to implement. Pilot light improves your RTO/RPO by keeping critical resources, such as a database, active in the disaster recovery site. Warm standby and multi-site maintain an active copy of a workload on a disaster recovery site and help to achieve a better RTO/RPO. As you increase application reliability by lowering the system's RTO/RPO, the system's complexity and cost increase. You also learned about utilizing the cloud's built-in capabilities to ensure application reliability.
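The trade-off between RTO/RPO and cost can be sketched as a simple selection rule: pick the cheapest method that still meets the targets. The sketch below is illustrative only. The `Strategy` class, the `cheapest_strategy` function, and every numeric value are assumptions invented for this example, not figures from the chapter. Only the relative ordering (backup and restore is cheapest with the highest RTO/RPO, multi-site is costliest with the lowest) reflects the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_hours: float    # placeholder value, assumed for illustration
    rpo_hours: float    # placeholder value, assumed for illustration
    relative_cost: int  # 1 = cheapest and simplest, 4 = most expensive


# Hypothetical figures; only the ordering follows the chapter.
STRATEGIES = [
    Strategy("backup and restore", 24.0, 24.0, 1),
    Strategy("pilot light", 4.0, 1.0, 2),
    Strategy("warm standby", 1.0, 0.25, 3),
    Strategy("multi-site", 0.1, 0.0, 4),
]


def cheapest_strategy(target_rto_hours, target_rpo_hours):
    """Return the lowest-cost strategy meeting both targets, or None."""
    candidates = [
        s for s in STRATEGIES
        if s.rto_hours <= target_rto_hours
        and s.rpo_hours <= target_rpo_hours
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)


print(cheapest_strategy(48, 48).name)   # backup and restore
print(cheapest_strategy(2, 0.5).name)   # warm standby
```

Tightening either target pushes the choice toward the more expensive end of the list, which is the cost and complexity increase described above.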

Solution design and launch may not happen very often, but operational maintenance is an everyday task. In the next chapter, you will learn about the alerting and monitoring aspects of solution architecture. You will learn about various design principles and technology choices that make your application operationally efficient and help you achieve operational excellence.
