People are moving toward the cloud because it promises a more reliable system compared to an in-house setup, and cost is one of the major factors in achieving the right amount of reliability. The following principles help build reliable systems:
- Test recovery procedures
- Automatically recover from failure
- Scale horizontally to increase aggregate system availability
- Stop guessing capacity
- Manage change using automation
Each of these points is described in more detail below:
- Test recovery procedures: Practice recovery procedures for your services and data so that you can handle any incident and restore your system in a short period of time, making it more reliable.
- Automatically recover from failure: Monitor your key performance indicators (KPIs) so that the system can automatically take action during a failure and recover quickly.
- Scale horizontally to increase aggregate system availability: Horizontal scaling is faster than vertical scaling, and its cost grows linearly, whereas the cost of vertical scaling grows exponentially.
- Stop guessing capacity: Base scaling decisions on data and facts about your system rather than guessing your customer traffic. Guessing is a short-term fix, and it usually happens when teams don't collect metrics about their services. In the absence of data, guessed load estimates lead to failures and reduced reliability; for example, a system sized by guesswork fares far worse under a DDoS attack than one scaled from real metrics.
- Manage change using automation: Avoid human intervention when implementing changes, and automate the process as much as possible.
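The "stop guessing capacity" principle can be made concrete with a small sketch: derive the number of instances from observed traffic metrics instead of guessing. All names, thresholds, and numbers here are hypothetical illustrations, not values from the original text.

```python
# Minimal sketch of metric-driven capacity planning (all names and
# thresholds are hypothetical, not from the original text).

def desired_instances(request_rates, capacity_per_instance=500, headroom=0.3):
    """Derive an instance count from observed request rates instead of guessing.

    request_rates: recent requests-per-second samples from monitoring.
    capacity_per_instance: measured throughput one instance can sustain.
    headroom: extra buffer for traffic spikes.
    """
    peak = max(request_rates)           # size for the observed peak, not a guess
    target = peak * (1 + headroom)      # add headroom for bursts
    count = -(-target // capacity_per_instance)  # ceiling division
    return max(1, int(count))

# Example: observed peaks around 1800 req/s with 30% headroom
print(desired_instances([1200, 1500, 1800]))  # -> 5
```

The same idea applies to any resource (CPU, memory, queue depth): measure a sustainable per-unit capacity, then let the metrics, not intuition, drive the scaling decision.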
Some time back, while working on reliability for my company, I developed a project reliability maturity KPI matrix. It was very useful for communicating the maturity of reliability implementation in a project using the various KPIs categorized under it.
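The matrix itself is not shown in the text; as a rough illustration, such a matrix could be modeled as weighted KPI categories rolled up into a single maturity score. The categories, weights, and scores below are entirely hypothetical.

```python
# Hypothetical sketch of a reliability maturity KPI matrix; the KPI
# categories, weights, and scores are illustrative, not from the
# original project.
MATURITY_MATRIX = {
    "monitoring":        {"weight": 0.25, "score": 3},  # KPI/alert coverage
    "auto_recovery":     {"weight": 0.25, "score": 2},  # self-healing in place
    "recovery_testing":  {"weight": 0.20, "score": 1},  # drills practiced
    "capacity_planning": {"weight": 0.15, "score": 2},  # data-driven scaling
    "change_automation": {"weight": 0.15, "score": 3},  # automated deploys
}

def maturity_score(matrix, max_score=5):
    """Weighted average of per-category scores, normalized to 0-100."""
    total = sum(c["weight"] * c["score"] for c in matrix.values())
    return round(total / max_score * 100, 1)

print(maturity_score(MATURITY_MATRIX))  # -> 44.0
```

A roll-up like this makes it easy to compare projects and to show which category (here, recovery testing) is dragging the overall reliability maturity down.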