Operational excellence and maintainability

Operational excellence can be a great differentiator for your application: it lets you provide customers with a consistent, high-quality service with minimal outages. Applying operational excellence proactively also helps the support and engineering teams increase their productivity. Maintainability goes hand in hand with operational excellence. Easily maintainable applications help reduce costs, avoid errors, and let you gain a competitive edge.

A solution architect needs to design for operation, which means the design should cover how the workload will be deployed, updated, and operated in the long run. It is essential to plan for logging, monitoring, and alerting to capture all incidents and take quick action for the best user experience. Apply automation wherever possible, whether you are deploying infrastructure or changing application code, to avoid human error.
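As a rough illustration of how logging, monitoring, and alerting fit together, the following minimal Python sketch tracks recent request outcomes and signals when the error rate crosses a threshold. The class name `ErrorRateMonitor`, the window size, and the 5% threshold are assumptions made for this example only; a real workload would normally rely on a dedicated monitoring service rather than in-process code like this.

```python
import logging
from collections import deque

logger = logging.getLogger("ops_sketch")  # hypothetical logger name


class ErrorRateMonitor:
    """Tracks the most recent request outcomes and flags a high error rate."""

    def __init__(self, window_size=100, threshold=0.05):
        # Only the last `window_size` outcomes are kept; True means the request failed.
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, failed):
        """Store one request outcome and log failures for later analysis."""
        self.outcomes.append(bool(failed))
        if failed:
            logger.warning("request failed")

    def error_rate(self):
        """Return the fraction of failed requests in the current window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        """Return True when the error rate exceeds the configured threshold."""
        return self.error_rate() > self.threshold
```

In this sketch, the caller would invoke `record()` after each request and check `should_alert()` periodically to decide whether to notify the operations team.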

Including deployment methods and an automation strategy in your design is very important, as this can accelerate the time to market for new changes without impacting existing operations. Operational excellence planning should also consider security and compliance, as regulatory requirements may change over time and your application has to adhere to them to operate.

Maintenance can be proactive or reactive. For example, once a new version of an operating system becomes available, you can modernize your application to switch platforms immediately, or you can monitor system health and wait until the current version reaches its end of life before making any changes. In either case, changes should be made in small increments with a rollback strategy. To apply these changes, you can automate the entire process by setting up a continuous integration and continuous deployment (CI/CD) pipeline. For the launch, you can plan for A/B deployment or blue-green deployment.
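The idea of a blue-green deployment with a rollback strategy can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a production implementation: the `BlueGreenRouter` class, the environment names, and the `health_check` callback are all hypothetical, and in practice the traffic switch would be performed by a load balancer or DNS change.

```python
class BlueGreenRouter:
    """Toy model of two environments, where only one serves live traffic."""

    def __init__(self, live="blue", idle="green"):
        self.live = live
        self.idle = idle
        # Version currently installed in each environment (None = nothing deployed).
        self.versions = {live: None, idle: None}

    def deploy(self, version, health_check):
        """Install `version` on the idle environment and switch traffic
        only if the health check passes. Returns True on a successful switch."""
        self.versions[self.idle] = version
        if not health_check(self.idle, version):
            # Live traffic was never touched, so there is nothing to roll back.
            return False
        self.live, self.idle = self.idle, self.live
        return True

    def rollback(self):
        """Point traffic back at the previous environment, which still
        holds the earlier version."""
        self.live, self.idle = self.idle, self.live
```

Because the previous version stays installed on the idle environment, rolling back is only a traffic switch, which is what makes this approach attractive for small, incremental changes.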

For operational readiness, architecture design should include appropriate documentation and knowledge-sharing mechanisms, for example, creating and maintaining a runbook to document routine activities and a playbook that guides you through the process of handling issues. This allows you to act quickly in the event of an incident. After an incident, you should use root cause analysis to determine why the issue occurred and make sure it doesn't happen again.

Operational excellence and maintenance are an ongoing effort; every operational event and failure is an opportunity to learn and improve your operations. You must analyze operational activities and failures, experiment more, and make improvements. You will learn more about operational excellence considerations in solution design in Chapter 10, Operational Excellence Considerations.
