Applying best practices for disaster recovery

As you start thinking about disaster recovery, here are some important considerations:

  • Start small and build as needed: Streamline the first step, which is taking a backup. Organizations most often lose data because they lack an efficient backup strategy. Back up everything, whether it is a file server, a machine image, or a database.

Keeping many active backups can increase costs, so apply a lifecycle policy to archive and delete data according to business needs. For example, you can keep backups active for 90 days and then move them to low-cost archive storage such as tape or Amazon Glacier. After 1 or 2 years, you may want the lifecycle policy to delete the data. Compliance standards such as PCI-DSS may require you to retain data for 7 years, and in that case, archival storage is the way to keep costs down.
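The tiering rule described above can be sketched as a small decision function. This is a minimal illustration, not a real storage API: the function name `lifecycle_action`, the action labels, and the 730-day deletion threshold (the upper end of the "1 or 2 years" mentioned in the text) are assumptions chosen for the example. In practice, a managed storage service would usually apply such a policy for you.

```python
from datetime import date, timedelta

ACTIVE_DAYS = 90       # active window taken from the example in the text
RETENTION_DAYS = 730   # assumption: delete after 2 years; raise for compliance


def lifecycle_action(created: date, today: date,
                     active_days: int = ACTIVE_DAYS,
                     retention_days: int = RETENTION_DAYS) -> str:
    """Return the hypothetical action for a backup created on `created`."""
    age = (today - created).days
    if age < 0:
        raise ValueError("backup creation date is in the future")
    if age <= active_days:
        return "keep-active"   # still inside the active backup window
    if age <= retention_days:
        return "archive"       # move to low-cost archive storage
    return "delete"            # past the retention period


if __name__ == "__main__":
    created = date(2024, 1, 1)
    for offset in (30, 120, 800):
        today = created + timedelta(days=offset)
        print(offset, lifecycle_action(created, today))
```

For a 7-year compliance requirement, you could pass a larger `retention_days` (for example, `7 * 365`) so that backups stay in archive storage rather than being deleted.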

  • Check your software licenses: Managing software licenses can be a daunting task, especially in a microservice architecture, where several services run independently on their own virtual machine instances and databases. Software licenses can be tied to the number of installations, the number of CPUs, or the number of users. This becomes tricky when you scale, because you need enough licenses to support your scaling needs.

Horizontal scaling adds more instances with the software installed, while vertical scaling adds more CPU or memory to existing instances. Understand your software licensing agreement and make sure you hold the appropriate licenses to cover system scaling. At the same time, avoid buying excess licenses that you may never use and that cost more money. Overall, manage your license inventory as carefully as your infrastructure or software.
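To make the scaling arithmetic concrete, the following sketch compares the licenses you own with the licenses a planned fleet would need. The `Fleet` type, the two license metrics (`per_instance` and `per_cpu`), and the function names are hypothetical simplifications; real agreements often combine several metrics, so treat this as a starting point only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    """Hypothetical description of the servers running licensed software."""
    instances: int
    cpus_per_instance: int


def licenses_needed(fleet: Fleet, metric: str) -> int:
    """Count the licenses required under an assumed licensing metric."""
    if metric == "per_instance":
        return fleet.instances
    if metric == "per_cpu":
        return fleet.instances * fleet.cpus_per_instance
    raise ValueError(f"unknown license metric: {metric}")


def license_gap(owned: int, fleet: Fleet, metric: str) -> int:
    """Positive result: licenses still to buy. Negative: unused surplus."""
    return licenses_needed(fleet, metric) - owned


if __name__ == "__main__":
    current = Fleet(instances=4, cpus_per_instance=8)
    scaled_out = Fleet(instances=6, cpus_per_instance=8)   # horizontal scaling
    scaled_up = Fleet(instances=4, cpus_per_instance=16)   # vertical scaling
    owned = licenses_needed(current, "per_cpu")
    print("scale out gap:", license_gap(owned, scaled_out, "per_cpu"))
    print("scale up gap:", license_gap(owned, scaled_up, "per_cpu"))
```

Under a per-CPU metric, both scaling directions increase the license count, whereas under a per-instance metric only horizontal scaling does. This is why the licensing agreement needs to be read before you choose a scaling approach.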

  • Test your solutions often: Disaster recovery sites are built for rare events, so they are often overlooked. To achieve higher reliability, make sure your disaster recovery solution works as expected when an event occurs. Failing to meet a defined SLA can violate contractual obligations and result in lost money and customer trust.

One way to test your solution often is to run a gameday. Choose a day when the production workload is light and gather the whole team responsible for maintaining the production environment. Simulate a disaster event by bringing down a portion of the production environment, and let the team handle the situation and keep the environment up and running. These exercises confirm that you have working backups, snapshots, and machine images to handle disaster events.
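One check a team might run during a gameday is to restore a file from backup and confirm that it matches the original byte for byte. The sketch below does this by comparing SHA-256 digests using Python's standard `hashlib` module. The function names and the idea of comparing against a still-available original are assumptions for illustration; in a real exercise you would more likely compare against a checksum recorded when the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large backups do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """Return True when the restored file is identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

A matching digest shows only that the restored bytes are intact. It does not prove that an application can start from them, so a gameday should still exercise the full recovery path.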

Always put a monitoring system in place to make sure that automated failover to the disaster recovery site takes place when an event occurs. Monitoring lets you take a proactive approach and improves system reliability through automation. Monitoring capacity helps you avoid resource saturation issues, which can impact your application's reliability. Creating a disaster recovery plan and validating recovery regularly helps you achieve the desired application reliability.
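The two monitoring decisions mentioned above, when to fail over and when capacity is close to saturation, can be sketched as simple rules. The function names, the threshold of three consecutive failed health checks, and the 80% utilization limit are all assumptions for illustration. A production monitoring system would supply its own health-check data and tuned thresholds.

```python
from typing import Sequence


def should_fail_over(results: Sequence[bool], failure_threshold: int = 3) -> bool:
    """Trigger failover only after N consecutive failed health checks.

    `results` is ordered oldest to newest; True means the check passed.
    Requiring consecutive failures avoids failing over on a single blip.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    if len(results) < failure_threshold:
        return False
    return not any(results[-failure_threshold:])


def is_saturating(utilization: float, limit: float = 0.8) -> bool:
    """Flag a resource whose utilization (0.0 to 1.0) reaches the limit."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return utilization >= limit


if __name__ == "__main__":
    print(should_fail_over([True, False, False, False]))
    print(is_saturating(0.85))
```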
