Disaster recovery and disaster avoidance

A disaster is any event that halts business activity on a large scale. In most cases, we are talking about natural disasters, but there are also human-made disasters, and all of them can happen at any time without warning.

These disasters could impact technologies and IT services. Of course, there are other and more critical aspects, such as risk to human life, but for Business Continuity (BC), we put the main focus on business-critical services and applications.

Disaster recovery (DR) provides BC in the event of a disaster, and its scope may be local, limited to a piece of equipment (such as a single server), or global, covering an entire site. Business is recovered by following a specific DR plan, which is a subset of the Business Continuity Plan (BCP). When a disaster impacts an entire site or region, the recovery process usually relies on a remote location called a disaster recovery site.

DR is essential to ensure the continuation of business after a disaster. DR can also be required for regulatory compliance. Effective DR is a critical part of a BCP and must address the following three organizational requirements:

  • Minimize risks: Having a BC plan does not eliminate all risks if you cannot be sure that the plan is reliable or practicable. The DR plan could be difficult to implement, and for several organizations, it may have some business impacts and possible risks.
  • Minimize downtime: The consequences of extended downtime can be critical for business, recognition, and productivity. For most companies, a service disruption of ten or more days could be a total disaster and lead to the company closing.
  • Control costs: Traditional disaster recovery plans are often limited in scope because of the cost, but you must find a trade-off between costs and risk mitigation.

Although everyone realizes the importance of a DR plan, some organizations do not have the proper level of DR protection that they need. Only after a real disaster do they fully understand the importance of DR and the real impact that a disaster can have on their business.

Legacy solutions and processes to activate applications in the DR site usually require complex runbooks and manual procedures to execute the failover process. They may require highly specialized staff with vertical skills, large time investments, and high levels of coordination from several teams that are responsible for different layers of the infrastructure.

The main challenges that must be handled by DR in order to have a successful and effective plan are as follows:

  • Complexity: Data center recovery plans are usually complex processes because, to guarantee the correct recovery of entire business services, they must deal with all the interdependencies between applications, hosts, networks, storage, and other infrastructural and organizational aspects.
  • Lack of predictable and reliable recovery: Recovery plans documented in runbooks can be incomplete and may quickly fall out of sync with rapidly evolving deployments. Most enterprises test their recovery plan only twice a year or less.
  • High cost: Legacy DR solutions require significant capital and operating expenditures. The DR site typically requires a dedicated duplicate server infrastructure. As stated by Gartner in "Survey Analysis: IT Disaster Recovery Management Spending and Testing Activities Expand in 2012" (July 2012):
"The net result is that legacy disaster recovery solutions are regarded as non-strategic and costly insurance policies with very questionable returns. At best, only a few mission-critical applications get the privilege of site-level protection."

Two of the fundamental parameters that characterize a BC/DR plan are the Recovery Point Objective (RPO) and Recovery Time Objective (RTO):

  • RPO refers to how much data you can afford to lose. It is usually expressed as the maximum acceptable time between the last backup and the failure.
  • RTO refers to how long it can take to recover from the failure. It is usually expressed as the maximum acceptable duration of the restore procedure.
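
As a minimal sketch of how these two parameters can be checked against a single incident, the following Python snippet computes the data-loss window (the time between the last backup and the failure) and the recovery duration, and compares them with the objectives. The function name, the timestamps, and the objective values are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def evaluate_recovery(last_backup, failure, service_restored, rpo, rto):
    """Compare one incident against hypothetical RPO/RTO targets."""
    # Data-loss window: everything written after the last backup is lost.
    data_loss_window = failure - last_backup
    # Recovery duration: time from the failure until the service is back.
    recovery_duration = service_restored - failure
    return {
        "data_loss_window": data_loss_window,
        "recovery_duration": recovery_duration,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": recovery_duration <= rto,
    }


# Illustrative values only (assumptions, not taken from a real incident).
result = evaluate_recovery(
    last_backup=datetime(2024, 1, 10, 2, 0),
    failure=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 12, 30),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=6),
)
print(result["rpo_met"], result["rto_met"])
```

With these sample values, the last backup is 7.5 hours old at the time of the failure, so a 4-hour RPO is missed, while the 3-hour restore stays within a 6-hour RTO.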

In a perfect world, you would want to have those numbers near zero, but the cost of such a solution would be extremely high. In the real world, you need to find a balance between the cost of a BC/DR solution and the potential risks of failure.
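
One rough way to reason about this balance is to add the yearly cost of a solution to the expected yearly loss from downtime and compare the totals. The sketch below does this under loud assumptions: the three options, their costs, the outage frequency, and the hourly cost of downtime are invented for illustration, and the downtime per incident is simply assumed to equal the RTO that each option achieves. It does not model data loss (RPO).

```python
def expected_annual_loss(incidents_per_year, downtime_hours, cost_per_hour):
    """Expected yearly cost of outages: frequency x duration x hourly cost."""
    return incidents_per_year * downtime_hours * cost_per_hour


# Hypothetical options: yearly solution cost and the RTO (in hours) it achieves.
options = {
    "low-cost": {"annual_cost": 20_000, "rto_hours": 72},
    "mid-range": {"annual_cost": 120_000, "rto_hours": 8},
    "near-zero RTO": {"annual_cost": 600_000, "rto_hours": 0.1},
}

INCIDENTS_PER_YEAR = 0.2  # assumption: one major outage every five years
COST_PER_HOUR = 25_000    # assumption: business loss per hour of downtime


def total_cost(option):
    """Yearly solution cost plus expected yearly downtime loss."""
    loss = expected_annual_loss(
        INCIDENTS_PER_YEAR, option["rto_hours"], COST_PER_HOUR
    )
    return option["annual_cost"] + loss


# Pick the option with the lowest combined yearly cost.
best = min(options, key=lambda name: total_cost(options[name]))
print(best)
```

With these invented figures, the cheapest option carries the largest expected loss and the near-zero RTO option costs more than the downtime it avoids, so the middle option has the lowest total. Different assumptions can change the outcome.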
