Disaster Recovery Planning (DRP)

Disaster recovery is a process that enables the business to recover from an event that affects the normal business operations for a prolonged period of time.

At this point, I would like to highlight the similarities and differences between BCP and DRP processes:

  • Both BCP and DRP are targeted at continuity or the resumption of business processes, as the case may be.
  • Both the processes address the actions to be taken when an incident happens or a disruptive event strikes.
  • BCP focuses on the continuity of business processes. For example, a power failure is an incident, not a disastrous event. BCP addresses it with continuity measures such as an Uninterruptible Power Supply (UPS) system or a power generator. More broadly, BCP looks at the continuity of business processes from the holistic perspective of the business itself.
  • DRP focuses on recovery procedures following disastrous events. For example, an earthquake strikes the location. This is not the same situation as a power failure: a UPS or a generator is not going to help. DRP addresses this by resuming the critical business processes from an alternative site.

Disaster Recovery Planning (DRP) is a process for the following:

  1. Developing procedures that define the actions to be taken during and after a disastrous event.
  2. Testing the procedures for effectiveness.
  3. Updating the procedures to reflect the lessons learned from the testing process.

Goals and objectives

The goal of disaster recovery planning is to effectively manage operations during a disaster and ensure proper coordination among the different teams.

The objective of disaster recovery planning is to continue the business/IT operations from a secondary site during a disaster and restore them to the primary site in a timely manner.

Components of disaster recovery planning

Some of the components of disaster recovery include these:

  • The identification of suitable teams to coordinate the recovery process
  • The resumption of business from alternate sites or recovering data from a backup
  • Communications with employees, external groups, and media
  • Financial management including insurance

Recovery teams

In disaster recovery, various teams play important roles. The most important teams are as follows:

  • The recovery team: On the declaration of a disaster, this team is entrusted with implementing the recovery procedures
  • The salvage team: This team will be responsible for returning business operations to the primary site

Recovery sites

A primary site is the one where normal business operations, including IT operations, take place.

A secondary site serves as a backup to the primary site. Generally, secondary sites are located in a different geographic region.

Business resumption from alternative sites

The following are some of the disaster-recovery activities that are related to continuing business from an alternative site.

A reciprocal agreement

A reciprocal agreement is an arrangement with another company that has additional computing facilities that can be utilized during a contingency. The term reciprocal implies that the agreement is mutual: either company may utilize the computing facilities of the other in the event of a disaster. However, such agreements are simple arrangements and are not legally binding.

Subscription services

This means paying or subscribing to facility management services that use third-party backups and processing facilities. This type of arrangement is called a subscription service:

Note

The type of subscription service a company contracts may be based on the Business Impact Analysis (BIA), Recovery Time Objectives (RTO), and Recovery Point Objectives (RPO).

BIA, RTO and RPO are explained in the previous chapter.

  • A hot site is an alternate backup site that is fully configured with computer systems, Heating, Ventilation, and Air Conditioning (HVAC), and power supply. This site also contains all the applications as well as the data to commence the operations immediately. Hot sites are highly expensive. Typically, a business operation that needs to be resumed within 24 hours would consider a hot site.
  • A cold site contains no computers or other computing equipment. Only HVAC and power are available here. The computers, the computing equipment, applications, and data need to be installed before commencing the operations. Cold sites are the least expensive. Typically, a business operation that can be resumed in a span of a week to 10 days would consider this option.
  • A warm site is in between hot and cold sites. In this type of arrangement, the computing facilities such as computers, other communication elements, HVAC, and power are available. However, applications and data need to be installed before commencing the operations. This type of site is less expensive than a hot site. Typically, a business operation that needs to be restored within a span of 24 hours to 96 hours would consider this option.
  • Dual sites refer to mirroring the exact operations and data in alternative sites. From the recovery perspective, this type of site offers instantaneous business resumption. However, dual sites are very expensive to maintain. Typically, business operations that cannot afford any downtime at all would consider this option.
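
The recovery-time ranges quoted for each site type can be turned into a rough selection rule. The following Python sketch is illustrative only. The names SiteType and suggest_site are hypothetical, treating exactly 24 hours as a hot-site case is an assumption, and so is mapping everything beyond 96 hours to a cold site, since the text gives no figure between 96 hours and a week. A real decision would also weigh cost and the results of the BIA.

```python
from enum import Enum


class SiteType(Enum):
    """Alternate-site options described in the text."""
    DUAL = "dual site"
    HOT = "hot site"
    WARM = "warm site"
    COLD = "cold site"


def suggest_site(rto_hours: float) -> SiteType:
    """Map a Recovery Time Objective (in hours) to a site type.

    Thresholds follow the typical figures in the text; the exact
    boundaries are assumptions made for this sketch.
    """
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours == 0:
        return SiteType.DUAL   # no downtime can be afforded
    if rto_hours <= 24:
        return SiteType.HOT    # resume within 24 hours
    if rto_hours <= 96:
        return SiteType.WARM   # resume within 24 to 96 hours
    return SiteType.COLD       # assumption: anything longer than 96 hours


# Example usage with a few hypothetical RTO values (in hours).
for rto in (0, 8, 72, 240):
    print(f"RTO {rto} h -> {suggest_site(rto).value}")
```

The function returns only the site type the text associates with a given RTO; it does not model cost, which is the other half of the trade-off described above.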

Backup terminologies

The following concepts are applicable to hot sites and dual sites in terms of backup and restorations:

  • Electronic vaulting is a batch process used to dump the data at periodic intervals to a remote backup system.
  • Remote journaling is a parallel processing system that writes the data to a remote system at the alternate site. This type of backup is used where the RTO is short and a high degree of fault tolerance is required.
  • Database shadowing is used to duplicate the data into multiple sites from the remote journaling process. This type of system is used where the fault-tolerance requirement is of the highest degree.
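
The practical difference between these three approaches is how much data can be missing at the remote end when a disaster strikes. The following Python sketch is a toy, in-memory model and not a real backup implementation. The class names ElectronicVault, RemoteJournal, and DatabaseShadow, the record names, and the choice to run the batch after the third transaction are all assumptions made for illustration.

```python
class ElectronicVault:
    """Batch copy: records reach the remote only when a batch runs."""

    def __init__(self):
        self.local = []
        self.remote = []
        self._shipped = 0  # count of local records already copied

    def write(self, record):
        self.local.append(record)

    def run_batch(self):
        # Copy everything written since the previous batch.
        self.remote.extend(self.local[self._shipped:])
        self._shipped = len(self.local)


class RemoteJournal:
    """Parallel write: every record goes to the remote as it is written."""

    def __init__(self):
        self.local = []
        self.remote = []

    def write(self, record):
        self.local.append(record)
        self.remote.append(record)


class DatabaseShadow:
    """Journaling duplicated to several remote sites."""

    def __init__(self, site_count):
        self.local = []
        self.remotes = [[] for _ in range(site_count)]

    def write(self, record):
        self.local.append(record)
        for remote in self.remotes:
            remote.append(record)


vault = ElectronicVault()
journal = RemoteJournal()
shadow = DatabaseShadow(site_count=2)

for n in range(1, 6):
    record = f"txn-{n}"
    vault.write(record)
    journal.write(record)
    shadow.write(record)
    if n == 3:
        vault.run_batch()  # assumed: the periodic batch runs after txn-3

# Suppose the primary site is lost at this point.
vault_lost = len(vault.local) - len(vault.remote)
journal_lost = len(journal.local) - len(journal.remote)
print("records missing with electronic vaulting:", vault_lost)
print("records missing with remote journaling:", journal_lost)
print("remote copies with database shadowing:", len(shadow.remotes))
```

In this model, vaulting loses whatever was written after the last batch, journaling loses nothing, and shadowing additionally keeps more than one remote copy.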

Testing procedures

Disaster recovery plans should include various testing procedures so that the plans can be tested for adequacy and correctness. The lessons learned from such tests can be incorporated into the plans for better preparedness during a disaster.

The following are some of the industry standard tests pertaining to disaster recovery planning processes:

  • A checklist review is a process in which the managers of various business units check the disaster recovery plan. The following table shows a general checklist. This list is at the macro level. Further lists should be generated at the micro level to drill down to finer details:
    [Table: Testing procedures]
  • A structured walk-through is a tabletop exercise in which the managers of various business units meet and review each step of the plan in sequence. Any deficiencies or missing steps are discussed and updated in the plan. While a checklist review checks the availability of resources such as documents, systems, people, communication facilities, backups, and more, the structured walk-through checks the recovery processes step by step in a tabletop review.
  • A simulation test is a testing process used to simulate the event in a testing environment. This test is expected to provide vital inputs from the actions of various response teams, and any deficiency can be corrected, including the training requirements. This type of test is also called a walk-through test or drill. A simulation test is more comprehensive than a tabletop exercise.
  • A parallel test is a testing process used to test the coordination of essential groups, such as medical and fire services as well as internal teams, and their adherence to communication procedures. This type of test is used for testing the functionality of the plans. Hence, it is referred to as a functional drill.
  • A full test is a type of test that tries to simulate a real emergency or a disaster event. This test involves the participation of all the associated teams and groups as well as a real shutdown of the primary site, and the commencement of operations from the remote site.