Elements of a BCP

BCPs are large, comprehensive documents. They include many elements and often cover many contingencies. A single format will not cover all requirements for all organizations. However, some guides suggest the inclusion of certain elements.

Business Continuity Versus Disaster Recovery

The terms business continuity (BC) and disaster recovery (DR) are not synonymous, even though some people confuse them and use the terms interchangeably. They are two separate but related processes. BC covers all functions of a business to ensure that the entire business can continue to operate in the event of a disruption, and the BCP includes a BIA and a DRP as attachments. DR is largely a function of IT and includes the elements necessary to recover from a disaster, such as backups, recoveries, and restores. DR can also be broader and include elements such as alternate sites. However, the DRP is a part of the larger BCP.

For example, a bank operates on a coast threatened by hurricanes. Customers use the bank’s website to do online banking. If a hurricane hits, the bank wants to ensure that its website will continue to be available. Therefore, the BCP could include a DRP that includes the steps needed to recover the website at an alternate location.

Weather services issue warnings at set intervals before landfall, such as a warning when a hurricane is expected to hit within 72 hours. The BCP could designate specific steps at 72 hours, 48 hours, and 24 hours, and the DRP would provide details on how to move the website to an alternate location.

The BCP may address other nontechnical elements of the hurricane. For example, when should the bank close? When should the vault be locked? Should guards remain inside? Are any other security precautions necessary?

After the disaster has passed, the BCP and the DRP would have different goals, but they would work together. Again, the BCP is focused on getting the overall business functions back to normal, and the DRP is focused on restoring and recovering IT functions.

For example, the following sections are often included in a BCP:

  • Purpose
  • Scope
  • Assumptions and planning principles
  • System description and architecture
  • Responsibilities
  • Notification and activation phase
  • Recovery phase
  • Reconstitution phase
  • Plan training, testing, and exercises
  • Plan maintenance

The following sections describe the contents of these sections.

Purpose

The purpose of the BCP is to ensure that mission-critical elements of an organization continue to operate after a disruption, which can be any event that has the potential to stop operations. The BCP is implemented when a disruption occurs or is imminent. The BCP then stays in place until the restoration of normal operations.

Only CBFs are maintained during the disruption, which means doing business as usual does not continue during a disaster. The BIA identifies the CBFs and their priorities, and the BCP ensures that all the elements are in place to maintain those CBFs.

The BIA also includes acceptable outage times. Some CBFs may need to be kept operational with minimal outage, whereas other CBFs may have lower priorities. Depending on the recovery time objectives identified in the BIA, these lower-priority CBFs may be down for hours or even days.

Scope

Just as with any project, the scope of the BCP needs to be defined because the success of the project depends on personnel understanding the tasks. If there is no scope statement, two problems can occur. First, the desired tasks won’t be finished, which means the BCP will be incomplete. Second, scope creep can occur. Scope creep happens when the project keeps taking on additional tasks. For example, the intended scope may cover a single location, but instead the BCP includes research and recommendations for five locations.

The scope statement can include several key items, such as the location, the systems, the employees, and the vendors. Only the critical systems identified in the BIA should be included. Employees who are necessary to support the critical systems should be included and identified by title or position, rather than name. If parts or supplies can be obtained from a specific source only, then the vendor should be included in the BCP.

Although a BCP will take a global view of the organization, it doesn’t have to cover the entire organization. The BCP could cover only specific locations. For example, a company has a main office in Atlanta and regional offices in Chicago, Los Angeles, and Miami. It could create four separate BCPs, one for each location.

Individual departments or divisions may have smaller threats that need to be addressed from a business continuity perspective, but, generally, these departments use contingency planning or redundancy measures. The smaller threats wouldn’t be addressed with separate BCPs.

Assumptions and Planning Principles

Every BCP needs to include some basic assumptions and planning principles that are very helpful in the initial development of the BCP and also useful in the implementation phases.

A key planning principle is the length of time that the company is expected to continue operations under the BCP before returning to normal operations. For example, a company could plan on continuing operations under the BCP for seven days following a hurricane. Seven days then becomes a guiding principle for many other elements, such as seven days of supplies, seven days of fuel for generators, or seven days of food and water for personnel.
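The arithmetic behind this principle can be sketched in a few lines. The following is a minimal illustration, not a prescribed method: the `Supply` type, the `required_stock` function, and all daily consumption figures are hypothetical and would come from the organization's own estimates.

```python
from dataclasses import dataclass


@dataclass
class Supply:
    name: str
    daily_use: float  # units consumed per day while operating under the BCP
    unit: str


def required_stock(supplies, planning_days):
    """Return the quantity of each supply needed for the planning period."""
    return {s.name: s.daily_use * planning_days for s in supplies}


# Hypothetical daily figures, for illustration only.
supplies = [
    Supply("generator fuel", 40.0, "gallons"),
    Supply("drinking water", 15.0, "gallons"),
    Supply("meals", 30.0, "meals"),
]

# Seven days is the planning principle used in the example above.
stock = required_stock(supplies, planning_days=7)
for s in supplies:
    print(f"{s.name}: {stock[s.name]:g} {s.unit}")
```

Changing the single `planning_days` value updates every quantity at once, which mirrors how one planning principle drives many other elements of the plan.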

TIP

These assumptions and principles will drive much of the decision-making process, so they need to be accurate. For example, if the company assumes that it will never lose power, it won’t plan for alternate power sources. However, if a disruption does result in a power loss, the rest of the plan would be useless.

Assumptions and principles can be reviewed and assessed in several different categories, which include the incidents to be addressed in the plan and elements such as strategy, priorities, and required support. The following sections provide guidance for these areas.

Incidents to Be Included and Excluded

Many BCPs identify specific incidents that are included and excluded. For example, the BCP may be designed to address specific disruptions due to hurricanes or earthquakes or to address generic incidents, such as power loss from any cause.

As an example, an organization is in a hurricane-prone area. It will have many hours if not days of advance notice before the hurricane hits. Safety precautions and preparedness steps are well known for hurricanes, such as securing anything that can blow away and preparing for wind damage and, for some areas, flooding.

Now, compare this organization to one in an area prone to earthquakes. The organization doesn’t receive any notice that an earthquake is about to hit. One moment everything is calm, and the next, buildings are collapsing. Depending on the severity of the quake, damage can range from very little to mass destruction.

The responses to each of these incidents are very different. Therefore, knowing what incidents the BCP will be used to respond to will provide a better idea of what steps to include.

On the other hand, some incidents are generic. For example, the BCP could include plans to provide backup generator power to the organization if power is lost, which could be used no matter how the power is lost. Plans could also be included to relocate to another location if one location is not usable. Again, why the location is unusable doesn’t matter. The cause could be a fire, a flood, a tornado, an earthquake, or something else.

Strategy

The strategy of the BCP identifies some of the key elements of the plan, which could include location, notification, and transportation.

If an organization is in a single location, the strategy would be to address this single location, whereas, if the organization is in several locations, a strategy would need to be identified for each one. For example, an organization may decide that key IT resources will be centrally located and maintained, such as in a large university that has many buildings spread across an expansive campus. Instead of maintaining operations at each building, the BCP could identify specific resources to be maintained at one or more central buildings.

If an incident has occurred or is imminent, BCP team members will need to be notified. Therefore, identifying how to notify all key players is important. One such way is using a phone tree. With a phone tree, one person starts the process by calling several key people, such as team leads. Then, the team leads call people on their lists and so on until, eventually, everyone has been notified.

Transportation may be a concern. If members will need transportation from one area to another, the BCP should address it. For example, company vehicles could be designated for shuttling personnel as needed. Or, if equipment needs to be moved, how that will happen should be identified in the BCP.

Supplies are an important concern if they are needed for continued operation. For example, if the company will need supplies to continue producing a product, they must be available. Utilities, such as water, power, and gas, are needed. If the strategy is to continue to operate under the BCP for seven days, the company will need to provide its own utilities for seven days.

Communications is another concern because typical communications are often interrupted during a disruption. Land-based phone lines may not function and, in some situations, neither will cell-based phones. Many organizations use push-to-talk cell phones, which work as cell phones and walkie-talkies. Even if the cell phone functions stop working during an emergency, the walkie-talkie function still works.

The benefit of the push-to-talk phones is that they don’t require external resources for the walkie-talkie functions because they simply broadcast on a frequency. As long as the other cell phone is in reach, it receives the message. One approach is to purchase many of these push-to-talk phones and issue them to key players during the first phase of the plan.

Priorities

The BIA identifies CBFs and critical resources and their priorities. Generally, the BCP reaffirms these priorities and will ensure that efforts focus on returning the top-priority systems first. These top-priority systems will have the most resources dedicated to restoring them.
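One way to picture this ordering is as a simple sort. The sketch below assumes each CBF carries a priority rank and a recovery time objective (RTO) taken from the BIA. The `CriticalFunction` type, the tie-breaking rule, and the sample values are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class CriticalFunction:
    name: str
    priority: int      # 1 = highest priority, as ranked in the BIA (assumed scale)
    rto_hours: float   # recovery time objective from the BIA


def recovery_order(functions):
    """Sort by BIA priority, breaking ties with the shorter RTO."""
    return sorted(functions, key=lambda f: (f.priority, f.rto_hours))


# Hypothetical CBFs and values, for illustration only.
cbfs = [
    CriticalFunction("payroll", 3, 72),
    CriticalFunction("online banking website", 1, 4),
    CriticalFunction("internal email", 2, 24),
]

order = [f.name for f in recovery_order(cbfs)]
print(order)
```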

Required Support

The BCP requires support during every phase, most importantly, management’s support. Without this support, the required input and support from personnel and the required funding will not be available. Therefore, the BCP is doomed to fail without the support from top-level management.

Later in this chapter, responsibilities are listed for individuals and teams. Clearly, these teams must provide support to the BCP and be supported in their endeavors.

During the notification and activation phase, all personnel need to respond as quickly as possible. Some personnel may be identified as mission critical, and they will need to remain at the site during the emergency.

TIP

Having supplies on hand for continued production may conflict with other organizational principles. For example, many organizations use a just-in-time philosophy, in which parts and supplies arrive when needed. Stocking seven days of supplies at all times for the BCP ties up funds in inventory. In addition, some supplies need to be rotated to ensure they don’t expire or become outdated.

System Description and Architecture

The BCP identifies CBFs that need to remain operational during the disruption. Each of these CBFs has individual systems that support it. Therefore, having current descriptions and documentation on these systems is important. This documentation needs to be detailed enough to identify the critical system and the supporting architecture. If the documentation isn’t available or is out of date, maintaining and recovering the CBFs becomes much more difficult.

While the CBF systems are being documented for the BCP, elements that need to be addressed in the recovery plan can be identified. For example, documentation may show that a system must maintain connectivity via a wide area network (WAN) link to stay operational. If the plan doesn’t include an alternative, this WAN link becomes a single point of failure. FIGURE 13-1 shows a WAN link connecting a database server at the headquarters office to a remote location. If the WAN link fails, the CBF fails. Therefore, the WAN link must stay operational to support the CBF.

A network diagram of two database servers connected through a WAN link.

FIGURE 13-1 Database servers connected via a WAN link.

FIGURE 13-2 shows the same two servers with a backup method of communication. The servers communicate via the WAN link the majority of the time. If it fails, the servers communicate via the modems. Although the modem link is substantially slower than the WAN link, it can still meet the organization’s needs during a disruption.

A network diagram of two database servers with primary and alternate connection methods.

FIGURE 13-2 Database servers with primary and alternate connection methods.
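The fallback behavior shown in the figure can be sketched as a small decision function. This is only an illustration of the logic. The function name and the string labels are hypothetical, and a real system would determine link health through its own monitoring.

```python
def select_link(wan_up: bool, modem_up: bool) -> str:
    """Prefer the WAN link; fall back to the modem link; otherwise report an outage."""
    if wan_up:
        return "wan"
    if modem_up:
        return "modem"
    return "none"


# Normal operation uses the WAN link even when the modem is also available.
print(select_link(wan_up=True, modem_up=True))
# During a WAN failure, the slower modem link keeps the CBF running.
print(select_link(wan_up=False, modem_up=True))
```

With only the WAN link, a single failure returns "none". Adding the modem removes that single point of failure.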

The following sections identify some common documentation that needs to be included with the BCP.

TIP

Ensure steps are taken to provide for the families of employees, especially if employees need to stay on-site during the disruption. Employees should never have to choose between taking care of their families and taking care of the organization in an emergency. Much of the world is currently going through a pandemic (COVID-19) as of the writing of this book, and most services have been shut down, including schools and offices. Individuals and families are homebound, and many people are working remotely while caring for loved ones.

Overview

The overview section provides a description of a CBF in big-picture terms. For example, following is a description of a critical database hosted at the headquarters location of an organization. Headquarters hosts the sales database on a database server, and this database is critical to several business functions:

  • Management personnel at headquarters use this database to identify and track sales throughout the company.
  • Ordering and production personnel use this database to order and track products shipped to stores.
  • The database tracks inventories within each store. Employees at any store query the database to determine whether an item is in stock locally or at another store.

Each store has a local database server that hosts a database for the store and records store sales, and the store databases synchronize with the headquarters database server once an hour. This information doesn’t provide details but does provide enough information to understand the big picture.

Functional Description

The functional description builds on the overview by providing more details of the systems.

NOTE

In this example, the database server hosted at headquarters is critical. As described, no indication is given that the store database servers are critical, which means that a store database server could fail without affecting the headquarters database. However, in another company, the database server at each store location may be critical. In that case, all of them could be included in the same BCP, or a BCP for each store could be created.

Many systems interact with other critical systems, so including figures is valuable whenever possible. As an example, FIGURE 13-3 shows a diagram for the database described in the overview section that displays how each of the outlying stores connects to headquarters over the WAN links.

A diagram showing a database server at the headquarters of an organization connected to database servers at individual stores through WAN links.

FIGURE 13-3 Database servers connected between stores and headquarters.

The description would provide more details. For example, it would include the store names and the store locations and details on the WAN links, and, if there were redundant WAN links, it would describe them.

Details on the headquarters server are also important. This description should include the server name, the operating system, and the database application used. For example, the server could be running Windows Server 2019 with SQL Server 2019.

If the server includes any fault-tolerance capabilities, they would be mentioned here. For example, the database server might be in a failover cluster and include a redundant array of inexpensive disks (RAID) subsystem. A two-node failover cluster allows one server to fail without affecting the services provided by the database. With RAID, a system will continue to operate even if a drive fails.

Sensitivity of Data and Criticality of Operations

The BCP includes information on the sensitivity of the system’s data and details on the criticality of the system operations.

Any organization will have some secret or proprietary data. The organization must define classifications for this data because the classification determines the level of protection required for the data. Some data may be classified as private and used only within the organization, whereas other data may be public and freely available.

If the system houses data, the data must be protected according to its level of classification. With this in mind, the BCP needs to document the sensitivity of the data. In the midst of an emergency, security precautions aren’t at the forefront of everyone’s mind. However, if the sensitivity is documented in the BCP, people will know what precautions to take. For example, a database may collect customer information, including credit card data and sales data for all its stores. The organization could classify this data as private or proprietary. If the database server is moved, steps will need to be taken to protect the data during transit and at both the original and alternate locations. In the previous example of the data hosted on the headquarters database server, because this data includes sales data, the server likely holds customer data, including credit card information and actual sale amounts, which most organizations try to keep private whenever possible.

Criticality of operations identifies the impact if the IT service fails. Criticality is usually documented in the BIA but is repeated in the BCP so that it’s clear. The criticality can be defined in a simple statement. For example, the following statement could be used for the headquarters database server mentioned in the previous example:

If the database server fails, outlying stores will not be able to query the database for products. They won’t be able to verify the product is in their store or another store. If store servers are unable to synchronize with the headquarters server, they will queue the sales data on their systems until it can be sent.

Ordering of new products will be delayed because the sales of existing products will be unknown, and management will not have current data available, which can affect decisions on many levels.
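The queuing behavior described in this statement can be sketched as follows. This is a minimal illustration under stated assumptions: the `StoreSync` class, the `send` callback, and the `hq_reachable` flag are hypothetical names, and a real store server would persist the queue rather than hold it in memory.

```python
from collections import deque


class StoreSync:
    """Hold store sales locally until they can be sent to headquarters."""

    def __init__(self):
        self.pending = deque()

    def record_sale(self, sale):
        self.pending.append(sale)

    def synchronize(self, send, hq_reachable):
        """Send queued sales oldest first; leave them queued if headquarters is unreachable."""
        sent = 0
        while hq_reachable and self.pending:
            send(self.pending[0])   # send first, so a failed send leaves the sale queued
            self.pending.popleft()
            sent += 1
        return sent


store = StoreSync()
store.record_sale({"item": "A100", "amount": 19.99})
store.record_sale({"item": "B200", "amount": 5.00})

received = []  # stands in for the headquarters database
store.synchronize(received.append, hq_reachable=False)  # outage: sales stay queued
store.synchronize(received.append, hq_reachable=True)   # link restored: queue drains
```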

Identifying Critical Equipment, Software, Data, Documents, and Supplies

The BCP should list all the critical components for the system for the following reasons: First, the BCP states clearly which components are needed for the CBF, and second, it provides a list that can be used to restore the system from scratch.

This list includes any equipment, such as servers, switches, and routers. Because the servers may need to be rebuilt from scratch, the BCP should list the operating system and any applications needed to support the system. If an image is used to rebuild servers, the BCP should list the version number of the image.

TIP

The primary objective of a security system is to protect confidentiality, integrity, and availability. The loss of any of these should be considered when documenting this section in the BCP. For example, what will happen if the organization loses the availability of data or a system?

Items on the list can include a database hosted on the system; any type of files, such as documents or spreadsheets; and any needed supplies. These supplies can be as simple as office supplies, such as printer paper and toner. For some systems, the list can include technical supplies, such as special oils for machinery or tools needed for maintenance.

Whenever possible, the location of these items should be included. Some organizations create “crash carts” that include all the components needed to rebuild a system. They include CDs or DVDs for operating systems, applications, or images and basic instructions for building or rebuilding systems.

Telecommunications

Required connectivity with other systems is an important element to document in the BCP. Connectivity can be from the internal network, the Internet, dedicated WAN lines, or phone lines.

External connections often use lines from telecommunications companies. Internet service providers (ISPs) typically provide more than just access to the Internet. They can also lease lines used for WANs and virtual private networks (VPNs).

Any required communication links should be documented. For example, if a database receives updates from other databases using VPN lines, that information should be included. Some systems have multiple communication lines for redundancy. If the system can operate without specific telecommunication lines, they should be identified along with the redundant connection.

Responsibilities

Responsibilities within a BCP should be assigned. Assigning responsibilities makes things clear to everyone concerned. When tasking has not been completed or is behind schedule, getting the project back on track is easier when those responsible are known.

Employees in the organization will fill specific roles in a BCP. These roles include the BCP program manager, the BCP coordinator, BCP team leads, and BCP team members.

The next section covers these roles and responsibilities and some of the other key personnel who may be included in the BCP.

BCP Program Manager

A BCP program manager (PM) usually manages multiple BCP projects within a large organization. For example, a large organization could have several locations and BCPs for each one. A BCP coordinator manages a single BCP and reports to the BCP PM. The BCP PM ensures that each BCP is progressing as expected. The PM can use traditional project management skills to manage these BCPs. For example, every BCP has a start date, milestones, and an end date for the development stage and will include dates for the reviews to start. These reviews will also have milestones and end dates.

The BCP PM is responsible for ensuring that each of the BCPs is on track. Depending on the hierarchy of the organization, the BCP PM may not have any authority over the individual BCP coordinators, which makes it necessary for the PM to have exceptional communication skills.

Other organizations may have a specific department of PMs who have specialized project management skills. Lead PMs oversee several other PMs. In this situation, the BCP PMs have direct authority over the BCP coordinators.

BCP Coordinator

The BCP coordinator is in charge of a specific BCP. This individual can have two roles depending on the stage of the BCP:

  • Before the BCP has been completed, this person is responsible for developing and completing it.
  • After the BCP has been completed, the BCP coordinator is responsible for declaring the emergency and activating the BCP.

When an emergency is declared, the BCP coordinator contacts appropriate teams or team leads. For example, if an emergency management team is used, the BCP coordinator will contact the emergency management team lead.

BCP Teams

A BCP can’t be planned, implemented, and executed by a single person. Instead, teams are put together to help in the process.

If the organization is small, it may have a single BCP team that has the responsibilities of all the individual teams mentioned in the following sections. Members have different levels of expertise. Some members will be more active during different phases than others. Larger organizations have multiple teams with different goals and responsibilities.

Although different teams have different goals, members need some common skills and abilities. Most importantly, they need to work together. One member who can work with others and get the job done is better than numerous “experts” who excel at finding fault with others but rarely complete their own tasks.

Three commonly used teams are the emergency management team (EMT), the damage assessment team (DAT), and the technical recovery team (TRT). These teams are described as follows:

  • Emergency management team—This team is composed of senior managers. They have overall authority for the recovery of the system but also work closely with the BCP coordinator. At this point, there is a potential for a conflict. Who’s in charge? The BCP coordinator or the EMT lead? To avoid this conflict, the BCP identifies who makes the ultimate decisions. For example, the BCP coordinator may be in charge until the EMT lead shows up, and then authority passes to the EMT lead. Either way, the EMT works closely with the DAT to identify damage. The EMT also works closely with the BCP coordinator to determine the response.
  • Damage assessment team—This team assesses the damage and declares the severity of the incident. The members primarily collect and report data but don’t take action. The exception is if they identify personnel who need assistance.
    Preserving the health and safety of personnel is always a top priority. The team can include IT personnel, facility personnel, and any other personnel overseeing resources. Team members report to the EMT. The BCP may designate specific forms to be used by this team to report their findings. For example, a damage assessment form would allow the members to document the location and severity of damage they discover.
  • Technical recovery team—This team is responsible for recovering the critical IT resources. During the disruption only the IT resources identified in the BIA will be recovered and restored. The members of the TRT will need skills directly related to the resource they are recovering. For example, if team members need to restore a database server, they need knowledge of how to do so.

Key Personnel

TIP

Smaller organizations may use a single team for the BCP, which would include one or more senior management employees to fulfill the role of the emergency management team, personnel who could survey and assess the damage of the critical resources, and members who can recover the critical resources.

The BCP may identify additional personnel who have other responsibilities. These personnel would vary from one organization to another. They could include:

  • Critical vendors—If specific supplies or other resources are needed from a critical vendor, the BCP would identify its responsibilities. A critical vendor could be a vendor that is the sole source for a specific part or product that is sold. It could also be a vendor that will deliver emergency supplies within a certain time. For example, a company could contract with a vendor to deliver potable water to the site anytime a hurricane is within 36 hours of striking. Service level agreements (SLAs) may be in place with the vendor to ensure it provides the service when needed.
  • Critical contractors—Many companies have contractors on staff in addition to full-time employees. Contractors can be full-time workers supplementing the staff or part-time workers fulfilling a specific need. If contractors are expected to have specific roles in the BCP, they should be identified. For example, some contractor positions may be mission critical, which would require the workers to work on-site through any type of disruption. The specific responsibilities of these contractors should be identified in the BCP.
  • Telecommuters—Telecommuters often work from home. Working from home is effective as long as the organization is fully operational. However, during a disruption, the telecommuters may not be able to access the organization’s resources and therefore cannot accomplish any work. The organization may want these employees to access resources at a different location. Alternatively, these workers may have skills that will help the organization get through the disruption, and they may need to report to the work site. The BCP documents what the organization expects of these workers during a disruption.

Order of Succession and Delegation of Authority

In some disasters, the key personnel may not be available. For example, the chief executive officer (CEO) may want to be informed by the BCP coordinator before the BCP is activated. However, what if the CEO is on vacation and can’t be reached? Whom should the BCP coordinator notify instead?

The BCP would include an order of succession, or chain of command, to address these types of situations. An organization could designate the order of succession as follows:

  • CEO
  • Chief information officer (CIO)
  • Vice presidents (VPs) in the following order: service delivery, sales, and marketing
  • Department directors in the following order: service delivery, sales, and marketing

If the CEO or the CIO were on-site, he or she would be contacted first. If the CEO or CIO isn’t there, the VP of service delivery would be contacted. Notice that, if both the VPs of service delivery and sales are there, the order of succession specifies that the VP of service delivery is first.
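The lookup described above can be sketched as a walk down an ordered list. This is a minimal illustration: the role labels expand the example order of succession, and the `first_available` function and the set of available roles are hypothetical.

```python
# Example order of succession from the list above, highest authority first.
ORDER_OF_SUCCESSION = [
    "CEO",
    "CIO",
    "VP of service delivery",
    "VP of sales",
    "VP of marketing",
    "Director of service delivery",
    "Director of sales",
    "Director of marketing",
]


def first_available(order, available):
    """Return the highest-ranking role in `order` that is in `available`, or None."""
    for role in order:
        if role in available:
            return role
    return None


# Hypothetical situation: the CEO and CIO cannot be reached, but two VPs can.
on_site = {"VP of sales", "VP of service delivery"}
print(first_available(ORDER_OF_SUCCESSION, on_site))
```

Because the list is ordered, the VP of service delivery is selected ahead of the VP of sales whenever both are available.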

Similarly, identifying what authority can be delegated may be included in the BCP. Decisions made during a major crisis can affect the organization for years afterward. The BCP may specify that either the CEO or CIO must make some decisions, even if he or she isn’t on-site. If the CEO or CIO isn’t reachable, these decisions can then be delegated based on the order of succession. If the BCP doesn’t delegate authority and personnel on-site cannot reach an executive, they won’t be able to make decisions. The inaction may cause more damage than making a decision that is less than perfect.

Notification and Activation Phase

The BCP coordinator declares the notification and activation phase, which is when the disruption has occurred or is imminent. Comparing hurricanes and earthquakes shows how this phase can differ depending on the disruption.

NOTE

Sometimes, time frames are reported differently by different sources. For example, one TV station says a hurricane will make landfall in 75 hours, whereas another TV station says it will hit in 72 hours. The BCP coordinator has the authority to declare when a specific hurricane stage has been reached.

Weather forecasters are able to give warnings several days in advance for many hurricanes. Although the forecasts aren’t 100 percent accurate, they do provide advance warning for an organization to prepare in case it does hit. The BCP can be written so that different steps are taken at different stages. For example, TABLE 13-1 shows what actions to take at different times when a hurricane is approaching. In the table, the time frames are identified with a specific stage or level code. This stage is internal and indicates what actions to take when that code is reached. This is not a complete list, but it does provide an idea of how different actions are taken at different times.

TABLE 13-1 Hurricane Checklist
TIME FRAME | STAGE | ACTIONS
96 hours | Hurricane stage 4 | Inform all personnel that a hurricane can hit within 96 hours. Begin general cleanup outside to ensure that materials that can become projectiles in hurricane-force winds are moved inside. Review steps and responsibilities for other stages.
72 hours | Hurricane stage 3 | Review supply list. Ensure that all needed supplies are on hand. For buildings susceptible to flooding, begin sandbagging activities. Review steps and responsibilities for other stages.
48 hours | Hurricane stage 2 | Release nonessential personnel to take care of their homes and families. Test backup generators. Notify the hurricane crew that they are on call and when they should report to the site.
24 hours | Hurricane stage 1 | Bring in the hurricane crew that will stay throughout the hurricane. Release all other personnel.
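
The mapping from time frame to stage in TABLE 13-1 can be sketched as a small lookup. The function name and the rule that a stage is reached once the forecast is at or below its time frame are assumptions made for illustration. The BCP coordinator still makes the actual declaration.

```python
from typing import Optional

# (hours-to-landfall threshold, internal stage code), taken from TABLE 13-1.
STAGE_THRESHOLDS = [(24, 1), (48, 2), (72, 3), (96, 4)]


def hurricane_stage(hours_to_landfall: float) -> Optional[int]:
    """Return the stage whose time frame has been reached.

    Returns None when landfall is forecast more than 96 hours away.
    Assumption: a stage applies once the forecast is at or below its threshold.
    """
    for threshold, stage in STAGE_THRESHOLDS:
        if hours_to_landfall <= threshold:
            return stage
    return None
```

With a forecast of 75 hours, the function returns stage 4. With a forecast of 72 hours, it returns stage 3. That gap is why conflicting forecasts need a single authority to declare the stage.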

In contrast, an earthquake doesn’t give any notification. Instead, it just hits. What’s more, after a major earthquake hits, many aftershocks can be expected.

With this in mind, the BCP for an earthquake will have a much different notification and activation phase. The BCP could even be written so that personnel are required to take specific actions when it hits, without being formally notified. For example, the response team members could immediately report to their lead for direction.

The BCP coordinator will still activate the BCP, which ensures that everyone is notified. However, if an earthquake hits, what happened will be obvious to anyone in the area.

Notification Procedures

Notification procedures can vary from one organization to another. However, the most important step is to ensure that the BCP coordinator is notified of any disruption or disaster covered by the BCP. If a disruption or disaster occurs during working hours, the BCP coordinator will probably be on the scene quickly. If it happens after hours, the BCP coordinator should be tracked down and contacted.

Using some type of phone tree to notify the teams and team members is common. For example, the BCP coordinator could notify the team leads for the EMT, DAT, and TRT. Team leads could then notify all the members of their teams.
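
A phone tree can be represented as a mapping from each caller to the people that caller is responsible for notifying. The following is a minimal sketch. The member names are placeholders, and it assumes the tree contains no loops.

```python
from collections import deque

# Hypothetical phone tree: each key calls the people listed under it.
PHONE_TREE = {
    "BCP coordinator": ["EMT lead", "DAT lead", "TRT lead"],
    "EMT lead": ["EMT member 1", "EMT member 2"],
    "DAT lead": ["DAT member 1"],
    "TRT lead": ["TRT member 1", "TRT member 2"],
}


def call_order(tree, start):
    """Return (caller, callee) pairs level by level, starting from `start`."""
    calls = []
    queue = deque([start])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            calls.append((caller, callee))
            queue.append(callee)  # the callee may have calls of their own to make
    return calls
```

Calling call_order(PHONE_TREE, "BCP coordinator") lists the coordinator's three calls to the team leads first, followed by each lead's calls to team members.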

TIP

Issuing the BCP coordinator a cell phone is a justifiable expense in this instance. The BCP coordinator should be reachable at any time of day or night to respond to major disruptions or disasters.

Damage Assessment Procedures

The DAT is responsible for assessing the damage and reporting it to the BCP coordinator. The team’s primary goal is to identify the extent of the damage as quickly as possible.

Again, the time when the DAT goes into action is dependent on the disruption. If it’s a hurricane, the members will assess the damage inside the building as the storm hits. For example, they will assess internal flooding and leaks due to storm damage as they occur. When the storm has passed and people can safely go outside, the DAT will assess the damage externally. If the disruption is an immediate disaster, such as an earthquake, the DAT will go into action as the members arrive on the scene.

The DAT passes its findings to the EMT lead and the BCP coordinator, who work together to determine the extent of the damage based on all the reports.

The EMT lead then determines what to do. If critical operations can continue on-site, the TRT will begin recovery operations. If damage is extensive and critical operations cannot continue in the same location, operations may need to be moved to an alternate location.

Plan Activation

The BCP coordinator is responsible for activating the BCP but does so based on predefined criteria. In other words, the BCP coordinator doesn’t just make the decision based on a hunch.

For example, the following items are valid reasons to activate the BCP:

  • Safety of personnel
  • Damage to the building affecting CBFs
  • Loss of operations affecting one or more CBFs
  • Specific criteria identified in the BCP, such as a hurricane warning or an earthquake
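
The idea of activating on predefined criteria rather than a hunch can be sketched as a check against a situation report. The SituationReport fields and the function name are hypothetical. An actual BCP would define its own criteria.

```python
from dataclasses import dataclass


@dataclass
class SituationReport:
    personnel_at_risk: bool = False
    building_damage_affects_cbf: bool = False
    cbfs_lost: int = 0
    plan_specific_trigger: bool = False  # e.g., a hurricane warning or an earthquake


def activation_reasons(report: SituationReport) -> list:
    """Return the predefined criteria that the report satisfies.

    An empty list means no criterion is met and the BCP is not activated.
    """
    reasons = []
    if report.personnel_at_risk:
        reasons.append("Safety of personnel")
    if report.building_damage_affects_cbf:
        reasons.append("Damage to the building affecting CBFs")
    if report.cbfs_lost >= 1:
        reasons.append("Loss of operations affecting one or more CBFs")
    if report.plan_specific_trigger:
        reasons.append("Specific criteria identified in the BCP")
    return reasons
```

Returning the list of satisfied criteria, rather than a simple yes or no, also gives the BCP coordinator a record of why the plan was activated.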

Specific responsibilities when the plan is activated include:

  • BCP coordinator—The BCP coordinator’s primary responsibility after activating the plan is ensuring that everyone involved in the plan knows it has been activated. The BCP coordinator will notify team leads. The coordinator’s responsibility also includes notifying senior management personnel, such as the CEO or CIO.
  • EMT lead—The EMT lead coordinates the actions of the EMT. The team lead also works closely with the DAT lead and the BCP coordinator.
  • EMT—The EMT works with the DAT and the TRT as directed by the EMT lead. Members of this team also interact with personnel outside the organization. For example, a member of this team will talk to the press and ensure the organization presents an image of being “in control” as much as possible. If the organization looks as if it is in chaos, it might lose public trust, which will affect the goodwill of the company for years to come.
  • DAT lead—The DAT lead coordinates the actions of the DAT and works closely with the EMT lead and the BCP coordinator.
  • DAT—The DAT gathers all the information on the disruption or disaster. Its goal is to provide specific details on what is damaged and the extent of the damage. Whenever possible, the team tries to determine whether the site is recoverable and reports its findings to the DAT lead.

TIP

BCPs identify alternate locations, which include hot sites, warm sites, and cold sites.

If the site is not recoverable within a certain period of time, operations may need to move to an alternate location. The BCP coordinator, EMT lead, and DAT lead work together to determine possible recovery solutions based on available data.

Alternate Assessment Procedures

In some instances, the DAT may not be able to assess the damage directly. If necessary, the team can do an indirect assessment based on the available information.

For example, when Hurricane Barry hit the U.S. state of Louisiana in 2019, many organizations had to evacuate, and personnel were not able to return immediately. However, TV images showed the extent of the damage in the area. Executives may not have seen their buildings, but they saw the damage to nearby buildings and knew that they weren’t returning to operations in the original buildings anytime soon.

Personnel Location Control Form

Many organizations use a notification roster. This form identifies the name and contact information of appropriate personnel. It can be used in many different ways, but the primary purpose is to contact personnel when necessary.

For example, consider an organization that is activating a BCP due to an incoming hurricane. Employees can use this form to notify all appropriate personnel. This same form can be used by the BCP coordinator to locate and talk to any of the team leads. Similarly, the team leads can use it to contact personnel on their teams. The format can be as simple as that shown in TABLE 13-2.

TABLE 13-2 Personnel Location Control Form
A sample personnel location control form.
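
Because the form itself is not reproduced here, the following sketch assumes a plausible set of fields (name, team, role, phone, and current location). The field names and the helper function are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RosterEntry:
    name: str
    team: str   # e.g., "EMT", "DAT", or "TRT"
    role: str   # "lead" or "member"
    phone: str
    current_location: str = "unknown"


def contacts_for_team(roster, team):
    """Return one team's entries, leads first and then members by name."""
    members = [entry for entry in roster if entry.team == team]
    return sorted(members, key=lambda entry: (entry.role != "lead", entry.name))
```

A team lead could use contacts_for_team to produce the list of people to call, and the BCP coordinator could filter the same roster for entries whose role is "lead".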

Recovery Phase

The step after the notification and activation phase is the recovery phase. This phase is when the TRT members go to work. They have several goals, including:

  • Restoring temporary operations to critical systems
  • Repairing damage done to original systems
  • Recovering from damage to original systems

Once the TRT has completed its job, the critical operations will be functioning. The TRT does not focus on recovering and restoring all operations but instead focuses only on the CBFs identified in the BIA.

TRT members commonly use specific DRPs to recover individual systems. For example, the BCP may designate a website and a database server as critical. A DRP could be included as an attachment to the BCP, showing how to recover and restore these services.

Recovery Planning

The success of the recovery phase is based on the recovery planning that was done beforehand. As someone once said, “It wasn’t raining when Noah built the Ark,” meaning it’s too late to plan when the disaster strikes. The plans must be made earlier.

Recovery planning often takes the format of a DRP, which will identify the steps and procedures to restore and recover systems after an incident.

Recovery Goal

The recovery goal is dependent on several factors. The goal could be to recover a portion of the functionality of a CBF. For example, a database may need to be operational so that it can accept some updates and queries. However, it may not need to be able to support the full load of normal operations.

On the other hand, the recovery goal could be much more complete. For example, an organization may have services provided at one location. When a disaster strikes, it may need to restore all functionality at another location.

The TRT will perform the work to achieve the recovery goals. The DRP guides the work, but it is possible that the work will be in phases, depending on the depth of the recovery. This is especially true when operations have to be relocated to a different location.

Technical Recovery Team Lead

The TRT lead oversees the work done by the TRT. This team lead needs to be very familiar with existing DRPs and may even have authored them. The team lead also keeps the EMT lead and BCP coordinator informed of the progress.

Technical Recovery Team

The TRT performs the recovery work. The scope of its work will depend on the extent of the damage and whether operations are moved. For example, a hurricane could have caused water damage in the server room. On-site personnel may have limited the amount of damage by quickly killing the power and moving the servers before the water reached them, which could make the recovery as simple as cleaning up the water damage and moving the servers back and rebooting them.

On the other hand, an earthquake could have destroyed the building and buried the servers beneath the rubble, which makes recovery much more complex. The TRT members must restore the servers at an alternate location, which means they will need to retrieve the off-site backups and ship them to the alternate location, after which they will need to restore and configure the data on them.

The success of the TRT often depends on the advance work it has done with the DRP. Ideally, appropriate personnel have tested the DRP and kept it up to date. If not, the TRT will likely run into unforeseen problems. Additionally, moving operations from one location to another is a huge project, and problems should be expected. If personnel have never tested the DRP, even more problems can be expected.

Reconstitution Phase (Return to Normal Operations)

The last phase is the reconstitution phase, which is when both the critical and noncritical functions are returned to normal. This phase begins when one of two things occurs:

  • The damage at the original location is repaired.
  • Management decides to move operations permanently to an alternate location.

Move Least Critical Functions First

When moving CBFs from an alternate location back to the original location, the least critical functions should be moved first, which helps ensure that the most critical functions aren’t interrupted.

When functions are restored at the original location, the process may not go smoothly at first, but, if the least critical functions are moved first, only they will be affected. After the kinks have been worked out of the process, then the more critical functions can be moved.
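
The ordering rule can be sketched as a sort. The criticality ranks and function names below are hypothetical, and the sketch assumes the convention that rank 1 is the most critical.

```python
def migration_order(functions):
    """Order functions for the move back to the original location.

    `functions` maps a function name to a criticality rank, where 1 is the
    most critical (an assumed convention). Least critical functions come first.
    """
    return sorted(functions, key=lambda name: functions[name], reverse=True)
```

For example, given {"Online banking": 1, "Payroll": 2, "Intranet wiki": 5}, the intranet wiki moves first and online banking moves last, after the process has been proven on less important functions.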

Original or New-Site Restoration

If damage at the original location is extensive, management may decide to move operations, a decision that will involve many factors. For example, a fire damaged a primary company building. In response, the TRT recovered critical business operations at a regional office. Later, the DAT determined that the fire damage was so extensive that the company needed to rebuild the original building. Now, management must decide where to relocate these operations.

Even though critical operations are running at the alternate location, that location may not be able to support the noncritical operations. Management could decide to move all operations to a new site and restore them there. On the other hand, the damage could have been only minor, and the critical operations would then need to be moved from the alternate location back to the original location. Either way, the TRT will perform the primary work because it is most familiar with the DRPs and the steps that need to be taken to restore the functions.

Concurrent Processing

Concurrent processing means that operations are running at two separate locations at the same time. For example, a disruption has caused operations to be moved to an alternate location, after which the systems will be rebuilt at the primary location. In this situation, many experts recommend operating from both locations for three to five days to be sure that the primary location systems are running smoothly before shutting down the operations in the alternate location.
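
The cutover decision can be sketched as a date check. The three-day minimum is the low end of the range mentioned above. Restarting the count after an incident at the primary location is an assumption added for illustration, not a rule from the text.

```python
from datetime import date
from typing import Optional

MIN_CONCURRENT_DAYS = 3  # low end of the three-to-five-day recommendation


def can_shut_down_alternate(concurrent_since: date,
                            today: date,
                            last_primary_incident: Optional[date] = None,
                            min_days: int = MIN_CONCURRENT_DAYS) -> bool:
    """Return True once the primary site has run cleanly for `min_days`.

    Assumption: an incident at the primary site restarts the count.
    """
    clean_since = concurrent_since
    if last_primary_incident is not None and last_primary_incident > clean_since:
        clean_since = last_primary_incident
    return (today - clean_since).days >= min_days
```

If both sites have been running since January 1 and the primary location had a problem on January 3, the check on January 4 returns False because only one clean day has passed.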

NOTE

Not all systems will support concurrent processing. Some of them may present technical challenges that prevent running both systems at the same time.

Plan Deactivation

A few things still need to be considered at this stage. First, just returning the original site to operations doesn’t necessarily mean that everything has been normalized. For example, if operations had been moved to an alternate location, that location will need to be cleaned up.

It may have had all the equipment that was needed already in place, but, if it didn’t, multiple servers, routers, and switches might have been shipped there. Now, the goal is to return the alternate location to how it was before the disruption or even better.

Another important consideration is data. Sometimes, after a major disaster, management decides to leave some hardware staged at the alternate location. However, if data is left on these systems, it presents an unnecessary risk because it could be retrieved by anyone who has physical access to the alternate location.

Again, the TRT will be responsible for completing these steps. Including a checklist in the BCP is worthwhile to ensure that nothing critical is overlooked. Once everything has been normalized, the BCP can be deactivated.

Plan Training, Testing, and Exercises

Although creating the BCP is a huge step, its creation is not enough. Steps need to be taken to teach personnel about the plan. The plan also needs to be tested, and exercises demonstrating that it will work need to be performed. The overall goals of these steps are:

  • Training—Teaching people details about the BCP
  • Testing—Verifying that the BCP will work as planned
  • Exercises—Demonstrating how the BCP will work

BCP Training

The BCP coordinator is responsible for ensuring that all personnel are trained. The most important people to train are the members of the BCP teams, who should have a good understanding of their actual responsibilities when the BCP is activated.

Each of the BCP teams has different responsibilities, but not all the teams need to be trained on all their responsibilities at the same time. Several training sessions can be held as follows:

  • Training session for all teams—This training gives everyone an overall idea of the plan and how each team fits into its success.
  • EMT training—This training is targeted at members of the EMT and identifies their specific responsibilities.
  • DAT training—This training is targeted at members of the DAT. It stresses the importance of the assessment and identifies tools or checklists to use.
  • TRT training—This training is targeted at members of the TRT. It includes reviews of each of the individual disaster recovery plans.

Training should be conducted at least annually, and more often when the BCP or the systems it covers change. For example, if a critical system identified in the BCP is replaced, the BCP needs to be modified. Subject matter experts update the DRP, and, if it is changed, members of the TRT will then need training on the new DRP.

TIP

Because team leads will need to interact with each other, all of them should attend all team training. For example, the DAT lead should attend not only the DAT training but also the EMT and TRT training.

BCP Testing

BCP testing should be completed at least annually. The goal of the testing is to show that the steps within the BCP are achievable, and it provides team members an opportunity to walk through the steps of the plan.

Testing may include the following steps:

  • Testing individual steps within each phase of the BCP—This testing requires a line-by-line review of the BCP. Procedures, such as performing a recall, can be tested only by retrieving the recall roster and calling people on it.
  • Testing all disaster recovery plans—This testing ensures that the steps in the DRP can be completed as written. For example, a DRP may identify steps for rebuilding and recovering a database server. An administrator will follow the DRP on an offline system to determine whether the steps succeed.
  • Locating and testing alternate resources—If the plan identifies alternate locations or resources, they need to be tested. For example, if the plan identifies an alternate location, test the alternate location to see whether it can actually support the CBFs.

Testing should reveal any problems or deficiencies with the plan, including problems with the steps, resources, or personnel. These problems should be resolved as soon as possible.

BCP Test Exercises

The primary purpose of BCP exercises is to show how the BCP will work. These exercises should be challenging but realistic and should present problems that are solvable.

In addition to testing the capabilities of the BCP, an exercise also builds participants’ confidence. If an actual emergency occurs, people will be able to think back to the exercise. If everything failed in the exercise, they won’t feel very confident about the plan and may even abandon it or try to circumvent it during the emergency.

Many organizations use a phased approach toward exercising a plan. Instead of doing full-scale exercises at first, they perform tabletop and functional exercises.

TIP

BCP exercises should not affect normal mission operations. Any steps that will affect operations can be tested with a simulation. Multiple scenarios can be done as tabletop exercises. For example, one scenario may be a weather-related event, such as a hurricane. The BCP coordinator can identify the stage, and team leads or members can respond by identifying what they would do. Another scenario can be more immediate, such as a fire that occurs in the middle of the night.

Documenting and evaluating the exercise is important and can be done by someone who isn’t a member of any of the BCP teams. Having this outside perspective can be valuable in ensuring that all the issues are addressed.

Tabletop Exercises. A tabletop exercise brings all the members together to talk through the process. In such an exercise, all the team members sit around a conference room table, and the BCP coordinator then presents a scenario to them.

Team members identify what they’d do to respond to the scenario. At this point, the BCP has been written and approved so that ideally the team members’ responses would match what the BCP says. However, in this setting, participants may place themselves in the actual situation and identify different problems.

Functional Exercises. A functional exercise evaluates specific functions within the BCP. For example, the BCP identifies an alternate location for some critical functions. A functional exercise can be performed to restore and recover all the critical resources at the alternate location.

Functional exercises can also be less dramatic and less resource intensive. For example, simply initiating the recall roster can verify that it is accurate and identify how much time the recall will take to complete.

Just as with a tabletop exercise, documenting the results of the functional exercise is important. The BCP coordinator or someone who isn’t a member of any of the BCP teams can do this.

Full-Scale Exercises. A full-scale exercise is more realistic than either tabletop or functional exercises because it simulates an actual disruption of CBFs. Team members aren’t sitting around a table discussing what they’d do, but instead they take action.

Completing full-scale exercises requires many resources, the primary one being personnel. However, full-scale exercises provide the most realistic view of how team members will respond to an actual emergency.

Just as with other exercises, documenting the results is important. Depending on the breadth of the plan, several outside observers documenting what they see may be beneficial, and gathering input from the team members after the exercise has been completed is important. They will likely have insight into what elements of the BCP worked and how to improve the BCP.

A single report can then be compiled to document this data. At this time, any issues should be addressed, which may require modifying the BCP.

NOTE

The primary difference between testing and a functional exercise is that testing is done without a time frame. Team members can be given advance notice to perform a test and all the time they need to perform it. On the other hand, a functional exercise is more immediate because team members are not given any advance notice. The amount of time it takes to complete the functional exercise should be documented.

Plan Maintenance

The BCP coordinator is responsible for the BCP, which includes reviewing and updating it. Several events commonly trigger an update to the BCP:

  • Changes to the IT infrastructure
  • Regular reviews, such as an annual review
  • Results of testing or exercises

BCP Plan Revisions Tracking

All revisions to the BCP need to be documented. This ensures that people can easily tell whether the document has been modified and whether they have the most up-to-date version. Many organizations use a simple version control page. TABLE 13-3 shows an example of a version control page.

TABLE 13-3 BCP Version Control Page
A sample BCP version control page.
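
Because the table itself is not reproduced here, the following sketch assumes typical columns for a version control page (version, date, summary, and author). The field and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Revision:
    version: str
    changed_on: date
    summary: str
    changed_by: str


def latest_revision(history):
    """Return the most recent entry on the version control page."""
    return max(history, key=lambda revision: revision.changed_on)


def is_current(copy_version, history):
    """Return True if a copy of the BCP matches the most recent version."""
    return copy_version == latest_revision(history).version
```

Anyone holding a copy of the BCP can compare its version against the latest entry to tell whether the copy is out of date.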

In addition to the change being documented in the version control page, all relevant parties must be apprised of the change. For example, if changes directly affect the EMT members, they should know what those changes are.

BCP Updates Based on Changes Within the IT Infrastructure

The BCP should be reviewed when any substantial changes occur within the IT infrastructure, especially after changes to critical systems.

For example, a single server hosts a web server and a database for online sales. The BCP includes a DRP to recover this server if a disaster occurs. If this system is upgraded to a four-node web farm with back-end database servers in a two-node failover cluster, the change is substantial, but the original BCP and DRP don’t address this new configuration. If these servers are moved to an alternate location, the original BCP and DRP simply won’t provide much help. Therefore, the appropriate response is to review the BCP and then update the BCP and DRP to reflect the changes.

Organizations that have change management procedures in place make this review much easier. The BCP coordinator can simply review approved change requests periodically to determine when changes have occurred. Another suggestion is to include a check item in the change management review. For example, the TRT lead may be required to verify that the change won’t affect the BCP or any DRPs. Many changes are inconsequential and don’t require a change to the BCP.
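
The periodic review of approved change requests can be sketched as a filter. The ChangeRequest fields and the idea of matching on a list of critical systems are assumptions made for illustration. A real change management system will have its own record format.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    change_id: str
    system: str
    approved: bool


def changes_needing_bcp_review(changes, critical_systems):
    """Return approved changes that touch a system the BCP treats as critical."""
    return [
        change for change in changes
        if change.approved and change.system in critical_systems
    ]
```

Changes to systems outside the critical list are skipped, which reflects the point that most changes do not require an update to the BCP.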

BCP Annual Updates and Content Refreshment

The BCP coordinator is responsible for reviewing the BCP at least annually, even if there are no known changes. This review ensures the BCP still addresses and meets all the organization’s requirements. It includes a review of the BIA to ensure that CBFs haven’t been modified and are still considered critical, a review of operational and security requirements, and a review of individual processes, such as recalls, and more technical procedures, such as DRPs.

The review process can be separate from the rewriting process. For example, a member of the TRT can be tasked to review a specific DRP that is included as an attachment of the BCP. The review may identify changes to the system that make the DRP out of date. The TRT member should report the results of the review back to the BCP coordinator, and then the BCP coordinator would have the TRT lead update the DRP.

The BCP coordinator should route changes through appropriate personnel. If a change directly affects the TRT, it should be routed through the TRT lead for input. Additionally, all affected personnel should be notified of the change when it has been completed. For example, either the TRT lead or the BCP coordinator should notify all TRT members of the change.

BCP Updates Based on Testing and Exercises

The review of the BCP should also include reviewing information from training, testing, and exercises. Much valuable information can be learned during each of these activities, such as determining that some of the procedures in the BCP will work well and that others will need to be improved by updating the BCP.

Ideally, the BCP would be updated soon after the report from these events has been completed. However, a review of these reports can also be included in the annual review of the BCP, which ensures that all the issues identified in the training, testing, and exercises are resolved.
