No one can dispute the importance of technology within organizations. When we say technology, we don’t only mean IT. We refer to all automation and processing functions within an organization. IT may form the biggest part of technology, yet it is not the whole.
The role of technology within organizations has been continuously evolving from being a support function to the rest of the organization, to being an enabler to the business, to being a revenue-generating line of business by itself. Dependence on technology to achieve operational and strategic goals and targets in modern organizations has reached unprecedented levels and will continue to grow in the future.
As a result of this dependence, failures in technology are a critical concern for an organization in terms of both probability and impact. Dependence on technology sometimes reaches the state of total dependence. Almost all modern organizations have their most valuable assets residing on technological, or automated, platforms: data, information, and processes.
In addition, technology services are generally shared across an organization, with common technology components serving several parties. The failure of such a component affects all parties that use it.
Technology is also, as with any product made by people, inherently susceptible to errors and bugs. Testing and quality assurance dramatically decrease such errors and bugs, but do not completely eliminate them.
Technology is also one of the main areas for outsourcing. Failures and problems at the outsourcing provider can easily propagate to the outsourcing organization. This means that the scope of the probability and impact of threats sometimes gets much larger with outsourcing.
Technology is also a specialized area where it is somewhat more difficult to maintain a sufficient level of specialty and expertise all the time. The high staff turnover in technology is a key threat to consider in this sensitive area.
Because of the unique nature and role of technology within an organization, specific types of planning and management need to be provided to address and manage the specific requirements of technology within the BCM program. As we will be focusing on IT as a representative element of technology for the rest of the book, we will refer to such specific types of planning and management as IT disaster recovery (ITDR) and ICT readiness for business continuity (IRBC).
ITDR/IRBC processes have been around for a long time, even before the formulation of BCM and other related disciplines. In fact, many of the ITDR/IRBC practices have evolved to become those of BCM.
ITDR focuses on setting up the necessary elements (framework, processes, arrangements, and resources) to be in a state of readiness to manage, recover, and resume critical technology services in cases of disaster and disruption. As the technology role within organizations has become more and more critical, recovery and resumption of relevant services are no longer enough. What is now needed is continuity instead of recovery. To satisfy this requirement, IRBC has been formalized. IRBC is concerned with having the management framework and all the resources prepared and ready to ensure the continuity and availability of technology services according to BCM requirements. It aims to support and enhance the recovery levels of the organization.
While both share the same focus area, IRBC and ITDR have their differences. ITDR focuses on recovery while IRBC focuses on continuity. Because ITDR is oriented towards recovery, it mostly addresses components in isolation while IRBC looks more at end-to-end technology services. IRBC is closer to BCM as products, services, and business processes can be mapped and aligned to technology components: logical or physical. IRBC is also more capable of achieving better resilience levels against smaller outages and incidents, not only disasters and major outages.
The IRBC/ITDR program is a subset of the organization-wide BCM program. IRBC/ITDR should follow BCM guidelines and should support the BCM program by meeting the continuity requirements specified in the various stages and phases of the BCM life cycle. The IRBC-focused standard, ISO/IEC 27031, proposes a life cycle for the IRBC program that is aligned to the BCM life cycle. Being aligned here indicates support, not repetition or redundancy. It implies integration and compatibility with the BCM life cycle.
The relationship starts as early as the BCM program setup phase with the inclusion of the IRBC/ITDR program within the BCM program. It is highly risky to run two separate, disconnected programs. Doing so will most probably lead to contradictory and ineffective recovery efforts, as well as a waste of resources that may be under- or over-allocated relative to the real needs of the organization. Technology is there to help the organization achieve its goals and targets, including continuity objectives. This is valid both in normal conditions and in disasters.
In the BIA phase, the technology requirements are collected, reviewed, and validated along with the other information. In this regard, two main key actions are recommended:
In the threat and risk assessment phase, threats and risks to technology resources and components are assessed. Specifically, the assessment should cover threats and risks related to the confidentiality, integrity, availability, and currency of the technology components in accordance with the continuity specifications produced from the BIA.
Within the strategy phase, strategies set for IRBC/ITDR programs should be in line with the BCM strategy. There are no hard rules here as IRBC/ITDR strategies will cover specific areas of technology while BCM strategies cover all areas within the scope.
When it comes to planning and implementation, IRBC/ITDR plans should be consistent with the BCM plans, giving extra care to the CMP as technology incidents are among the most common causes of disasters and crises.
The technology recovery plans should support the other plans dependent on them and be in accordance with the continuity specifications and strategies set and approved by the organization.
Testing the IRBC/ITDR plans can be done separately or in conjunction with the BCPs. The main rule here is to be practical and flexible.
Training given regarding the IRBC/ITDR program is in some ways distinct from the BCM training as it includes technical information that requires specialization and expertise not necessarily available in areas outside IT and technology-related departments. Awareness, on the other hand, can be taught in similar fashion to BCM awareness as it is more beneficial to introduce common programs in an integrated manner rather than delivering separate paths of awareness for BCM and IRBC/ITDR.
Review and update processes are common to both technology and non-technology areas. In fact, technology is the area that witnesses more changes across shorter periods of time due to the fact that technology evolves at a fast pace and the services offered are getting more and more complex and advanced.
In order to be implemented properly and in accordance with the BCM program, the IRBC/ITDR sub-program needs a clear and effective governance model. The elements of such a governance structure should be integrated with the BCM governance model:
The IRBC/ITDR sub-program needs executive support and ownership as well as accountability for its effectiveness and success. The IRBC/ITDR owner works very closely with the BCM owner, bearing in mind that the BCM owner is the ultimate owner of the BCM program and its subprograms, like the IRBC/ITDR program.
The scope of the IRBC/ITDR activities should cover the continuity specifications and requirements set out and agreed through BCM’s BIA process. If there are special requirements that were not extracted from the BIA process, the IRBC/ITDR scope may include them provided that the IRBC/ITDR owner agrees and approves such inclusion.
The IRBC/ITDR policy includes the IRBC/ITDR scope and the strategies and guidelines adopted in order to meet the specifications and requirements.
The IRBC/ITDR policy should be approved by the IRBC owner and BCM owner. It should also be documented and controlled as well as communicated to relevant stakeholders. The policy should be reviewed on a periodic basis and as needed to make sure it is current and relevant to the status of the organization and its technology aspects.
IRBC/ITDR professionals should be competent in both technology and other areas as they are situated on the borderline between technology and other areas. They share almost the same competencies required by the BCM team, but with more focus on the technology side.
The main purpose of the IRBC/ITDR life cycle is to establish and plan for the technology aspects that support an organization’s BCM program and its implementation.
The IRBC/ITDR life cycle, as mentioned earlier, comprises the following phases:
This phase starts where the BCM’s BIA process ends. The BIA results should be translated for technology services and consequently its components, as mentioned earlier, and in particular for the critical ones. If a business process or activity is approved as being critical, then the technology services supporting such a process or activity are categorized as critical. There are other points to consider here:
Technology services should be assigned continuity specifications. The main specifications are criticality, RTO, RPO, and dependency. Criticality, RTO, and dependency specifications play a major part in defining what technology services will be offered during disasters, where they will be offered from, and the order of their recovery and restoration. RPO is a major decision maker in the protection of data, off-site data backup, and replication.
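As a rough illustration of how dependency and RTO specifications can drive the order of recovery, the following minimal sketch sorts a set of services so that every service is recovered after the services it depends on, preferring the shortest RTO among the services that are ready to start. The service names, RTO values, and the tie-breaking rule are assumptions made for this example only; a real program would take them from the approved BIA results.

```python
import heapq
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services: RTO in hours and the services each one depends on.
services = {
    "network":  {"rto_hours": 1, "depends_on": []},
    "database": {"rto_hours": 2, "depends_on": ["network"]},
    "email":    {"rto_hours": 8, "depends_on": ["network"]},
    "payments": {"rto_hours": 4, "depends_on": ["database", "network"]},
}

def recovery_order(services):
    """Return service names in an order that respects dependencies.

    Among services whose dependencies are already recovered, the one
    with the shortest RTO is recovered first (an assumed rule).
    """
    sorter = TopologicalSorter(
        {name: spec["depends_on"] for name, spec in services.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready = []        # min-heap of (rto_hours, name)
    order = []
    while sorter.is_active():
        for name in sorter.get_ready():
            heapq.heappush(ready, (services[name]["rto_hours"], name))
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)  # may make dependent services ready
    return order

print(recovery_order(services))
```

With the sample data, the network is recovered first because everything depends on it, and payments is recovered before email because its RTO is shorter once the database is available.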
Defining criticality, RTO, RPO, and dependency is not always an easy task, especially when technology is complex. A major factor in deciding on these specifications is the technology architecture and deployment. It is very likely that there will be cases where continuity specifications, especially RTO and RPO, cannot be satisfied due to technical constraints or the cost of implementation. In this case, a compromise needs to be reached for the derived specifications, with the endorsement of relevant stakeholders.
Once continuity specifications have been defined for technology services and their components, a gap assessment should be done of what is already in place and what is missing. Going to such an effort will help greatly in designing the IRBC/ITDR strategies and options.
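The gap assessment described above can be sketched as a simple comparison between required and currently achievable specifications. This is only an illustrative sketch: the service names, the `rto_hours` and `rpo_hours` fields, and all values are hypothetical, and a service with no known capability is treated as a gap by assumption.

```python
# Hypothetical required specifications derived from the BIA.
required = {
    "payments": {"rto_hours": 4, "rpo_hours": 1},
    "email":    {"rto_hours": 8, "rpo_hours": 24},
    "crm":      {"rto_hours": 24, "rpo_hours": 12},
}

# Hypothetical capabilities of the arrangements already in place.
current = {
    "payments": {"rto_hours": 12, "rpo_hours": 1},
    "email":    {"rto_hours": 6, "rpo_hours": 24},
}

def find_gaps(required, current):
    """List (service, metric, target, achieved) where the target is not met.

    A missing capability is reported with achieved=None.
    """
    gaps = []
    for service, spec in required.items():
        capability = current.get(service)
        for metric, target in spec.items():
            achieved = None if capability is None else capability.get(metric)
            if achieved is None or achieved > target:
                gaps.append((service, metric, target, achieved))
    return gaps

for gap in find_gaps(required, current):
    print(gap)
```

Each reported tuple is a candidate input to the strategy phase: either the arrangement is improved, or the specification is renegotiated with the relevant stakeholders.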
The results should be reported to the IRBC/ITDR owner, the BCM manager, and the BCM owner and committee respectively for review and approval.
Once continuity specifications are defined and approved for technology components, a comprehensive risk assessment needs to be done for these components. Very similar to the goal of threat and risk assessment in the BCM life cycle, the goal of this risk assessment exercise is to limit the probability and impact of risks and threats occurring and, consequently, the occurrence of disasters and crises.
The technology risk assessment needs to look at all of the technology area’s components, not just hardware, software, and network aspects. In this context, the technology risk assessment should look at the following:
The threats should be defined in terms of probability and impact. The definition can be qualitative or quantitative. Regardless of the approach used, it should be meaningful and consistent with the approach of the BCM life cycle. Such consistency is vital in order to properly manage threats and risks across the organization in a way that gives the right weights and priorities. Using different approaches may lead to wrong prioritization of threats and risks, which leads to an ineffective reduction of the overall risk exposure of the organization. Threats and risks are then rated by multiplying probability and impact ratings.
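The rating rule above (probability multiplied by impact) can be illustrated with a small sketch. The threats listed and the 1 to 5 scales are assumptions for the example; the actual scales should be the ones already used in the BCM life cycle so that ratings remain comparable across the organization.

```python
# Hypothetical threats as (name, probability, impact), each on an assumed 1-5 scale.
threats = [
    ("power outage at data center", 3, 4),
    ("loss of key technology staff", 4, 2),
    ("failure of outsourcing provider", 2, 5),
]

def rate_threats(threats):
    """Return (name, rating) pairs, highest rating first.

    The rating is probability multiplied by impact.
    """
    rated = [(name, probability * impact) for name, probability, impact in threats]
    return sorted(rated, key=lambda item: item[1], reverse=True)

for name, rating in rate_threats(threats):
    print(f"{rating:>2}  {name}")
```

Sorting by the resulting rating gives a first-cut priority list for the management actions that follow.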
Once threats and risks are assessed, the next step is to propose management actions for them. The management actions follow the same concepts provided within the BCM life cycle. Once management actions are defined, the risk assessment results should be reported to the IRBC/ITDR owner and BCM manager as a first step before being raised to the BCM owner and committee for review and approval.
This stage of the IRBC/ITDR life cycle focuses on creating strategies that satisfy the requirements of the continuity specifications as well as creating risk treatment and reduction plans.
The strategies are categorized by the components of technology.
For specific components, the strategies and RTPs will most probably share all or part of the continuity strategies and risk treatment plans that were laid out and approved through the BCM life cycle as they are going to be used across the organization, including technology.
When designing IRBC/ITDR strategy or RTPs, there are factors that play a part in deciding on the different options proposed:
Considering the factors above and the BCM continuity strategy options, the range of IRBC/ITDR strategic options to cover all of the technology components may include the following:
After the options are proposed, they should be reviewed by the IRBC/ITDR owner and BCM manager before being raised to the BCM owner and committee for final review and approval.
In a similar manner to the sequence of BCM life cycle, once the strategies for IRBC/ITDR components are adopted, the actual IRBC/ITDR plans need to be developed and implemented.
Within this phase, the plans developed and implemented do not concentrate exclusively on the management of and recovery from major incidents or disasters. There should also be some arrangements and planning to manage smaller incidents. The justification for this is that managing and resolving these incidents as early as possible can prevent them from evolving into major incidents or disasters. Including the management of smaller incidents within the scope of the plans also creates an integrated escalation process should smaller incidents become major disasters.
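One possible way to picture an integrated escalation process is a rule that compares how long a service has been disrupted with its RTO. The sketch below is purely illustrative: the three level names, the 50% threshold, and the decision to keep non-critical services at the lowest level are all assumptions, not rules taken from any standard.

```python
def escalation_level(elapsed_hours, rto_hours, critical):
    """Classify a disruption using an assumed, simplified rule.

    - non-critical services stay at "incident"
    - critical services escalate to "major incident" once half of the
      RTO has elapsed, and to "disaster" once the RTO is reached
    """
    if not critical:
        return "incident"
    if elapsed_hours >= rto_hours:
        return "disaster"
    if elapsed_hours >= 0.5 * rto_hours:
        return "major incident"
    return "incident"

# Example: a critical service with a 4-hour RTO.
for elapsed in (1, 2, 5):
    print(elapsed, escalation_level(elapsed, rto_hours=4, critical=True))
```

Whatever rule is chosen, the value lies in the same plan covering every level, so that responsibility passes cleanly from incident handling to disaster recovery.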
The IRBC/ITDR plans follow a similar structure to the BCM plans. The structure of IRBC/ITDR plans is separated into two layers:
The IRBC/ITDR management plan provides the groundwork and basis for technology service recovery and continuity. The IRBC/ITDR management plan should enable the technology-related functions to:
It is apparent that these objectives and purposes are very similar to the ones of the BCM’s CMP. Failure to achieve such purposes may actually undermine the readiness and recoverability state of an organization as it will be heavily dependent on technology to perform its operations, even in times of disaster.
The contents of the IRBC/ITDR management plan are also similar to the BCM CMP. A typical IRBC/ITDR management plan should contain:
The other type of IRBC/ITDR plan, the IRBC/ITDR recovery plans and procedures, is more specific and focused on the goal of recovering a single group of technology services or components as required by the continuity requirements. Thus the typical contents of IRBC/ITDR recovery plans and procedures are:
The developed plan must be reviewed by the relevant technical managers to ensure its technical validity. After that it should be reviewed by the IRBC/ITDR owner and the BCM manager before being delivered to the BCM owner and committee for review and approval.
The developed plans need to be tested to make sure that the strategies and their implementations can satisfy the relevant requirements. In addition to this, tests and exercises build confidence in the IRBC/ITDR plans and arrangements. Testing is also a good training and awareness tool that can be used to familiarize the staff and vendors with the IRBC/ITDR plans and arrangements. Conducting tests and exercises is also an audit and regulatory requirement.
Testing and exercising IRBC/ITDR plans follow the logical path of maturity and growth. There’s almost no use in conducting full IRBC/ITDR tests that will certainly fail and prove almost nothing. It is recommended, and preferred, that the testing phase is gradual and ongoing in order to achieve the desired goals and purposes.
There are several types of IRBC/ITDR testing, from simple and easy to complex and difficult:
The IRBC/ITDR testing process is similar to the one implemented in BCM. The test plan, dates, and scope should be reviewed by the IRBC/ITDR owner and BCM manager before being reported to the BCM owner and committee for review and approval. Test results, issues, recommendations, and modifications should be passed to the IRBC/ITDR owner and BCM manager. If required, reporting can be escalated to the BCM owner and committee for resolution.
With the continuous and ongoing changes and developments in organizations and technology, the IRBC/ITDR components need to be up to date and fit for the new developments in technology services and components. For technology in particular, the changes and updates are some of the fastest within the organization. This increases the burden on the IRBC/ITDR team to keep up with the changes and maintain the IRBC/ITDR sub-program.
Being part of the greater BCM program, the IRBC/ITDR sub-program should follow the guidelines for maintaining and updating its components, plans, and arrangements. As a main guideline, the IRBC program needs to be revisited and reviewed at least annually. In fast-changing organizations, the frequency can be increased to more than once a year. As for the change triggers, they may include:
Regardless of the trigger, the change process needs to be controlled through a layered review and approval process. Any changes or updates need to be qualified by the IRBC/ITDR manager before being forwarded to the IRBC/ITDR owner and BCM manager for review and approval. If the change is of a critical nature, the change and update need to be forwarded to the BCM owner and committee for review and approval.
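The annual review guideline and the layered approval process described above can be sketched as follows. The function names and the 365-day default are assumptions for this example; the roles in the approval chain are the ones named in the text.

```python
from datetime import date, timedelta

def review_due(last_review, today, max_interval_days=365):
    """Return True when the program has gone a full interval without review."""
    return today - last_review >= timedelta(days=max_interval_days)

def approval_chain(critical):
    """Return the ordered list of roles a change passes through."""
    chain = ["IRBC/ITDR manager", "IRBC/ITDR owner", "BCM manager"]
    if critical:
        # Critical changes are escalated one layer further.
        chain.append("BCM owner and committee")
    return chain

print(review_due(date(2023, 1, 1), date(2024, 1, 1)))
print(approval_chain(critical=True))
```

A fast-changing organization could simply pass a smaller `max_interval_days` to reflect a review frequency of more than once a year.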
Technology services are delivered through an integrated set of components, as discussed earlier, which are set up, operated, and maintained within specially prepared environments. Thus, they need to be located within locations, or sites, that can control their surroundings and environments.
Owing to the unique and specific nature of sites hosting technology components, they need to be carefully planned and, if possible, distributed in a manner that provides the maximum protection and control for the hosted components. Technology hosting sites are commonly called data centers. However, this may sometimes be misleading. Among technology components, only hardware, software, and networking are hosted within data centers, while the other components are normally located outside. Technology sites include data centers and other locations that host the majority, or all, of the technology components within an organization.
Although they host unique components, technology sites are, in the end, just locations. They are exposed to threats similar to those of other locations or buildings. Among these threats are:
The above threats are only examples. Unfortunately, the list keeps growing. Realization of any of the threats can cause serious incidents with severe impacts affecting not only technology-related departments, but the whole organization.
The best way to protect technology sites is to avoid as many of the threats as possible. Unfortunately, this is not easily achieved. There are serious threats that are simply out of our control, especially when thinking about natural disasters.
There are, however, workarounds to such situations. These workarounds do minimize the probability and impact of many of the threats, bearing in mind that zero threats is not practically achievable. Threat minimizing looks at every threat and creates a solution for it. For natural disasters, technology sites should be located, if possible, in areas that are environmentally stable and are a sufficiently safe distance away from natural threat sources. Special site construction and treatments are needed, too. For example, the site should be resistant to earthquakes, properly insulated, have effective fire-fighting, detection, and alarm systems, and be easily evacuated to safe areas.
As for man-made threats, the site should be well away from dangerous or hazardous locations that can render the site damaged, unusable, or inaccessible. It should possess a number of different access routes and paths. It should be equipped with backup power arrangements and sufficient amounts of water to serve the site. A technology site should be protected with sufficient physical security measures and access to it should be controlled properly with a strong access control process. In addition, the site should have multiple telecommunications services and routes provided through different vendors, if possible. For further protection against disruption, the multiple telecommunications services can be routed through different media and network topologies.
The above arrangements handle only one side of the problem; the other part is the most difficult to manage. How can an organization provide the required technology services with minimal disruption, outages, or losses? What if, after taking all precautions, technology sites fail and their components fail as well? This can happen because risks and threats cannot be eliminated completely.
The logical answer is to have more than one technology site. Having this solution can satisfy a lot of technology requirements that are not necessarily exclusive to technology continuity.
The idea is to have two or more technology sites, not just one, that are operational or ready for operations. If a problem happens to one of them, the others can continue to provide the required technology services as planned.
In terms of threats, this may not reduce the probability much for a specific site, but it is very effective at reducing the impact on technology services and the organization as a whole to a minimal level that is acceptable and tolerable. This achievement represents the core concept of technology continuity sites.
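The effect can be illustrated numerically. Assuming, purely for illustration, that sites fail independently of each other, the chance that every site is unavailable at the same time is the product of the individual probabilities, even though no single site has become more reliable. The probabilities below are hypothetical, and the independence assumption holds only to the extent that the sites do not share the same threats.

```python
import math

def prob_all_sites_down(site_failure_probs):
    """Probability that all sites are down at once, assuming independence."""
    return math.prod(site_failure_probs)

# Hypothetical: each site has a 1% chance of being unavailable in a given period.
one_site = prob_all_sites_down([0.01])
two_sites = prob_all_sites_down([0.01, 0.01])
print(one_site, two_sites)
```

In this simplified model, adding a second site leaves each site's own failure probability unchanged but sharply lowers the chance of losing the technology services altogether.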
The number of technology continuity sites that an organization should establish and deploy relates directly to the risk appetite of the organization, which is linked to the types and nature of the threats and risks facing the organization. Deploying more than one technology site is relatively expensive and technically challenging.
There are organizations that deploy two continuity sites. The first site contains the critical technology components that are most important and have the shortest RTO and RPO ratings. It should be possible to activate this site quickly. Such a site is often called a high availability (HA) site. Usually, on such a site, technology components operate by load balancing, load sharing, or running certain critical functions away from the primary technology site. Switching between the HA and primary sites is usually easy, and in some cases is done automatically without manual intervention.
The second site contains all the critical technology components and other components chosen by the organization. This site would be activated if the primary and HA sites failed to operate. In many cases, the site is called a recovery site and is almost identical to the primary technology site.
The main purpose of having one or more technology continuity sites is to continue offering the technology services as required by the organization. This purpose can be achieved through different strategies. The main two of these are:
These two strategies are the most dominant, with the latter being adopted more nowadays. Having internal deployment of technology continuity sites gives the organization complete control over the site location, contents, capacity, use and invocation, and upgrade, if required. This does not come cheap as there is a high price tag in getting a complete technology continuity site established, ready, and maintained. It also requires the organization to have specialized and available expertise and resources to establish, deploy, and manage the sites.
Outsourcing technology sites to specialized vendors limits the organization’s control over the site in terms of capacity, contents, invocation, and even testing. Yet the remaining control level is satisfactory to many organizations. Usually the vendors own and deploy several locations: across different geographies, with different capacities, or with special arrangements. So the outsourcing organization has a wider scope of choices to select from. In difficult economic conditions, such a strategy can be very attractive to organizations of all sizes.
Some organizations follow a mixed approach where they deploy their HA sites internally and outsource their recovery sites to specialized vendors. Again, this decision is based on many factors like budgets allocated, risk appetites, threat and risk nature, and availability of specialized vendors.
After deciding the number of technology continuity sites needed and how they will be deployed, the time comes for selecting the locations of these sites. The sites should adhere to the same measures mentioned above for the primary technology site, with the addition of one main feature: they should have a different exposure to threats and risks, with less impact. In other words, technology continuity sites should face different types of threat and risk, and the impact, if these threats and risks materialize, should be less than for the primary technology site. How much less is specific to each site and goes back to the risk appetite of the organization.
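A simple way to compare candidate locations against this feature is to look at which threats each candidate shares with the primary site. The sketch below is an assumption-laden illustration: the threat labels and site names are hypothetical, and ranking by the count of shared threats ignores impact, which a real assessment would also weigh.

```python
# Hypothetical threat exposure of the primary site and two candidate sites.
primary_threats = {"flood", "earthquake", "grid outage"}
candidates = {
    "site_a": {"flood", "grid outage"},
    "site_b": {"wildfire"},
}

def shared_threats(primary, candidate):
    """Threats that would hit both the primary and the candidate site."""
    return primary & candidate

def rank_candidates(primary, candidates):
    """Candidate names, fewest shared threats first (ties broken by name)."""
    return sorted(
        candidates,
        key=lambda name: (len(shared_threats(primary, candidates[name])), name),
    )

print(rank_candidates(primary_threats, candidates))
```

A candidate that shares no threats with the primary site is the closest match to the "different exposure" feature described above.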
In general, the main features of technology continuity sites are that they:
The above represent the main features of technology continuity sites. However, it is up to the organization to decide how far it should apply these features to its technology continuity sites.
For specific site specifications, there is a common reference for the design and operation of technology sites, whether they are continuity or primary sites. The reference comes from the Uptime Institute (www.uptimeinstitute.org)8 and classifies technology sites into four main levels, or tiers, depending on their failover capability:
It can be seen that progressing from tier I to tier IV indicates more resilience and failover capability are invested in the technology site, with a relatively higher cost. This reference is very useful in identifying specifications and possible suitability of sites to accommodate technology components and services.
When an organization chooses to outsource its technology continuity site, or sites, externally to specialized vendors, there are some points to be considered. Usually, the vendors or providers have certain limits for the type of hardware, software, communications, invocation, testing, upgrades, and logistics related to staff, operators, contractors, and materials and supplies. It is highly recommended that you review the terms and contracts provided by the vendors covering these aspects. The vendor is not entirely dedicated to your organization as they have other clients who might use the site. If a disaster affects other organizations at the same time, the site provider may be less able to accommodate additional requirements. Thus all plans relevant to the outsourced sites should be made according to a worst-case scenario.
Other points to consider are the arrangements, tier classifications, and certifications for the sites managed and operated by the vendor. They can give you a satisfactory level of assurance that the sites will be there when you need them. It is also recommended to obtain analysts’ and consultants’ information on the vendors’ background and history, and on their effectiveness and commitment to what they offer.
In conclusion, there is, as usual, no simple answer to the question of having technology continuity sites incorporated within the IRBC/ITDR program. This area is evolving continuously and rapidly. There are new solutions being devised and introduced. Some of them, like cloud computing and advances in the convergence of mobile computing and telecommunications, can even change the way we approach the issue of technology continuity sites in a radical manner. Consider the technology outlook before choosing the strategy, location, capacity, and components related to technology continuity sites.
Traditionally, across many organizations, technology was always paired with the IT department. Under this convention, technology and information technology were used interchangeably. While that might be valid in a lot of organizations, there are exceptions to this rule whereby technology does not necessarily mean common information technology. These exceptions are continuously growing in the modern economy where organizations are finding smarter uses for technology with the convergence of information technology, communications, and data and information management.
There is a specific type of organization where technology is the main service and/or product. Within these, the business/support classification of departments becomes reversed. An example of this type of organization is in the telecommunications industry where nearly all functions are there to support the telecommunication services and products offered to customers. Technology is then no longer an enabler, as without technology the organization not only fails, but loses its reason to exist.
The reason we need to discuss this issue is to make sure that continuity planning for technology in organizations is not only about IT but extends to reach all critical functions needed by an organization to fulfill its goals. While the same continuity principles apply, implementations can vary significantly due to the specific nature of the technology landscape within individual organizations. Extending the technology program to become a comprehensive one across the organization will actually cater for all of the dependencies between technology components, both inside and outside the IT department.
Comprehensive technology continuity programs can provide the organization with a platform for consolidation, saving expensive resources across multiple technology components. Through the processes of requirements gathering and building continuity strategies, professionals are more likely to highlight areas of redundancy and ineffective utilization of resources. Highlighting such findings to top management can trigger actions to increase consolidation, integration, and enhancements for technology services. These results are not only on a technical level but also on governance and people levels, among others.
Comprehensive technology continuity programs are built on the same basis as those of technology continuity. The main idea is to bring the technology owners into the governance model through the assignment of a technology continuity owner, rather than the IRBC/ITDR owner. The IRBC/ITDR continuity policy has to be upgraded to be a comprehensive technology continuity policy. Having senior technology representatives on the BCM committee is also recommended.
As a result, the technology continuity managers, who are responsible for daily activities within the technology continuity program, have to be reconsidered. There are two suggestions for handling this:
Deciding on this issue depends on the unique nature of the organization. Organizations which define and enforce clear policies, roles, and responsibilities may find it attractive to have a single manager. Other organizations, which have difficulty in communicating and coordinating across departments or have an immature technology continuity program, can choose to assign specific continuity technology activities to different stakeholders with more top-management oversight and control.
The technology continuity life cycle follows the same principle as the IRBC/ITDR life cycle. The main difference is that the number of components increases and the concepts get wider to include the technology components deployed to support the technology services for the organization.
8 Uptime Institute, LLC. Data Center Infrastructure Tier Standard: Operational Sustainability. (2010)