302 ◾  Official (ISC)2® Guide to the ISSMP® CBK®
© 2011 by Taylor & Francis Group, LLC
Many organizations today develop heat maps, as shown in Figure 4.11. These
maps consist of a grid and use color coding to reflect the level of risk an activity
poses to the organization.
The number of squares on the grid is indicative, not prescriptive. You should
design the grid according to your own needs. The bold line represents your
organization's risk frontier. The risk frontier will flex depending on your
organization's risk appetite, the external threats you are currently facing, and your
business strategy. The squares immediately below and/or to the left of the bold line
should be amber; any risks plotted in these squares require mitigating. They will, of
course, be your second priority after you have addressed the mitigations for any risks
plotted above or to the right of the bold line (the risk frontier).
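A minimal sketch of how plotted risks might be sorted into the three zones of the heat map. The scoring rule (likelihood multiplied by impact), the two threshold values, and the sample risks are assumptions made for illustration; the text itself says only that the frontier flexes with risk appetite, external threats, and business strategy.

```python
def classify_risk(likelihood: int, impact: int,
                  frontier: int = 16, watch_band: int = 8) -> str:
    """Place a risk on a 6 x 6 grid into one of three zones.

    Assumptions (not from the text): the score is likelihood * impact,
    the risk frontier sits at `frontier`, and the amber "watch list"
    band covers the `watch_band` points just below the frontier.
    """
    if not (1 <= likelihood <= 6 and 1 <= impact <= 6):
        raise ValueError("likelihood and impact must be between 1 and 6")
    score = likelihood * impact
    if score > frontier:
        return "beyond appetite"       # first priority for mitigation
    if score > frontier - watch_band:
        return "watch list"            # amber: second priority
    return "well within appetite"      # possibly over-controlled


# Hypothetical risks given as (likelihood, impact) pairs.
risks = {
    "regional flood": (2, 6),
    "server failure": (5, 4),
    "printer outage": (1, 2),
}
for name, (likelihood, impact) in risks.items():
    print(f"{name}: {classify_risk(likelihood, impact)}")
```

Moving `frontier` up or down is one way to model a change in risk appetite without redrawing the grid.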
Step 3: Identify Preventative Measures
A simple formula for estimating the financial risk associated with a given type of
disaster (and thus how much it is worth investing in a plan to mitigate that risk) is
R$ = P × C × T, where P is the probability that the disaster will occur; C is the hourly
or daily cost of downtime in lost productivity, lost revenue, etc.; and T is the time
that systems are expected to be down. For example, if the probability of a major
hurricane hitting your place of business in the next 3 years is 20%, it will cost you
roughly $100,000 per day that you are down, and you expect to be down for a week,
then your financial risk is 0.2 × $100,000 × 7 = $140,000.
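The formula can be expressed as a short function. This is only a sketch: the function and parameter names are illustrative, and the inputs are the hurricane figures from the example above.

```python
def financial_risk(probability: float, cost_per_day: float, days_down: float) -> float:
    """Estimate financial risk as R$ = P x C x T.

    probability  -- P, chance the disaster occurs in the planning period (0 to 1)
    cost_per_day -- C, cost of downtime per day
    days_down    -- T, expected downtime, in the same unit as C (days here)
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if cost_per_day < 0 or days_down < 0:
        raise ValueError("cost and downtime must be non-negative")
    return probability * cost_per_day * days_down


# Hurricane example: 20% probability, $100,000 per day, 7 days down.
risk = financial_risk(0.2, 100_000, 7)
print(f"Estimated financial risk: ${risk:,.0f}")
```

C and T must use the same time unit; an hourly cost needs downtime expressed in hours.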
[Figure 4.11 Heat map. Security and Continuity Risk Matrix: Likelihood (1 to 6)
plotted against Impact (1 to 6). A suggested risk appetite boundary separates three
zones: the risk zone beyond the risk appetite boundary; the risk zone just within
risk appetite ("Watch List"); and the risk zone well within the risk appetite
boundary (potentially over-controlled risks).]
Understanding BCP, DRP, and COOP ◾  303
One way to minimize this risk is to reduce T, the time you are down; that is
basically the purpose of the disaster recovery planning exercise. However, it is not
the only way. The risk can also be reduced by reducing the probability that the
disaster will occur or by reducing the cost that will be incurred if it does. Both of
these are types of preventative measures. It is very often the case that the cost of
preventing a problem is far lower than the cost of fixing it after it occurs.
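As a rough illustration, the effect of a measure on P, C, or T can be compared with what the measure costs. The measures, reduction factors, and prices below are hypothetical, and the baseline reuses the hurricane figures (P = 0.2, C = $100,000 per day, T = 7 days); treat this as a sketch, not a costing method.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    cost: float             # price of the measure (hypothetical figure)
    p_factor: float = 1.0   # multiplier applied to P (probability)
    c_factor: float = 1.0   # multiplier applied to C (cost of downtime)
    t_factor: float = 1.0   # multiplier applied to T (time down)


def expected_loss(p: float, c: float, t: float) -> float:
    """R$ = P x C x T."""
    return p * c * t


def net_benefit(m: Measure, p: float, c: float, t: float) -> float:
    """Reduction in financial risk minus the cost of the measure."""
    baseline = expected_loss(p, c, t)
    reduced = expected_loss(p * m.p_factor, c * m.c_factor, t * m.t_factor)
    return (baseline - reduced) - m.cost


measures = [
    Measure("standby site (reduces T)", cost=60_000, t_factor=0.25),
    Measure("sensors and maintenance (reduces P)", cost=5_000, p_factor=0.75),
    Measure("manual fallback process (reduces C)", cost=20_000, c_factor=0.5),
]
for m in sorted(measures, key=lambda m: net_benefit(m, 0.2, 100_000, 7), reverse=True):
    print(f"{m.name}: net benefit ${net_benefit(m, 0.2, 100_000, 7):,.0f}")
```

A positive net benefit suggests the measure costs less than the risk it removes, provided the cost and the probability cover the same planning period.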
Measures that reduce the probability of a disaster occurring range from the fairly
drastic, like physically moving the organization out of reach of threats such as
hurricanes or floods, to the fairly mundane, such as ensuring that regular maintenance
is performed on critical systems, that redundant components are built in, that
sensors are installed to monitor environmental factors, and that performance monitors
are installed to give early warning of server malfunction, or even something as simple
as keeping plastic tarps available to throw over computer equipment to protect it from
water damage.
It is sometimes even possible to reduce the cost of downtime by reducing your
organization's dependence on the system.
The basic idea is to examine the potential gain from removing or replacing a system
entirely. What is sometimes forgotten when new equipment and systems are
implemented is that the total cost of any system includes not just the up-front cost and
the ongoing maintenance, but also the risk associated with it. There are times when
it is better to replace a system with one that, while lower in performance, exposes
the organization to significantly lower risk.
While I don't have any particular procedure to offer, it is potentially very useful
to spend some time in this step for all types of disasters that you wish to protect
against and for all the systems being protected, at both the full-system and
component levels.
Step 4: Develop Recovery Strategies
The primary task of this step is to determine how you will achieve your disaster
recovery goals for each of the systems and system components that were identified
in Step 2. It is here that you do the core work of balancing costs and benefits of the
available approaches, before diving into the complexities of the full plan.
This step is not about selecting specific vendors, determining exact costs, or
developing detailed procedures. Rather, the purpose in this stage is to select the types
of solutions that you will use and to determine the scales of the costs involved. Thus,
for example, you may determine that a small, critical subset of systems requires a
fully mirrored and staffed alternate site ready to take over in minutes, while other
systems can utilize a more traditional backup strategy that trades longer recovery
time for much reduced expense.
There are several considerations to keep in mind as you work through this step.
First, this is the point at which it becomes important to consider exactly what types
of disasters you need to prepare for and to classify them by the extent and type of
impact they have. The reason is straightforward: the recovery strategies available
to you necessarily depend on what you must recover from. Focused failures such as
a hardware component failure or a water leak affecting a couple of servers are very
different matters from a site-wide disaster like a flood, fire, or regional blackout. In
the first case, it may be possible to depend on a vendor to deliver a replacement. In
the second, an alternate facility may be required.
The second consideration that arises from this is the need to consider solutions
of differing breadth of coverage. Obviously, a solution that can address a site failure
will serve as well to recover from failures of individual systems or components.
There are a number of reasons not to depend on a single system-wide solution to
address all issues, however. The most obvious is that such a solution is certainly
costly to implement and probably costly to activate: the disruption from failing
over all of your systems to a secondary site is unlikely to be commensurate with a
problem arising from failure of a single component.
There are two other reasons not just to consider but actually to implement
several alternative approaches that address different levels of problems. First, no matter
how good your planning and testing are, depending on a single solution, especially
a complex one, means that your recovery is all or nothing. Anyone with much
experience in complex systems knows that this is not a good idea. It is much better
to have a series of backup solutions so that if one fails another is in place to recover
at a greater, but still not completely devastating, cost. Second, alternative solutions
that are actually implemented can give you significant flexibility in responding to
an actual disaster. For example, if a problem occurs in the middle of a business day,
it may be important to go immediately for an expensive solution in order to recover
quickly, but if it happens at night or over a weekend, you may be able to institute
slower but less disruptive or less costly recovery procedures.
A nal consideration is that you need to take into account the particular char-
acteristics of the infrastructure, human, and data aspects of recovery. Each of these
must be considered dierently, with dierent fundamental drivers of the decision
of what type of solution to invest in and how much to spend.
Infrastructure is the simplest. While there may be more or less significant costs
involved, the salient characteristic of infrastructure is that it can be replaced. A server
can be replaced with another server, an alternate provider can be found for network
connectivity, and so on. In many cases, the replacement system need not even be an
exact duplicate, as long as it interfaces with other components and systems in a
compatible way: manual systems may replace IT-based automated systems, or you may be able
to temporarily outsource an internal system. The fundamental driver when
considering recovery strategies for infrastructure is typically that of cost versus performance.
People (roles) represent a more difficult factor. Particular roles usually require
special skills and knowledge. If a recovery strategy requires, for example, that a
given role be duplicated at an alternate site, there may be additional costs associated
with hiring or training personnel to fill that duplicate role. Similarly, the recovery
solution itself may require special training or skills. The key driver for the
consideration of recovery strategies from the standpoint of roles, then, is the degree to
which they require you to duplicate or acquire special skills, which then impacts
the long-term cost of hiring and training.
Finally, data are potentially the most difficult issue because data are usually
unique: it is not generally possible to replace data with other data that have the
same or similar properties. Either you have them or you don't. The driving question
from the standpoint of data, then, is how much data you are willing to lose. It
is also important here to recall that data may not only be destroyed, for example
through the loss of the system they are stored on, but may also be corrupted
through user or administrator errors or through deliberate sabotage, such as
a virus attack.
Keeping all these considerations in mind, this step consists of reviewing each of
the systems characterized in Step 2 and determining the system- and component-
level strategies to apply that can achieve recovery within the maximum outage
time, while remaining roughly in the bounds of budget and other resource con-
straints that have been established. In considering cost, it is always important to
characterize the total cost of a solution, which means not only the cost of any hard-
ware, software, or services purchased, but also the level of disruption caused by
installation and the ongoing costs in both money and personnel to maintain and
test the solution.
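The selection described above might be sketched as follows: keep only the strategies that recover within the maximum outage time, then compare them on total cost rather than purchase price alone. The strategy names, figures, and three-year horizon are hypothetical, and disruption and staff time are assumed to have been converted to money.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Strategy:
    name: str
    recovery_hours: float           # expected time to recover with this strategy
    purchase_cost: float            # hardware, software, or services
    install_disruption_cost: float  # disruption caused by installation
    annual_upkeep: float            # yearly maintenance and testing

    def total_cost(self, years: int = 3) -> float:
        return (self.purchase_cost
                + self.install_disruption_cost
                + self.annual_upkeep * years)


def cheapest_within(strategies: Iterable[Strategy],
                    max_outage_hours: float,
                    years: int = 3) -> Optional[Strategy]:
    """Return the lowest total-cost strategy that meets the outage limit, or None."""
    candidates = [s for s in strategies if s.recovery_hours <= max_outage_hours]
    return min(candidates, key=lambda s: s.total_cost(years), default=None)


options = [
    Strategy("mirrored alternate site", 0.25, 500_000, 40_000, 120_000),
    Strategy("warm standby", 8, 150_000, 15_000, 40_000),
    Strategy("traditional backup", 72, 20_000, 2_000, 8_000),
]
choice = cheapest_within(options, max_outage_hours=12)
print(choice.name if choice else "no strategy meets the outage limit")
```

A result of `None` signals that either the maximum outage time or the list of candidate strategies needs to be revisited.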
Keep in mind too that the different systems are not independent of one another,
and in many cases their recovery strategies must be compatible. This may
be a minor consideration for purely local solutions, but if you wish to set up an
alternate site, you need to ensure that all the systems on which a given system
depends are also duplicated. For example, if you wish to have a failover system for
your email servers at an alternate site on the other side of the country, in addition
to the email server you will need facilities to house the server, power, network
connectivity, sufficient bandwidth to operate throughout a crisis, access or
personnel to maintain the facilities and the servers, DNS servers and other auxiliary
systems that are critical to the operation, etc. When you are done, add a brief
summary of the recovery strategies for each of the systems on the Master System
Information Form.
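One way to check that nothing is missed at an alternate site is to walk the dependency map and collect everything a system relies on, directly or indirectly. The sketch below uses a made-up dependency map loosely based on the email example; the names and relationships are assumptions for illustration.

```python
from typing import Dict, List, Set


def required_at_alternate_site(system: str,
                               depends_on: Dict[str, List[str]]) -> Set[str]:
    """Collect a system plus everything it depends on, directly or indirectly."""
    required: Set[str] = set()
    stack = [system]
    while stack:
        current = stack.pop()
        if current in required:
            continue                  # already visited; also guards against cycles
        required.add(current)
        stack.extend(depends_on.get(current, []))
    return required


# Hypothetical dependency map for a failover email service.
depends_on = {
    "email server": ["facility", "power", "network", "dns", "maintenance staff"],
    "network": ["bandwidth contract"],
    "dns": ["network", "power"],
    "facility": ["power"],
}
for item in sorted(required_at_alternate_site("email server", depends_on)):
    print(item)
```

Running the same check for every system planned for the alternate site, and taking the union of the results, gives a first list of what must be duplicated there.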
Step 5: Develop the Plan
This step is the culmination of all your work. It is not, unfortunately, an easy step,
but neither is it too complicated, as long as you have been thorough in the previous
steps and you approach it systematically.
The outcome of this step is both a documented plan and the completed
implementation of all the infrastructure required to enable the plan. The documentation
includes background information on the assumptions and constraints that went
into making the plan, as well as written documentation on specific procedures.
The implementation side includes purchasing and installing hardware and
software, setting up alternative locations, contracting for alternative sources of network
or other communication services, and so on.
This step is a major project all by itself, even if the previous steps have been
carried out perfectly. It will require a significant amount of time on the part of the
person or team responsible for leading the development of the plan, but it will also
require time and effort by everyone whose systems are involved, since their expertise
will be required both to develop recovery procedures and, of course, to test them.
We will cover all aspects of the plan, roughly following the organization
suggested in the NIST guide. The organization itself is not important (it should
be adapted to best serve your needs), but all the types of information I will discuss
should be present in the plan. The sections we will use are as follows:
1. Introduction
2. Operational Overview
3. Notication/Activation Phase
4. Recovery Phase
5. Reconstitution Phase
6. Appendices
For each section, I will briefly discuss its contents and the purpose for their
inclusion, then review some of the major issues or potential pitfalls to keep in mind,
and finally offer some suggestions on how to approach the actual development of
that section.
Keep in mind that this part of the work in particular is likely to be iterative.
As you select specific solutions and work out step-by-step procedures, you may
discover new dependencies, errors in your initial assumptions, or simply that your
planned approach exceeds the resources available, so that you need to revisit and
re-evaluate your recovery strategies and perhaps even your recovery targets. In fact,
you may wish to do this deliberately in preparing the entire plan: make several
passes through all of the steps in this guide, starting with a much shallower
effort and deepening gradually. This approach can significantly reduce the
likelihood of surprises that entail rethinking major parts of your plan.
Plan Section 1: Introduction
The main purpose of this section is to document the goals and scope of the plan, along
with any requirements that must be taken into account whenever the plan is updated.