Chapter 8

Managing Activity Risks

“Statistics are no substitute for judgment.”
—HENRY CLAY, U.S. SENATOR

Risk assessment provides a prioritized risk register. When you use this list, it becomes clear just how much trouble your project is in. An accumulation of significant scope risks may indicate that your project is literally impossible. Too many schedule or resource risks may indicate that your project is unlikely to complete within its constraints. Project risk management can be a potent tool for transforming a seemingly doomed project into a merely challenging one.

Managing risk begins with your prioritized risk register. Based on your sorted list, you can set the boundary between the most significant and least significant risks. Risk response planning uses this boundary as a guide; all the risks above the cut-line will deserve at least some attention. In addition, though, a prudent project leader reviews the whole list, at least briefly. The most important reason for this is to reconsider all risks that have significant consequences. When the potential impact for a risk exceeds acceptable limits, a response may be in order even if the probability is estimated to be low. There may also be low-rated risks for which there are simple, cheap responses. It makes little sense to ignore risks for which there are trivial cures.
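As a rough illustration, the selection logic just described can be sketched in Python. The sketch assumes that risks are ranked by probability times impact; the field names, the cut-line value, the impact limit, and the sample risks are all hypothetical and are not taken from any particular risk register.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float            # estimated likelihood, 0.0 to 1.0
    impact: float                 # estimated loss, here in schedule days
    cheap_response: bool = False  # True if a simple, low-cost response is known

    @property
    def exposure(self) -> float:
        # Assumed ranking measure: loss times likelihood.
        return self.probability * self.impact


def risks_needing_attention(register, cut_line, impact_limit):
    """Return risks above the cut-line, plus low-rated risks that either
    exceed the acceptable impact limit or have a trivial cure."""
    ranked = sorted(register, key=lambda r: r.exposure, reverse=True)
    return [
        r for r in ranked
        if r.exposure >= cut_line or r.impact > impact_limit or r.cheap_response
    ]


# Hypothetical sample register.
register = [
    Risk("Key supplier delivers late", probability=0.5, impact=20),
    Risk("Test lab destroyed by flood", probability=0.02, impact=200),
    Risk("Minor tool incompatibility", probability=0.1, impact=2, cheap_response=True),
    Risk("Low-priority report delayed", probability=0.1, impact=1),
]

selected = risks_needing_attention(register, cut_line=5, impact_limit=100)
for risk in selected:
    print(f"{risk.name}: exposure {risk.exposure:.1f}")
```

In this made-up register only the first risk sits above the cut-line, but the flood is kept because its impact exceeds the acceptable limit, and the tool incompatibility is kept because its cure is trivial.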

For each risk you deem significant, you can then seek root causes to determine your best management strategy. For risks where the project team has influence over the root cause, you can develop and analyze ideas to reduce or eliminate the risk, and then modify the project plans to incorporate these ideas wherever it is feasible. For risks that cannot be avoided or that remain significant, you can also develop contingency plans for recovery should the risk occur.

Root-Cause Analysis

What, if anything, can be done about a risk depends a great deal on its causes. For each identified risk that is assessed as significant, you must determine the source and type of risk that it represents.

The process for cause-and-effect analysis is not a difficult one. For risk analysis, it begins with the listed risks and their descriptions. The next step is to brainstorm possible sources for the risk. Any brainstorming process will be effective as long as it is successful in determining conditions or events that may lead to the risk. You can begin with major cause categories (such as scope, schedule, and resource) or simply think about specific factors that may lead to the risk. However you begin the analysis, complete it by organizing the information into categories of root cause. Some redundancy between items listed in the categories is common.

Cause-and-effect analysis using fishbone diagrams, so called because of their appearance, was popularized by the Japanese quality movement guru Dr. Kaoru Ishikawa. (They are also sometimes called Ishikawa diagrams.) These diagrams may be used to display root causes of risk visually, allowing deeper understanding of the source and likelihood of potential problems. Organize the possible causes into a branching diagram similar to the one in Figure 8-1. Note that some causes may themselves have multiple potential sources. Continue the root-cause analysis process for each significant risk in the project.

Categories of Risk

In dealing with risk, there are really only two options. In an advertisement some years ago, the options were demonstrated pictorially using an egg. On the left side of the picture was an egg falling toward a pillow held in a person’s hand. On the right side was a broken egg oozing over the flat, hard surface it had smashed into, with a second hand swooping in holding a paper towel. The left side was titled “Prevention” and the right side “Recovery.” Management of risk in projects always involves these tactics—prevention to deal with causes, and recovery to deal with effects.

The three categories of project risk are controllable known risks, uncontrollable known risks, and unknown risks. All the significant listed project risks are known risks and are either under your control or not. For listed risks it is possible to plan for response, at least in theory. The third category, unknown risks, is hidden, so specific planning is not generally possible. The best method for managing unknown risk involves setting project reserves, in schedule or budget (or both), based on the measured consequences of unanticipated problems on similar past projects. Keeping track of specific past problems also converts your past unknown risks into known risks. Managing unknown project risk is addressed in Chapter 10.

Figure 8-1. Fishbone diagram example.

Root-cause analysis not only makes known project risks more understandable, it also shows you how to manage each risk. Based on the root cause or causes, you can determine whether the risk arises from factors you can control, and may therefore be preventable, or whether it stems from uncontrollable causes. When the causes are out of your control, risk can only be managed through recovery. These strategies are summarized in Figure 8-2.

Known controllable risks are at least partially under the control of the project team. Risks such as the use of a new technology, small increases in complexity or performance of a deliverable, or pressure to establish aggressive deadlines are examples of this. Working from an understanding of the root causes for these problems, you may be able to modify project plans to avoid or minimize the risk.

Figure 8-2. Risk management strategies.

For known uncontrollable risks, the project team has essentially no influence on the source of the risk. Loss of key project staff members, business reorganizations, and external project factors such as weather are examples. For these problems, the best tactic is to deal with effects after the risk occurs, recovering with a contingency plan you prepared in advance.

It is common for a root-cause analysis to uncover some causes that you can control as well as some that you cannot for the same risk. Responding to risks with several possible sources may require both re-planning and preparation for recovery.

Although the dichotomy between controllable and uncontrollable may seem simple, it often is not. The perceived root causes of a risk vary depending on the description of the risk. To take the example of the fishbone diagram in Figure 8-1, many of the root causes seem out of the control of the project team, as the risk is described as the loss of a particular person. If the exposure were redefined to be the loss of a particular skill set, which is probably more accurate, then the root causes would shift to ones that the project might influence through cross-training, negotiating for additional staff, or other actions.

Even when a risk seems to be uncontrollable, the venerable idea from quality analysis of “Ask why five times” may open up the perspective on the risk and reveal additional options for response. If weather, earthquakes, or other natural disasters are listed as risks to particular activities, probe deeper into the situation to ask why and how that particular problem would affect the project. The risk may be a consequence of a project assumption or a choice made in planning that could be changed, resulting in a better, less problematic project. Shifting the time, venue, infrastructure, or other parameters of risky activities may remove uncontrollable risks from your project, or at least diminish their potential for harm.

Risk Response Planning

Two basic options are available for risk management: dealing with causes and dealing with effects. There are, however, variations on both of these themes.

Dealing with the causes of project threats involves risk prevention—eliminating the risk (avoidance), lowering its probability or potential impact (mitigation), or making it someone else’s problem (transfer). Avoidance of risks requires changing the project plan or approach to remove the root cause of the risk from your project. One way to avoid falling off a cliff is to avoid cliffs. Mitigating actions do not remove a risk completely, but they do serve to reduce it. Some mitigating actions reduce the probability of a risk event, such as inspecting your automobile tires before a long trip. Other mitigations reduce the risk consequences, such as wearing a seat belt to minimize injury. Neither of these actions prevents the problem, but they do serve to reduce the overall risk by lowering the “loss” or the “likelihood.”

Similarly, some damaging risks may be transferred to others. Many kinds of financial risks may be transferred to insurance companies; you can purchase coverage that will compensate your losses in the event of a casualty that is covered by the policy. Again, this does not remove the risk, but it does reduce the financial impact should the risk occur. Transfer of risk can deal with causes if the impact of the risk is primarily financial, but in other cases it may be used to deal with risk effects—aiding in the recovery.

Throughout most of this chapter the term “risk” will be used to describe an uncertain event that could harm the project—a threat. Not all uncertain project events are threats, however. There may also be uncertain opportunities where risk management strives to increase the probability or impact. Benefiting from these project opportunities involves embracing these “positive risk” situations. Similar tactics are applied to these uncertain opportunities, analogous (though reversed) to those just outlined for prevention of threats. Where you might avoid threats by replanning to remove the potential for harm, you would replan the project to exploit or to capture the opportunity. You work to make it a certain part of the project. In the case mentioned in Chapter 7 of an item that might go on sale, you might investigate the planned timing for the sale and schedule the project around it. Mitigation serves to reduce the probability or impact of a threat, and the corresponding tactic is to enhance the plan to pursue opportunity, making the potential benefits more likely or more helpful. In the case of the sale, you might be unable to determine when (or even if) it might occur, but you could schedule the project around the dates for a sale from last year on the theory that that is when such a sale would be most likely. As with threats, sometimes the strategy involves strength in numbers. Where threats may be transferred to limit their impact, opportunities may be improved when shared. Cost reductions for purchased items comparable to a sale might be available if you can find others with similar needs and make purchases together to take advantage of favorable quantity pricing.

Dealing with the effect of a threat may either be done in advance (contingency planning) or after the fact (acceptance). (Uncertain opportunities generally need no particular contingency planning; those not managed are ignored, or accepted.) Some risks are too minor or too expensive to consider preventing. For minor risks, acceptance may be appropriate; simply plan to deal with the consequences of the problem if and when it occurs. For more serious problems where avoidance, mitigation, and transfer are ineffective, impractical, or impossible, contingency planning is the best option.

For some risks, one of these ideas will be sufficient; for others, it may be necessary to use several.

Timeline for Known Risks

As was discussed briefly in Chapter 6, each activity risk will have a signal, perhaps more than one, indicating that the risk has crossed over from a possibility to a certainty. This signal, or trigger event, may be in advance of the risk or coincident with it. It may be visible to everyone involved in the project, or it may be subtle and hidden. For each risk, strive to define a trigger event that provides as much advance notification of the problem as possible. Consider the risk: “A key project team member quits.” One possible trigger event might be the submission of a resignation letter. This is an obvious trigger, but it is a late one. There are earlier triggers to watch for, such as a drop in motivation, erratic attendance, frequent “personal” telephone calls, or even an uncharacteristic improvement in grooming and dress. These triggers are not foolproof, and they require more attention and effort to monitor, but they may also foreshadow other problems even if the staff member does not intend to leave.

In addition to one or more trigger events, identify the portions of the project plan where the risk is most probable, being as precise as possible. For some risks there may be a single exposure related to one specific activity; more general risks (such as loss of key staff members) may occur throughout the project.

Risk management decisions and plans are made in advance of the trigger event, and they include all actions related to avoidance, mitigation, or transfer, as well as preparation for any contingent actions. Risk management responses that relate to recovery fall on the project timeline after the risk trigger, but are used only if necessary. For each significant risk that you cannot remove from the project, assign an owner to monitor for the trigger event and to be responsible for implementing the contingency plan or otherwise working toward recovery. The risk management timeline is summarized in Figure 8-3.

Figure 8-3. Risk management timeline.

Dealing with Risk Causes

After each risk is categorized and you have identified those risks for which the project team could influence some or all of the causes, you are ready to begin developing response possibilities for prevention, including avoidance, mitigation, and transfer. Analyze all the options you and your team develop, examining both the cost of the idea and its potential benefits. If good, cost-effective ideas are proposed, the best of them are candidates for inclusion in your draft project plan. Prevention ideas must earn their way into the project plan. Even excellent ideas that completely remove a risk should be bypassed if their overall cost exceeds the expected “loss times likelihood” for the risk. The final process step is to integrate all accepted risk prevention ideas into your preliminary project plan and review the plan for new risks or unintended consequences as a result of the changes.
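The "loss times likelihood" test can be expressed as a few lines of Python. This is only a sketch: the function names and the sample figures are hypothetical, and the optional residual-risk parameters are an added assumption that extends the same comparison to mitigation ideas, which reduce a risk rather than remove it.

```python
def expected_loss(probability, impact):
    """Expected loss for a risk: likelihood times loss."""
    return probability * impact


def prevention_is_worthwhile(cost, probability, impact,
                             residual_probability=0.0, residual_impact=0.0):
    """Compare the cost of a prevention idea with the expected loss it removes.

    With the default residual values the idea is treated as full avoidance.
    Nonzero residuals (an assumption for mitigation ideas) credit the idea
    only with the reduction in expected loss.
    """
    before = expected_loss(probability, impact)
    after = expected_loss(residual_probability, residual_impact)
    return cost < before - after


# Hypothetical risk: 20 percent chance of a 50,000 loss (expected loss 10,000).
print(prevention_is_worthwhile(cost=15_000, probability=0.2, impact=50_000))
print(prevention_is_worthwhile(cost=4_000, probability=0.2, impact=50_000))
```

With these figures the 15,000 idea is bypassed even though it removes the risk entirely, while the 4,000 idea earns its way into the plan.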

Planning for risk responses begins with generating ideas. Brainstorming with your project team is a good way to generate a range of possible choices. It is also useful to discuss risks with peers and others who may have relevant experience, and it may be worthwhile to consult experts and specialists for unfamiliar types of risks.

Few known risks are completely novel, so it is quite possible that many of the risks you face have been addressed on earlier projects. A quick review of project retrospective analyses, final reports, “lessons learned,” and other archived materials may provide information on what others did in response to similar risk situations they encountered. In addition to finding things that did not work and are worth avoiding, there may be useful ideas for effectively dealing with the risks you need to manage.

There are also many ideas available in the public domain, in papers, books, and articles and on the Web. References on project management, particularly those that are tailored to projects like yours, are filled with practical advice. Life cycles and project management methodologies also provide direction and useful ideas for managing risks.

A number of possible preventative actions follow in the next several pages, including tactics for risk avoidance, mitigation, and transfer. These can be useful in seeding a brainstorming exercise or in planning for specific responses. These tactics include ideas for dealing with the worst of the risks in the PERIL database, especially those characterized as “black swans.” The ideas listed here include some that may be appropriate only for particular kinds of technical projects, but many are useful for any project.

Risk Avoidance

Avoidance is the most thorough way to deal with risks, because it obliterates them. Unfortunately, avoidance is not possible for all project risks because some risks are tightly coupled to the requirements of technical projects. Avoiding risks in your project requires you to reconsider choices and decisions you made in defining and planning your project. Most of Chapters 3, 4, and 5 concerned using project planning processes to identify risks. Although some of the risks you discovered may be unavoidable, a review of the current state of your plan may turn up opportunities to replan the work in ways that remove specific serious risks. Tactics for avoiding scope risks suggested by the material in Chapter 3 include:

• Identify the minimum acceptable deliverable; avoid overdesign (“gold plating”).

• Negotiate and clearly document all interface deliverables expected from other projects.

• Avoid untried, unfamiliar, or “bleeding edge” technology whenever practical.

• Plan to design using standard, modular, or well-understood methods. Look for ways to achieve project specifications using older, tried-and-true technologies.

• Buy instead of make.

• Avoid “not invented here” thinking; be willing to leverage work done by others.

Many of your schedule risks are consequences of planning. You may be able to remove sources of schedule risk using ideas covered in Chapter 4:

• Reduce the number of critical paths.

• Modify the work to have fewer activity dependencies.

• Schedule the highest uncertainty activities as early as possible.

• Avoid having the same staff members working on two successive or concurrent critical (or near-critical) activities.

• Decompose lengthy activities further.

• Reschedule work to provide greater flexibility.

Resource risks may also be a consequence of choices you made in resource planning. Explore opportunities to avoid these risks using the concepts of Chapter 5:

• Obtain names for all required project roles.

• Get explicit availability commitments from all project staff (and from their managers).

• Work to limit commitments by project staff to other projects, maintenance and support work, and other time conflicts. Explicitly document those that remain.

• Modify plans to reduce the load on fully loaded or over-committed resources.

• Use the best people available for the most critical activities.

• Educate team members to use more efficient or faster methods, and do it early in the project.

• Use mentoring to build teamwork and establish redundancy for critical skills.

• Upgrade or replace older equipment to make work more efficient, and do it in the beginning of the project.

• Automate manual work when possible.

• Locate and gain access to experts to cover all skill areas not available on the project team.

• Minimize dependence on a single individual or other resource for project work.

• When you use outside services, use the same suppliers that you (or others that you trust) have used successfully in the past.

• Establish contract terms with all suppliers that are consistent with project objectives.

Avoidance tactics are not limited to these ideas by any means. Anything that you can realistically do to eliminate the root cause of a risk has potential for risk avoidance.

Risk Mitigation

Mitigation strategies are also essential for risk management, because avoidance can never deal with every significant project risk. Mitigation strategies serve to reduce the probability and/or the impact of potential problems. Some generic ideas for risk mitigation include:

• Good communication

• Using specialists and generalists

• Strong sponsorship

• Continuing user involvement

• Clear decision priorities

One of the least expensive and strongest preventative actions a project leader can take is to communicate more, and more effectively. Risks and risk consequences that are visible always affect the way that people work. If all the team members are aware of how painful the project will become following a risk, they are likely to work, to the best of their ability, in ways that minimize the risk. Communication can significantly reduce risk probabilities. Communicate. Communicate. Communicate.

Another broad strategy for managing risk relates to project staffing. Difficult projects benefit from having a mix of specialists and generalists. Specialists are essential on technical projects because no one can know everything, and the specialist can generally complete assigned work in his or her specialty much faster than a generalist. However, a project team composed only of specialists is not very robust and tends to run into frequent trouble. This is because project planning on specialist-heavy projects is often intense and detailed for work in the specialists’ areas, and remarkably sketchy for other work. Also, such teams may lack broad problem-solving skills. Generalists on a project are needed to fill in the gaps and ensure that as much of the project work as possible is visible and well planned. Generalists are also best for solving cross-disciplinary problems. As the head generalist, the project manager should always reserve at least a small percentage of his or her time for problem solving, helping out on troubled activities, and general firefighting. Even when the project leader has a solid grasp of all the technical project issues, it is useful to have other generalists on the team in case several things on the project go wrong at the same time. Generalists can reduce the time to solution for problems of all kinds and minimize schedule impact.

Managing project risk is always easier with friends in high places. Establish and work to sustain strong sponsorship for your project. Although strong sponsorship does not ensure a risk-free project, weak (or no) upper-level sponsorship is a significant source of risk. Form a good working relationship with the project sponsor(s) and work to understand their expectations for project information. Reinforce the importance and value of the project regularly, and don’t let sponsors forget about you. Update your management frequently on project progress and challenges, and involve them early in problems and escalations that require authority you lack. Validate project objectives with sponsors and customers and work to set realistic expectations. Using your budget and staffing plans, get commitments for adequate funding, staffing, and expertise. Strong sponsorship reduces timing problems and other risk impact and lowers the probability for many kinds of resource risks.

Project risk will increase, particularly on lengthy projects, whenever the project team is disconnected from the ultimate customers for the deliverable. Establish and maintain contact with the end users, or with people who can represent them. Seek strong user buy-in, and work with users to avoid scope gaps by validating all acceptance and testing criteria. Establish measurable criteria, and determine what will be required for the users to deem the project a success. Identify the individual or individuals who will have the final word on this and keep in contact with them. The probability of scope risk and the likelihood of late project schedule difficulties are both reduced by meaningful user involvement.

A final general strategy for lowering project risk is setting clear decision priorities for the project. Validate the priorities with both the sponsors and the end users, and ensure that the project priorities are well known to the project team. Base project decisions on the priorities, and know the impact of failing to meet each priority established for the project. This not only helps manage scope risks, it also permits quick decisions within the project that minimize scope creep and other change-related impact.

Mitigation Strategies for Scope Risks

Mitigating scope and technical risks involves shifts in approach and potential changes to the project objective. Ideas for mitigating scope risks include:

• Explicitly specify project scope and all intermediate deliverables, in measurable, unambiguous terms, including what is not in the deliverable. Eliminate “wants” early—make them part of scope or drop them.

• Gain acceptance for and use a clear and consistent specification change management process.

• Build models, prototypes, and simulations.

• Test with users, early and often.

• Deal with scope risks promptly.

• Obtain funding for any required outside services.

• Translate, competently, all project documents into relevant languages.

• Minimize external dependency risks.

• Consider the impact of external and environmental problems.

• Keep all plans and documents current.

The most significant scope risks in the PERIL database stem from changes. Minimizing change risk involves the first two tactics: scope definition and change management. Scope risk is high for projects with inadequate specifications. Although it is true that thorough, clear definition of the deliverable is often difficult on technical projects, failure to define the results adequately leads to even greater difficulty. Closely inspect the list of features to be included to verify that all the requested requirements are in fact necessary.

The second necessary tactic for reducing change risk is to uniformly apply an effective process for managing all changes to project scope. To manage risks on large, complex projects, the process is generally formal, using forms, committees, and extensive written reporting. For technical projects done under contract, risk management also requires that the process be described in detail in the contract signed by the two parties. On smaller projects, even if it is less formal, there still must be uniform treatment of all proposed changes, considering both their benefits and expected costs. For your project, adopt a process that rejects all changes that fail the cost-justification test. It is not enough to have a change management process; mitigating scope risks requires its disciplined use.

Scope risks are often hard to evaluate at the beginning of technical projects. One way to gain better insight is to schedule work during planning to examine feasibility and functionality questions as early as possible. Use prototypes, simulations, and models to evaluate concepts with users. Schedule early tests and investigations to verify the feasibility of untried technology. Identify potential problems and defects early through walkthroughs and scenario discussions. Also consider scale risks. Even if there are no problems during small-scale, limited tests, scope risks may still remain that will be visible only in full-scale production. Plan for at least some rudimentary tests of functionality in full-scale operation as early in the project as practical. Schedule work to uncover issues and problems near the beginning of the project, and be prepared to make changes or even to abandon the project based on what you learn.

Although it is risky to defer difficult or unknown activities until late in the project, it may be impractical to begin with them. To get started, you may need to complete some simpler activities first, and then move on to more complicated activities as you build expertise. Do your best to schedule the risk-prone activities as early in your project as you can.

Lack of skills on the project team also increases scope risk, so define exactly how you intend to acquire all needed expertise. If you intend to use outside consultants, plan to spend both time and effort in their selection, and ensure that the necessary funding to pay for them is in the project budget. If you need to develop new skills on the project team, identify the individuals involved and plan so each contributor is trained, in advance, in all the needed competencies. If the project will use new tools or equipment, schedule installation and complete any needed training as early in the project as possible.

Scope problems also arise from faulty communications. If the project depends on a distributed team that speaks several languages, identify all the languages needed for project definition and planning documents and plan for their translation and distribution. Confusion arising from project requirements that are misinterpreted or poorly translated can be expensive and damaging, so verify that the project information has been clearly understood in discussions, using interpreters if necessary. It is also critical to provide written follow-up after meetings and telephone discussions.

Scope often depends on the quality and timely delivery of things the project receives from others. Mitigating these risks requires clear, carefully constructed specifications to minimize the possibility that the things that you get are consistent with the request but are inappropriate for the project’s intended use. If you have little experience with a provider, finding and using a second source in addition to the first may be prudent, even though this can increase the cost. The cost of a redundant source may be small compared to the cost of a delayed project.

External factors also lead to scope risks. Natural disasters such as floods, earthquakes, and storms, as well as not-so-natural disasters like computer viruses, may cause loss of critical information, software, or necessary components. Although there is no way to prevent the risks, provision for some redundancy, adequate frequent backups of computer systems, and less dependency on one particular location can minimize the impact for this sort of risk.

Finally, managing scope risk also requires tracking of the initial definition with any and all changes approved during the project. You can significantly lower scope risk by adopting a process that tightly couples all accepted changes to the planning process, as well as by making the consequences of scope decisions visible throughout the project.

Mitigation Strategies for Schedule Risks

Schedule risks may be minimized by making additional investments in planning and revising your project approach. Some ideas to consider include:

• Use “expected” estimates when worst cases are significant.

• Schedule highest priority work early.

• Schedule proactive notifications.

• Even if you must use new technology, explore how you might use older methods.

• Use parallel, redundant development.

• Send shipments early.

• Know customs requirements and use experienced services for international shipments.

• Be conservative in estimates for training and new hardware.

• Break projects with large staffs into parallel efforts.

• Partition long projects into a sequence of shorter ones.

• Schedule project reviews.

• Reschedule work coincident with known holidays and other time conflicts.

• Track progress with rigor and discipline and report status frequently.

The riskiest activities in the project tend to be the ones that have significant worst-case estimates. For any activity where the most-likely estimate is a lot lower than what could plausibly occur, calculate an “expected” duration using the Program Evaluation and Review Technique (PERT) formula. Use these estimates in project planning to provide some reserve for particularly risky work, and to reduce the schedule impact.
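A short Python sketch shows the arithmetic. It assumes the commonly used three-point PERT weighting, in which the most-likely estimate counts four times as much as the optimistic and pessimistic estimates; the function name and the sample durations are hypothetical.

```python
def pert_expected(optimistic, most_likely, pessimistic):
    """Expected duration using the common PERT weighting:
    (optimistic + 4 * most_likely + pessimistic) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


# Hypothetical activity: 8 days at best, 10 days most likely, 30 days at worst.
expected = pert_expected(optimistic=8, most_likely=10, pessimistic=30)
reserve = expected - 10  # extra time the plan carries beyond the most-likely estimate

print(f"Expected duration: {expected:.1f} days")
print(f"Implied reserve:   {reserve:.1f} days")
```

Here the expected duration works out to 13 days, so planning with it builds in three days of reserve for an activity whose worst case is far above its most-likely estimate.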

Project risk is lower when you schedule activities related to the highest priorities for the project as early as possible, moving activities of lower priority later in the project. For each scheduled activity, review the deliverables and specify how and when each will be used. Wherever possible, schedule the work so there is a time buffer between when each deliverable is complete and the start of the activities that require them. If there are any activities that produce deliverables that seem to be unnecessary, either validate their requirement with project stakeholders or remove the work from the project plan.

Many schedule risks are caused by delays that may be avoided through more proactive communication. Whenever decisions are needed, plan to remind the decision makers at least a week in advance and get commitment for a swift turnaround. If specialized equipment or access to limited services is required, put an activity in the plan to review your needs with the people involved somewhat before the scheduled work. If scarce equipment for some kinds of project work is a chronic problem, propose adding capacity to lower the risk on your project, as well as for all other parallel work. The preventative maintenance schedules for production systems are generally determined well in advance. Monitor availability schedules for needed services and synchronize your plans with them to reduce conflicts and delays.

New things—technology, hardware, systems, or software—are common sources of delay. Manage risk by seeking alternatives using older, known capabilities unless using the new technology is an absolute project requirement. A “lower-tech” alternative may in some cases be a better choice for the project anyway, or it could serve as a standby option if an emerging technology proves not to work. Identify what you would need to do or change in the project to complete your work without the newer technology.

One cause of significant delay is developing a specific design and then sending it out to be built or created before it can be tested. It may take weeks to get the tangible result of the design back, and if it has problems the entire cycle must be repeated, doubling the duration (or worse—it may not work the second time either). In areas such as chip design, more than one chip will be made on each wafer anyway, and it might be useful to design a number of slightly different versions that can all be fabricated at the same time. Most of the chips will be of the primary design, but other variations created at the same time can also be tested, thus increasing the chances of having a component that can be used to continue with project work. There are other cases where slightly different versions may be created in parallel, such as printed circuit boards, mechanical assemblies, and other newly designed hardware. Although this may increase the project cost, protecting the project schedule is often a much higher priority. Varying the parameters of a design and evaluating the results is also useful for quickly understanding the principles involved, which can reduce risks for future projects.

Delays due to shipping problems are significant on many projects and in many cases can be avoided simply by ordering or shipping items earlier in the project. Just because it is generally thought to take a week to ship a piece of equipment from San Jose, California, to Bangalore, India, does not mean you should wait until a week before it is needed in India to ship it. There are only two ways to get something done sooner—work faster or start earlier. With shipping, expediting may not always be effective, so it is prudent planning to request and send things that require physical transport well ahead of the need, particularly when it involves complex paperwork and international customs regulations. Use only shipping services with a good performance record, knowledge of legal requirements, and an ability to track shipments.

Similarly, delay may result from the need to have new equipment or new skills for the project. The time necessary to get new equipment installed and running or to master new skills may prove longer than you think. If you underestimate how long it will take, project work that depends on the new hardware or skills could have to wait. Planning proactively for these project requirements will remove many risks of this sort from your project (and, as mentioned earlier, it also lowers the chances that you might lose, or never get, the required funding). Estimate these activities conservatively, and schedule installations, upgrades, and training as early in your project as practical—well before they are needed.

Large projects are intrinsically risky. If a project requires more than twenty full-time staff members, explore the possibility of partitioning it into smaller projects responsible for subsystems, modules, or components that can be developed in parallel. However, when you decompose a large program into autonomous smaller projects, be sure to clearly define all interfaces between them both in terms of specifications required and timing. Although the independent projects will be easier to manage and less risky, the overall program could be prone to late integration problems without adequate systems-level planning and strong interface controls.

Long projects are also risky. Work to break projects longer than a year into phases that produce measurable outputs. A series of short evolutionary projects will create value sooner than a more ambitious longer project, and the shorter projects are more likely to fall within a reasonable planning horizon of less than six months. This is a central principle for evolutionary software development and agile methodologies, used to deliver intermediate results sooner and to manage risk.

If a lengthy project must be undertaken as a whole, adopt a “rolling-wave” planning philosophy. At the end of each project phase, plan the next phase in detail and adjust plans for the remainder of the work at a summary level. Make adjustments to the project plans for future phases as you proceed to reflect what has been learned in the previous phases, including changes to the project deliverable, shifts in project staffing, and other parameters of the project objective. Rolling-wave planning requires that the project team conduct a thorough project review at the end of each phase and be prepared to continue as planned, continue with changes, or abort the project.

Schedule risk also arises from time conflicts outside the project. Check the plan for critical project work that may conflict with holidays, the end of financial reporting periods, times when people are likely to take vacations, or other distractions. Verify that intermediate project objectives and milestones are consistent with the personal plans of the staff members responsible for the work. On global projects, collect data for each region to minimize problems that may arise when part of the project team will be unavailable because of local holidays. When there are known project time conflicts, minimize them by accelerating or delaying the planned work.
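
A simple check of this kind can be sketched as a comparison of each activity's dates against a holiday calendar for the region where the work is done. The activity names, dates, and region codes below are hypothetical; a real plan would draw them from your scheduling tool and regional calendars.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Activity:
    name: str
    region: str
    start: date
    end: date  # inclusive


def find_conflicts(activities, holidays_by_region):
    """Return (activity name, holiday) pairs where a regional holiday falls within the activity."""
    conflicts = []
    for act in activities:
        for day in holidays_by_region.get(act.region, []):
            if act.start <= day <= act.end:
                conflicts.append((act.name, day))
    return conflicts


# Hypothetical data for illustration only.
holidays = {
    "US": [date(2025, 7, 4)],
    "IN": [date(2025, 8, 15)],
}
plan = [
    Activity("Integration test", "US", date(2025, 7, 1), date(2025, 7, 8)),
    Activity("Firmware build", "IN", date(2025, 7, 1), date(2025, 7, 8)),
]
for name, day in find_conflicts(plan, holidays):
    print(f"{name} overlaps a holiday on {day.isoformat()}")
```

The same structure extends to vacations or financial period closes by adding those dates to the calendar for the affected people or regions.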

Finally, commit to rigorous activity tracking throughout the project, and periodically schedule time to review your entire plan: the estimates, risks, work flow, project assumptions, and other data. Publish accurate schedule status regularly.

Mitigation Strategies for Resource Risks

Mitigating resource risks includes ideas such as:

• Avoid planned overtime.

• Build teamwork and trust on the project team.

• Use “expected” cost estimates where worst-case activity costs are high.

• Obtain firm commitment for funding and staff.

• Keep customers involved.

• Anticipate staffing gaps.

• Minimize safety and health issues.

• Encourage team members to plan for their own risks.

• Delegate risky work to successful problem solvers.

• Rigorously manage outsourcing.

• Detect and address flaws in the project objective promptly.

• Rigorously track project resource use.

One of the most common avoidable resource risks on technical projects is required overtime. Starting a project with full knowledge that the deadline is not possible unless the team works overtime for much of the project’s duration is a prescription for failure. Whenever the plan shows requirements for effort in excess of what is realistically available, rework the plan to eliminate it. Even on well-planned projects there are always plenty of opportunities for people to stay late, work weekends and holidays, lose sleep, and otherwise devote time to the project from their side of the “work/life” balance. Projects that require overtime from the outset face significant risks of low productivity due to poor motivation and potential turnover.

Resource risk is lower on projects whenever motivation is high. Motivation is a key factor in whether people will voluntarily work overtime, and low motivation is frequently a root cause of many resource-related risks. Technical projects are always difficult. When they succeed, it is not because they are easy; it is because the project team cares about the project. Project leaders who are good at building teamwork and getting people working on the project to trust and care about each other are much more successful than project leaders who work impersonally at a distance.

Teamwork across cross-functional project boundaries is also important. The more you involve people early in the project in planning, start-up or launch activities, and other meaningful shared work, the more team cohesion you can count on. People who know and trust each other will back each other up and help to solve each other’s problems. People who do not know each other well tend to mistrust each other and create conflict, arguments, and unnecessary project problems. Working together to plan and initiate project work transforms it from the “project leader’s project” to “our project.”

Financial risk is also significant for many projects. For activities in the project that have significant worst-case costs, estimate a realistic “expected cost” and use it to reflect the potential financial exposure and in determining the proposed project budget.

As with schedule risk, adequate sponsorship is essential to resource risk management. Get early commitment from the project’s sponsor for staffing and for funding, based on planning data (a discussion of negotiating for this follows in Chapter 10). The priority of the project is also under the control of the project sponsor, so work to understand the relative priority of the project in his or her mind. Strive to obtain the highest priority that is realistic for your project (and document it in writing). If the project has more than one sponsor, determine who has the most influence on the project. In particular, it is good to know who would be able to make a decision to cancel your project, so you can take good care of them and keep them aware of your progress. It is also useful to know who in the organization above you would suffer the most serious consequences if your project does not go well, because these managers have a personal stake in your project and they will likely be useful when risk recovery requires escalation.

Too little involvement of customers and end users in definition, design, and testing is also a potential resource risk, so obtain commitments early for all activities that require their involvement. Also, plan to provide reminders to them in advance of the project work that needs their participation.

Risks resulting from staffing gaps can be reduced or detected earlier through more effective communication. Assess the likelihood that project staff (including you) might join the project late because of ongoing responsibilities in prior projects that are delayed. Get credible status reports from these projects and determine how likely it is that the people working on them will be available to work on your project. If the earlier projects are ending with a lot of stress and overtime, reflect the need for some recovery time and less aggressive estimates in your project plans for the affected team members. Also plan to notify any contributors with part-time responsibilities on your project in advance of their scheduled work.

Loss of project staff due to safety or health problems is always possible, so a review of activities involving dangerous work is a good idea. Modify plans for any activities that you suspect may have health or safety risks to minimize the exposure. You may be able to make changes to the environment, time, or place for the work or to mitigate the risk by modifying the practices used. Also consider the experience and skills of any contributor who might be exposed to risks, and work to replace or train any team members who have insufficient relevant background.

For any activity risk where the team members involved could contribute to the risk, involve the individuals in developing a response. In addition to potentially finding more, and better, ideas for prevention, this will tend to sensitize them to the impact of the problem and may greatly reduce the likelihood of the risk.

For new, challenging, or otherwise risky activities, strive to find experienced contributors who have a reputation for effective problem solving. Although you cannot plan creativity or innovation, you can identify people who seem to be good at it.

Outsourcing is a large and growing source of resource risk on projects. The discussion in Chapter 5 includes a number of exposures, and mitigating these risks requires discipline and effort. For each contract with a service provider that your project depends upon, designate a liaison on the project team to manage the relationship. Do this also for other project teams in your own organization that you need to work with. If you plan to be the liaison, ensure that you have sufficient time allocated for this in addition to all your other responsibilities. Involve the owner of each relationship in selection, negotiation, and finalization of the agreement. Ensure that the agreement is sufficiently formal (a contract with an external supplier, a “memo of understanding” or similar document for an internal supplier) and that it is specific as to both time and technical requirements for the work, consistent with your project plan. Provide incentives and penalties in the agreement when appropriate, and whenever possible, schedule the work to complete earlier than your absolute need.

With any project work performed outside the view of the project team, schedule reviews of early drafts of required documents. Also, participate in inspections and interim tests, and examine prototypes. Identify and take full advantage of any early opportunities to verify tangible evidence of progress. Plan to collect status information regularly, and work to establish a relationship that will make it more likely that you will get credible status, including bad news, throughout your project.

A significant risk situation on fee-for-service projects is a lack of involvement of the technical staff during the proposal and selling phases. When a project is scoped and a contract commitment is made before the project team has any involvement in the project, resource risks (not to mention schedule and scope risks) can be enormous. This “price to win the business” technique is far too common in selling fee-for-solution projects, and it often leads to fixed-price contracts with large and seemingly attractive revenues that are later discovered to involve even larger and extremely unattractive costs. Some projects sold this way may even be impossible to deliver at all. Prevention of this risk would be reasonably easy using time-travel technology, by turning back the clock and involving the project team in setting the terms and conditions for any agreement. Because that is impossible, and this risk may already be a certainty when the project team gets into project and risk planning, the only recourse is to mitigate the situation insofar as possible.

Minimizing the risks associated with committed projects based on little or no analysis requires the project team to initiate the processes of basic project and risk planning as quickly as they can, doing bottom-up planning based on the committed scope. Using best-effort planning information, uncover any expectations for timing and cost that are out of line with reality. Timing expectations are visible to all, so any shifts there must be dealt with internally as well as with the customer, which may require contract modifications. Resource and cost problems can be hidden from the customer, but they still will require internal adjustment and commitment to a realistic budget for the project, even if it significantly exceeds the amount that can be recovered under the contract. If this is all done quickly enough, before everyone has mentally settled into expectations based on the price to win the contract, it may even be possible to adjust the fees in the contract. Although it may be tempting to adopt a “safe so far” attitude and hope for the miracle that would allow project delivery consistent with the flawed contract, delay will nearly always make things worse. The last, best chance to set realistic expectations for such a project is within a few days of its start. After this, the situation becomes progressively uglier and more expensive to resolve.

It is also important to document and make these price-to-win situations visible, to minimize the chances of future recurrence. Organizations that chronically pursue business like this rarely last long.

Finally, establish resource metrics for the project, and track them against realistic planning data. Track progress, effort, and funding throughout the project, and plan to act quickly when the trends show adverse variances against the plan. Keep resource status information visible through regular reporting.
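
As a rough sketch of this kind of tracking, you might compare actual figures with the plan at each status date and flag any metric running over plan by more than a chosen tolerance. The metric names, the figures, and the 10 percent tolerance are assumptions made for illustration.

```python
def adverse_variances(planned, actual, tolerance=0.10):
    """Return metrics where actual exceeds plan by more than `tolerance` (a fraction of plan)."""
    flagged = {}
    for metric, plan_value in planned.items():
        variance = (actual[metric] - plan_value) / plan_value
        if variance > tolerance:
            flagged[metric] = round(variance, 3)
    return flagged


# Hypothetical to-date figures for illustration only.
planned = {"effort_hours": 400, "spend_usd": 50_000}
actual = {"effort_hours": 470, "spend_usd": 52_000}
print(adverse_variances(planned, actual))
```

Here effort is 17.5 percent over plan and is flagged, while spending at 4 percent over plan stays within the assumed tolerance.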

Risk Transfer

Transfer is a third option for risk prevention, along with avoidance and mitigation. It is most effective for risks where the impact is primarily financial. The best-known form of transfer is insurance; for a fee, someone else will bear the financial consequences of your risk. Transfer works to benefit both parties. The purchaser of the insurance avoids the risk of a potentially catastrophic monetary loss in exchange for paying a comparatively small premium. The seller benefits by aggregating the fees collected to manage the risk for a large population of insurance buyers, who may be expected to have a stable and predictable “average” risk and to include only a small percentage who will generate claims. In technical projects, this sort of transfer is not especially common, but it is used. Unlike other strategies for mitigation, transfer does not actually do anything to lower the probability or diminish the nonfinancial impact of the risk. With transfer, the risk is accepted, and it either happens or it does not. However, any budgetary impact will be borne outside the project, limiting the resource impact.

Transfer of scope and technical risk is often the justification for outsourcing, and in some cases this might work. If the project team lacks a needed skill, hiring an expert or consultant to do the work transfers the activities to people who may be in a better position to get them done. Unfortunately, though, the risk does not actually transfer to the third party; the project still belongs to you, so any risk of nonperformance is ultimately still yours. Should things not go well, the fact that a bill for services will not need to be paid will be small consolation. Even the possibility of eventual legal action is unlikely to help the project. Using outsourcing as a risk transfer strategy is very much a judgment call. In some cases the risks accepted may significantly exceed the risks managed, no matter how well you write the contract.

Implementing Preventative Ideas

Avoidance, mitigation, and transfer nearly always have costs, sometimes significant costs. Before you adopt any ideas to avoid or reduce risks, some analysis is in order. For each risk to be managed, estimate the expected consequences in quantitative terms. For each proposed risk response, assess the incremental costs and timing impact involved. After comparing this data, consider business-justified preventative actions for inclusion in the project plan.

The expected cost of a risk, as usual, is based on “loss times likelihood.” For this, you need the probability in numerical terms, as well as estimates of the risk impact in terms of financial, schedule, and possibly other factors.

For a risk that was assessed as “moderate” probability, the historical records may provide an estimated probability of about 15 percent. The impact of the risk must also be assessed quantitatively. For a risk with a probability of 15 percent that represents three weeks of schedule slip and $2 million in cost, the expected risk impact will be about one-half week (which is probably not too significant) and $300,000 (which would be, for most projects, very significant). In each case, this is 15 percent of the total impact, shown graphically in Figure 8-4.
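
The arithmetic can be sketched in a few lines. The function name is an illustrative assumption; the probability and impact figures are the ones used in the example above.

```python
def expected_impact(probability: float, impact: float) -> float:
    """Expected impact of a risk: loss times likelihood."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


probability = 0.15      # "moderate" probability taken from historical records
slip_weeks = 3          # schedule impact if the risk occurs
cost_usd = 2_000_000    # cost impact if the risk occurs

expected_slip = expected_impact(probability, slip_weeks)
expected_cost = expected_impact(probability, cost_usd)
print(f"expected slip: {expected_slip:.2f} weeks")
print(f"expected cost: ${expected_cost:,.0f}")
```

The expected slip works out to 0.45 weeks, which the text rounds to about one-half week.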

The consequences of each idea for avoiding or mitigating the risk in time and money should be compared with the expected impact estimates to see whether they are cost-justified. If an idea only mitigates a risk (lowering the impact or probability of the problem), then the comparison is generally between the cost for mitigation and the difference between the “before and after” estimates for the risk.

Figure 8-4. Expected impact.

Determining whether a preventative is justified is always a judgment call, and it may be a difficult one. It is made more so because the data is often not very precise or dependable, making comparisons fairly subjective. The exercise of comparing costs for risk prevention with the expected impact is important, though, because it is human nature to attempt to prevent problems whenever possible. Just because you could prevent a risk, though, does not necessarily mean that you should. Seeking a risk-free project is illogical for two reasons. First, it is impossible. All projects have some residual risk no matter how much you do to avoid it. Second, a project with every possible risk prevention idea built into the plan will be far too expensive and time consuming to ever get off the ground.

For each potential idea that reduces or removes a project risk, contrast the expected costs of the risk with the cost of prevention before pulling it into the project plan. In the case above, with the expected half week of delay and $300,000 in expense, an idea that requires a week of effort and costs $1.5 million would most likely not be adopted, as the “cure” is nearly as bad as the relatively unlikely risk. This situation would be similar to paying more for insurance than the cost of the expected loss. A preventative that costs less and requires little effort, though, may well represent a prudent plan modification.
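
A minimal sketch of this comparison, looking only at the cost dimension, treats a response as justified when it costs less than the reduction in expected loss it buys. The function name is an assumption, and the second response (its $50,000 cost and its effect on probability) is hypothetical; only the first case uses figures from the text.

```python
def is_cost_justified(response_cost, p_before, impact_before, p_after=0.0, impact_after=0.0):
    """True when the response costs less than the drop in expected loss it produces.

    The defaults model full avoidance (no residual risk). Schedule effects are
    ignored here and would need the same comparison in time units.
    """
    reduction = p_before * impact_before - p_after * impact_after
    return reduction > response_cost


# The case from the text: a $1.5 million cure for a $300,000 expected loss.
print(is_cost_justified(1_500_000, p_before=0.15, impact_before=2_000_000))  # False

# Hypothetical cheaper response that cuts the probability from 15 to 5 percent.
print(is_cost_justified(50_000, 0.15, 2_000_000, p_after=0.05, impact_after=2_000_000))  # True
```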

Another consideration may enter the decision process. You may choose to respond to some risks that have high impact even though they have low assessed probabilities and “expected” consequences that position them below your “cut line” on the rank-ordered list. A decision to manage the risk outlined earlier will also need to consider whether a $2 million unanticipated expense could be tolerated. The incremental cost of the risk will never be $300,000; it will be either nothing or $2 million. If a $2 million outlay is not acceptable, a “MiniMax” strategy would lead you to invest in a risk response if you can identify one that is effective and can be accommodated in your budget.

What appears to be a simple decision, then, may not be. You may choose to develop a response for risks in your risk register for any of the following reasons:

• They are significant risks for which you have a cost-effective response.

• They are risks with high impact where a response is justified, regardless of assessed probability. (Remember, black swans do happen.)

• They are minor risks—those below the cut line—that have simple, low-cost, effective responses.

You may choose not to respond to risks in your risk register (to accept them) for any of the following reasons:

• They are significant risks where no response can be found.

• They are significant risks where a response is identified but thought too costly.

• They are minor risks that do not warrant attention in advance.
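
These reasons can be sketched as a rough decision rule. The sketch is a simplification under stated assumptions: it considers cost only, it assumes any response found is effective and fits the budget, and the thresholds (the largest tolerable loss and the limit for a "low-cost" response) are values you would have to choose. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Risk:
    name: str
    probability: float
    impact_usd: float
    above_cut_line: bool                        # significant on the rank-ordered list
    response_cost_usd: Optional[float] = None   # None means no response was found


def decide(risk, max_tolerable_loss_usd, low_cost_limit_usd):
    """Return a short respond/accept decision with the reason."""
    expected_loss = risk.probability * risk.impact_usd
    cost = risk.response_cost_usd
    if cost is None:
        return "accept: no response found" if risk.above_cut_line else "accept: minor risk"
    if risk.impact_usd > max_tolerable_loss_usd:
        # The MiniMax case: the worst outcome cannot be tolerated, whatever its probability.
        return "respond: impact cannot be tolerated"
    if risk.above_cut_line:
        if cost < expected_loss:
            return "respond: cost-effective"
        return "accept: response too costly"
    if cost <= low_cost_limit_usd:
        return "respond: simple, low-cost fix"
    return "accept: minor risk"


example = Risk("Supplier failure", 0.15, 2_000_000, True, response_cost_usd=1_500_000)
print(decide(example, max_tolerable_loss_usd=5_000_000, low_cost_limit_usd=500))
print(decide(example, max_tolerable_loss_usd=1_000_000, low_cost_limit_usd=500))
```

With a $5 million tolerance the costly response is declined, but if a $2 million loss cannot be absorbed, the same risk calls for a response despite its modest expected loss.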

Even if some of the ideas you generate for risk prevention prove not to be cost-justified, the same (or similar) approaches may still have application as contingency plans.

Updating your plans is the final step in risk response planning. For each cost-justified (or otherwise approved) risk avoidance, mitigation, or transfer idea, you must update your project planning documents. Most ideas will require additional or different work, so the project work breakdown structure (WBS) may shift, and there will likely be revisions to activity effort and duration estimates. Any added work will require staffing, and so the profiles in your resource plan will also require changes. If the resulting plan has problems meeting existing project constraints, there will be additional required replanning, which may create new risks.

Before adoption, each idea for risk prevention must earn its way into the project by lowering, not increasing, project risk. Before any modifications, review the plan for unintended consequences and document the justification for all additional project work.

Contingency Planning

For some risks, your best strategy will be to deal with risk effects, not causes. Avoidance, mitigation, and transfer, when justified and added to the project, all serve to make a project less risky, but risks will inevitably remain. For some risks, you have no influence on the root causes or can find no preventative action that is cost effective. For other risks, you may have mitigation strategies that help but still leave substantial residual risk. For most of the significant risks that remain, you should develop contingency plans, although for some cases you may decide to accept the risk.

Contingency planning deals with risk effects by generating plans for recovery or “fall back.” The process for contingency planning is the same as for any other project planning, and it should be conducted at the same level of detail and using the same methodologies and tools as other project planning.

Each contingency plan begins with the trigger event that signals the risk has occurred. The most effective risk triggers precede the risk consequences by as much as possible. Early triggers increase the number of potential recovery options, and in some cases they may permit you to reduce the impact of the risk, so verify that the trigger event you plan to use is the best option available.

Each risk to be managed with a contingency plan must also have an owner. The risk owner should be involved with developing the initial contingency plan and will monitor for the trigger event and be responsible for maintaining the contingency plans. If the risk should occur, the risk owner will be responsible for beginning to execute the contingency plan, working toward project recovery. The owner of a project risk will most often be the same person who owns the project activity related to the risk, but for risks with particularly severe, project-threatening consequences, the project leader may be a better choice.
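
One way to keep these elements together is a small record per risk that holds the trigger, the owner, and the recovery steps, with a check for anything still missing. The class, its field names, and the sample entry are hypothetical, offered only as a sketch of how such a record might look.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContingencyPlan:
    risk: str
    trigger: str    # event that signals the risk has occurred
    owner: str      # person who monitors the trigger and starts recovery
    recovery_steps: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """List the elements still needed before the plan is usable."""
        missing = []
        if not self.trigger.strip():
            missing.append("trigger")
        if not self.owner.strip():
            missing.append("owner")
        if not self.recovery_steps:
            missing.append("recovery steps")
        return missing


# Hypothetical entry for illustration only; no owner has been assigned yet.
plan = ContingencyPlan(
    risk="Custom chip fails first-pass test",
    trigger="Fabricated parts fail the acceptance test",
    owner="",
    recovery_steps=["Switch to the alternate design variant", "Rerun the acceptance test"],
)
print(plan.missing_parts())
```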

General Contingency Planning Strategies

Contingency planning for risks often starts with leftover ideas. Some ideas may have been considered for schedule compression (discussed in Chapter 6), but were not used. Others could be risk prevention strategies that were not adopted in the preliminary baseline plan for cost or other reasons. Although some of these ideas may be simply adopted as contingency plans without modification, in other cases they may need to be modified for “after the fact” use. Prevention strategies such as using an alternate source for components or schedule compression strategies for expediting printing or other outsourced activities can be documented as contingency plans with no modification. Some risk avoidance ideas can serve as contingencies after minor changes. Dropping back to an older technology, for example, might require additional work to back out any dependencies on a newer technology that fails.

Contingency planning in itself is a powerful risk prevention tool, as the process of planning for recovery shows clearly how difficult and time consuming it will be to recover from problems. This provides additional incentive for the project team to work in ways that will avoid risks. Always strive to make risks and risk planning as visible as possible in project communication. Your project team can only work to avoid the potential problems that they know about.

Contingency Planning Strategies for Schedule Risks

Whenever a risk results in a significant delay, the contingency plan must seek an alternate version of the work flow that provides either a way to expedite work so you can resume the project plan at some later point or a way to complete the project on an alternate basis that minimizes impact to the project deadline.

Recovery involves the same concepts and ideas used for schedule compression, discussed in Chapter 6. The baseline plan will require revision to make effort available for recovery immediately following the risk, so other work will need to be shifted, changed, or eliminated. You may be able to delay the start of less crucial planned activities, postponing them to later in the project. Any noncritical activity that is concurrent with or scheduled to follow the risk event may be interrupted or postponed to allow more focus on recovery. Some activity dependencies may be revised to allow project activities to be done out of the planned sequence, freeing contributors to work on recovery. In all of these cases, necessary activities shift later in the schedule, increasing the impact of future risks and creating new failure modes and exposures as more and more project work becomes schedule critical.

It may even be possible to eliminate planned work if it is nonessential, or to devise quicker approaches for project activities that could obtain similar, but possibly less satisfactory, results. In some cases, it may be possible to defer these decisions to eliminate work or adopt “shortcuts” until later in the project, using them on an as-needed basis.

“Crashing” project activities scheduled for later in the project to decrease their duration can also help if the project has sufficient budget reserve or access to the additional staffing. Shorter durations will permit later start dates for scheduled work and potentially free up project effort for recovery. Simply adding staff to the project to work on recovery may also be an option, if you can get commitment from additional contributors. If you do plan to add people, include all training and project familiarization required as part of your baseline plan to minimize the disruption inevitable with new staff. Without adequate preparation, this tactic might delay your project even more.

It may not be possible to replan the project to protect the deadline, especially when the risk relates to work near the project deadline. In such a case, the contingency planning serves to minimize the slippage and to provide the data necessary to document a new, later completion date.

A generic schedule contingency strategy involves establishing schedule reserve for the project. Establishing schedule reserve is explored in more detail in Chapter 10.

Contingency Planning Strategies for Resource Risks

For risks that require significant additional resources, contingency planning involves revising the resource plans to protect the project budget, or at least to limit the damage. Again, the process for this parallels the discussion for dealing with resource constraints in Chapter 6.

The most common strategy is also one of the least attractive: working overtime and on weekends and holidays. This tried-and-true recovery method works adequately on most projects, provided the resource impact is minimal and project staff are not already working significantly beyond the normal workday and workweek. If the amount of additional effort required is high, or the project team is stretched too thin when the risk occurs, this contingency strategy may backfire and actually make things worse by lowering motivation and leading to higher staff turnover.

For some projects, there may be contributors who are assigned to the project but are underused during part of it. If this is the case, shifting work around in the schedule may allow them to assist with risk recovery and still effectively meet other commitments. This tactic, like dealing with schedule risks using float, tends to increase overall project risk later in the project.

Eliminating later work or substituting other approaches than those planned may also reduce the resources needed for work later in the project, but if this is possible it is generally more appropriate to do it as part of the baseline plan. If the work is not essential, or there is a quicker way to obtain an acceptable result, these choices ought to be adopted, not viewed as potential jetsam to fling overboard if necessary.

Particularly for resource risks, it may be impossible to avoid damage to the overall resource plan and budget. All adverse variances increase the total project cost, so there may be few or no easy ways left to cut back other expenses to compensate.

Minimizing the impact of risk recovery involves contingency planning that revises resource use in ways that protect the budget as much as possible. Tactics such as assigning additional staff to later critical path activities or “borrowing” people from other, lower-priority projects may have little budget impact. Expediting external activities using incentive payments and outsourcing work planned for the project team may also be possible, but seek approval in advance for the additional cost as part of your contingency planning. If a contingency plan requires any training or other preliminary work to be effective, make these activities part of your baseline project plan.

A generic resource contingency strategy involves establishing a budget reserve for the project, similar to the schedule reserve discussed earlier. Budget reserve is discussed further in Chapter 10.

Contingency Planning Strategies for Scope Risks

Contingency planning for scope risks is not too complicated. The plans involve either protecting the specifications for the deliverable or reducing the scope requirements. Attempting to preserve the requirements is done by adding more work to the schedule (using tactics summarized previously), using additional resources, or both. In most cases it is difficult to assess in advance the magnitude of change that this may require, as the level of difficulty in fulfilling requirements for technical projects is highly variable—from relatively trivial in some cases to impossible in others. Contingency plans for scope risks usually provide for some level of recovery effort, followed by a review to determine whether to continue, modify the scope, or cancel the project.

For many technical projects, scope risks are managed by modifying the project objective, to provide most of the value of the project deliverable in a way that is consistent with schedule and resource objectives. The process for this, similar to that discussed in Chapter 6, starts with a prioritized list of specifications. It may be possible to drop some of the requirements entirely, or to defer them to a later phase or project. There may also be potential for relaxing some of the requirements, making them easier to achieve. Although this can be done effectively for some projects in advance, contingency planning for scope risks generally includes a review of project accomplishments and any shifts in assumptions, so your decisions on what to drop will be based on current data.

Risk Acceptance

For some risks, it may not be possible, or worthwhile, to plan specifically for recovery. Acceptance, as a general risk management technique, includes both transfer and contingency planning, because in both of these situations the risk causes are not influenced and the risk either happens or does not. For transfer and for contingency planning, specific responses are planned in advance to assist in recovery. For some risks, though, neither of these options may be practical. When the consequences of a risk are sufficiently unclear, as may be the case for scope and some other risks in technical projects, planning for recovery in advance may be impossible. An example of this might be a stated requirement to use new technology or hardware for the project. In such a case, many potential problems, ranging from the trivial to the insurmountable, are possible.

When a specific risk response is not an option, there are still choices available. If the risk is sufficiently serious, it may be the best course to abandon the project altogether as too risky, or consider a major change in the objective. For situations that are less damaging, you may choose to proceed with the project having no specific risk response, accepting the risks (and hoping for the best). If you adopt this alternative, it is prudent to document the risks as thoroughly as possible, discuss them with your sponsor and stakeholders, and secure project-level schedule and budget reserves to assist in managing the accepted risks.

Documenting Your Risk Plans

For risks with multiple potential consequences or particularly severe effects, you may want to generate more than one contingency plan. Before finalizing a contingency plan (or plans), review them for overall cost and probable effectiveness. If you do develop more than one response for a risk, prioritize the plans, putting first the plan you think will be most effective.

Document all contingency plans, and include the same level of detail as in the project plans: WBS, estimates, dependencies, schedule, resources required, the expected project impact, and any relevant assumptions. For each risk response plan, clearly specify the trigger event to detect that the risk has happened. Also, include the name of the owner who will monitor the risk trigger, maintain the contingency plan, and be responsible for its execution if the risk occurs.

As part of the overall project documentation, document your risk response plan and work to make the risks visible. One method for increasing risk awareness is to post a “top ten” risk list (revised periodically) either on the project Web site or with posters on the walls of project work areas. Ensure adequate distribution and storage of all risk plans, and plan to review risk management information at least quarterly.

Some projects formally maintain the risk register as part of their risk response plan. For each managed risk, the register includes:

• A detailed description of the risk

• The risk owner, plus any others with assigned roles and responsibilities

• The activities affected by the risk (including WBS codes)

• Any qualitative or quantitative risk analysis results

• A summary of risk response actions in the project plan

• The risk trigger event

• Expected residual risk exposure

• A summary of contingency and fallback plans
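
As a rough illustration, the register fields listed above could be captured in a simple record. The sketch below is in Python; the class name, field names, WBS code, and owner are hypothetical and do not come from any particular tool, while the sample risk borrows loosely from the mainframe relocation described later in this chapter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskRegisterEntry:
    """One managed risk, mirroring the register fields listed above.

    All names are illustrative assumptions, not a standard schema.
    """
    description: str
    owner: str
    affected_activities: List[str]   # WBS codes
    analysis_results: str            # qualitative or quantitative
    response_summary: str            # response actions in the project plan
    trigger_event: str
    residual_exposure: str
    contingency_summary: str         # contingency and fallback plans
    other_roles: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that were left empty."""
        text_fields = (
            "description", "owner", "analysis_results", "response_summary",
            "trigger_event", "residual_exposure", "contingency_summary",
        )
        missing = [name for name in text_fields if not getattr(self, name).strip()]
        if not self.affected_activities:
            missing.append("affected_activities")
        return missing


# Hypothetical entry; the WBS code and owner are invented for illustration.
entry = RiskRegisterEntry(
    description="Mainframe is dropped while being lowered by crane",
    owner="Project manager",
    affected_activities=["4.2.1"],
    analysis_results="Probability: moderate; impact: high",
    response_summary="Extra precautions, backups, trained staff",
    trigger_event="System damaged during the move",
    residual_exposure="About three weeks without the system",
    contingency_summary="Vendor fills standing order with next machine built",
)
print(entry.missing_fields())
```

A check like `missing_fields` is one simple way to confirm that every managed risk has a trigger and an owner recorded before the register is distributed.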

Add risk plans to the other project documentation and choose an appropriate location for storage that is available to all project contributors and stakeholders.

Managing a Specific Risk

Some years ago, a large multinational company initiated a year-long effort to establish a new European headquarters. Growth over the years had spread people, computers, and other hardware all over Geneva, Switzerland, and the inconvenience and expense for all of this had grown unacceptable. The goal was to consolidate all the people and infrastructure into a modern, new headquarters building. This effort involved a number of high-profile, risky projects, and I was asked to manage one of them.

One particularly risky aspect of the project involved moving two large, water-cooled mainframe computers out of the older data center where the systems had operated for some years, and into a more modern center in the new headquarters building. In the new location, the systems would be collocated with all the other headquarters computers and the telecommunications equipment that tied them to other sites in Europe and around the world. Both systems were critical to the business, so each was scheduled to be moved over a three-day holiday weekend. It was essential that each system be fully functional in the old data center at the end of the week before the move, and fully functional in the new data center before the start of business following the holiday, three days later.

Most of the risks were fairly mundane, and they were managed through thorough planning, adequate staffing, and extensive training, all committed months in advance. Other precautions, such as additional data backups, were also taken. The move itself was far from mundane, though, because the old data center, for some reason, had been established on the fifth floor of a fairly old building. The elevator in the building was small, about one meter square, and could carry no more than the weight of three or four people (who had to be on very friendly terms). When the systems were originally moved into the building, a system-sized door had been cut into the marble façade of the building, and a crane with a suspended box was used to move the systems into the data center. Over the years, upgrades and replacements had been moved in and out the same way.

Up to the time of this project, only older hardware being replaced had ever been moved out of the data center this way. In these cases, if there had been a mishap it would not have affected operations, because the older systems were only moved out once the replacement systems were successfully moved in and operational. For the relocation project, this was not the case. Both systems had to be moved out, transported, and reinstalled successfully, and any problem that started twenty meters in the air would result in a significant and expensive service interruption far longer than the allocated three days.

The new data center was, sensibly, at ground level; eliminating the need to suspend multimillion-dollar mainframes high in the air was one of the reasons the project was undertaken. Successful completion of the project would mean ground-level systems in the new data center, and far easier maintenance for all future operations.

In addition to the obvious risk of a CPU plummeting to the ground, the short timing of the project also involved other exposures such as weather, wind, traffic, injuries to workers, problems with the crane, and many other potential difficulties. The assessment of risk for most of these situations resulted either in adjustments in staffing, shifts in the plan, or acceptance, because there was sufficient experience and people were confident that most of the potential problems could be managed during the move.

The one remaining risk that concerned all of us was that one of the mainframe computers might smash into the sidewalk. The consequences of this could not be managed during the three-day weekend, so a lot of analysis went into exploring ways to manage this risk.

Risk assessment was the subject of significant debate, particularly with regard to probability. Some thought it “low,” citing, “This is Switzerland; we move skiers up the mountains this way all the time.”

[Figure: Suspending computing in Geneva.]

Others, particularly people from the United States, were less optimistic. In the end, the consensus was “moderate.” There was less debate on risk impact, which in this case was literal. In addition to issues of cost and delay, there were significant other concerns such as safety, the large crater in the pavement, noise, and computer parts bouncing for blocks around.

The primary impact was in time and cost, and deemed “high,” so considerable planning went into mitigating the risk. A number of ideas were explored, including disassembly of the system for movement in pieces using the elevator, building a lift along the side of the building (the two systems were to be moved a month apart, so this cost would have covered both), using padding or some sort of cushion for the ground, and a number of other even less practical ideas. The disassembly idea was considered seriously, but deemed inappropriate because of timing and the discouraging report from the vendor that “those systems do not always work right initially when we assemble them in the factory.” The external lift idea was a good one, but hardware that could reach to the fifth floor was unavailable. A large net or cushion would have minimized the spread of debris, but seemed unlikely to ensure system operation. It was not until the problem was reframed that the best idea emerged. The risk was not really the loss of that particular system; it was the loss of a usable system.

A plan to purchase a new system and install it, in advance, in the new data center would make the swift and successful move of the existing hardware unnecessary. Once operations were transferred to the new hardware, the old system could be lowered to the street, and if successful, sold as used equipment. This was an effective plan for avoiding the risk, but it had one problem—cost. The difference between the salvage value of the current machine and the purchase price of a new one was roughly $2 million. This investment was far higher than the expected consequences of the risk, so it was rejected as part of the plan. We decided to take as many precautions as possible, and accept the risk.
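
The decision to reject the avoidance plan rests on comparing its cost with the expected loss from the risk. A minimal sketch of that arithmetic follows. The probability values are assumptions chosen only for illustration, because the team rated probability simply as "moderate," and the impact figure uses the roughly $3 million replacement cost documented for the contingency plan, ignoring downtime and other consequences.

```python
def expected_loss(probability: float, impact: float) -> float:
    """Expected monetary loss: probability of occurrence times impact."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


AVOIDANCE_COST = 2_000_000     # from the text: new-system price minus salvage value
IMPACT_IF_DROPPED = 3_000_000  # from the text: rough replacement cost only

# Hypothetical readings of a "moderate" probability; the source gives no number.
for p in (0.05, 0.10, 0.25):
    loss = expected_loss(p, IMPACT_IF_DROPPED)
    decision = "avoid" if loss > AVOIDANCE_COST else "accept and plan a contingency"
    print(f"p={p:.2f}: expected loss ${loss:,.0f} -> {decision}")
```

Under these simplified figures, avoidance would pay for itself only if the probability of a drop exceeded about two in three, which is far above any plausible reading of "moderate."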

All this investigation made the contingency planning easy, as the research we had done into acquiring a new system was really all that was necessary. We ordered a new system and got a commitment from the vendor to fill the order with the next machine built if there were any problems moving the existing system. (The vendor was happy to agree to this, as it was heavily involved in many aspects of the relocation.) Once the move had been completed successfully, the order could be canceled with no penalty.

The consequences documented for the contingency plan were that the system would be unavailable for about three weeks, and the cost of the replacement system would be roughly $3 million.

As it happened, the same staff and basic plan were employed for both mainframe moves, and both went without incident. Although the contingency plan was not used, everyone felt that the risk planning had been a good investment. The process revealed clearly what we were facing, and it heightened our awareness of the overall risk. It uncovered many related smaller problems that were eliminated, which saved time and made the time-critical work required much easier. It also made all of us confident that the projects had been carefully and thoroughly planned, and that we would be successful. Even when risk management cannot eliminate all the risks, it is worthwhile to the project.

Panama Canal: Risk Plans (1906–1914)

Risk management represented one of the largest investments for the Panama Canal project. Of the risks mentioned in Chapter 7, most were dealt with in effective, and in several cases innovative, ways.

The risk of disease, so devastating on the earlier project, was managed through diligence, science, and sanitation. The scale and cost of this effort were significant, but so were the results. Widespread use of methods for mosquito control under the guidance of Dr. William Gorgas was effective on a scale never seen before. Specific tactics used, such as frequently applying thin films of oil on bodies of water and the disciplined dumping of standing water wherever it gathered (which in a rain forest was nearly everywhere), were so effective that their use worldwide in the tropics continues to this day. Once the program for insect control was in full effect, Panama was by far the healthiest place anywhere in the tropics. Yellow fever was eliminated. Malaria was rare, as were tuberculosis, dysentery, pneumonia, and a wide range of other diseases common at the time. Not only were the diseases spread by mosquitoes virtually eliminated, work also went much faster without the annoyance of the omnipresent insects. Although some estimates put the cost at US$10 for every mosquito killed, the success of the canal project depended heavily on Dr. Gorgas to ensure that the workers stayed healthy. This risk was managed thoroughly and well.

For the risk of frequent and sudden mud slides, there were no elegant solutions. As the work commenced, it seemed to many that “the more we dug, the more remained to be dug.” Unfortunately, this was true; it proved impossible to use the original French plan for the trench in the Culebra Cut to have sides at 45 degrees (a 1:1 slope). This angle created several problems, the largest of which was the frequent mud slides. In addition, the sides of the cut pressed down on the semisolid clay the excavators were attempting to remove, which squeezed it up in the center of the trench. The deeper the digging, the more the sides would sink and the center would rise; like a fluid, it would seek its level. The contingency plan was inelegant but ultimately effective—more digging. The completed canal had an average 4:1 slope, which minimized the mud slides and partially stabilized the flowing clay. This brute-force contingency plan not only resulted in much more soil to dispose of, it represented about triple the work. Erosion, flowing clay, and occasional mud slides continue to this day, and the canal requires frequent dredging to remain operational.
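
To see why flatter sides mean so much more digging, consider the cross-section of a trench as a trapezoid. The sketch below assumes that "4:1" means four units of horizontal run per unit of depth, and it uses made-up dimensions rather than the actual measurements of the Culebra Cut.

```python
def trench_cross_section(bottom_width: float, depth: float, slope_ratio: float) -> float:
    """Area of a trapezoidal trench cross-section.

    slope_ratio is horizontal run per unit of vertical rise, so each side
    wall spreads outward by slope_ratio * depth at the top of the cut.
    """
    top_width = bottom_width + 2 * slope_ratio * depth
    return depth * (bottom_width + top_width) / 2


# Hypothetical dimensions for illustration only (not the real Culebra Cut).
BOTTOM_WIDTH = 50.0
DEPTH = 100.0

steep = trench_cross_section(BOTTOM_WIDTH, DEPTH, 1.0)    # 1:1 sides (45 degrees)
shallow = trench_cross_section(BOTTOM_WIDTH, DEPTH, 4.0)  # 4:1 sides
print(steep, shallow, shallow / steep)
```

With these invented proportions the flatter cut needs three times the excavation per unit of length. Other width-to-depth proportions give other ratios, but the extra volume always grows with the square of the depth, which is why the deepest sections of the cut were the most affected.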

Dealing with the risks involved with building the enormous locks required a number of tactics. As with the mud slides, the massive concrete sides for the locks were handled by brute force and overengineering. Concrete was poured at Panama on a scale never attempted before. The sides of the locks are so thick and so heavily reinforced that even after close to a century of continuous operation, with thousands of ship passages and countless earthquakes, the locks still look much as they did when they were new.

The mechanical and electrical challenges were quite another matter. The locks were colossal machines with thousands of moving parts, many huge. Years of advance planning and experimentation led to ultimate success. The canal was a triumph of precision engineering and use of new steels. Vanadium alloy steels used were developed initially for automotive use, and they proved light and strong enough to serve in the construction of doors for the locks. Holding the doors tightly closed against the weight of the water in a filled lock required a lot of mass, mass that the engineers wanted to avoid moving each time the doors were opened or closed. To achieve this, the doors are hollow. Whenever they are closed, they are filled with water before the lock is filled, providing the necessary mass. The doors are then drained before they are opened to allow the ships raised (or lowered) to pass through.

Even with this strategy, moving doors of this size and weight required the power of modern engines. The choice of electrical operation was complicated and required much innovation (the first all-electric factory in the United States was barely a year old at the time of this decision), but electricity did provide a number of advantages. With electric controls, the entire canal system can be controlled centrally. Scale models were built to show the positions of each lock in detail. The lock systems are all controlled using valves and switches on the model, and mechanical interlocks beneath the model prevent errors in operation, such as opening the doors on the wrong end of a lock, or opening them before the filling or draining of water is complete. Complete status can be monitored for all twelve locks.
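
The canal's interlocks were mechanical, but the rules they enforced can be sketched in software. The following is only an analogy under simplifying assumptions: one chamber, two gates, and three water states. The class and method names are hypothetical and do not describe the actual control design.

```python
class InterlockError(Exception):
    """Raised when a requested operation would violate an interlock rule."""


class LockChamber:
    """Toy model of one lock chamber with an upper gate and a lower gate."""

    # Water level each gate needs before it may open.
    REQUIRED_LEVEL = {"upper": "high", "lower": "low"}

    def __init__(self) -> None:
        self.level = "low"      # "low", "high", or "changing"
        self.target = "low"     # level the chamber is moving toward
        self.open_gate = None   # None, "upper", or "lower"

    def begin_level_change(self, target: str) -> None:
        """Start filling (target "high") or draining (target "low")."""
        if target not in ("low", "high"):
            raise ValueError("target must be 'low' or 'high'")
        if self.open_gate is not None:
            raise InterlockError("close the gates before moving water")
        self.target = target
        self.level = "changing"

    def finish_level_change(self) -> None:
        """Mark filling or draining as complete."""
        self.level = self.target

    def open(self, gate: str) -> None:
        """Open a gate only if the water level permits it."""
        required = self.REQUIRED_LEVEL[gate]
        if self.level == "changing":
            raise InterlockError("filling or draining is not complete")
        if self.level != required:
            raise InterlockError(f"the {gate} gate is on the wrong end for this water level")
        self.open_gate = gate

    def close(self) -> None:
        """Close whichever gate is open."""
        self.open_gate = None


chamber = LockChamber()
chamber.open("lower")               # ship enters at the low level
chamber.close()
chamber.begin_level_change("high")
try:
    chamber.open("upper")           # too early: the chamber is still filling
except InterlockError as err:
    print("blocked:", err)
chamber.finish_level_change()
chamber.open("upper")               # now permitted
```

The point of the design, in steel or in code, is that unsafe requests are refused by the mechanism itself rather than left to operator vigilance.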

When George Goethals began to set all of this up, he realized that neither he nor anyone else had ever done anything like it. For most of the controls and the 1000+ electric motors the canal required, Goethals managed risk by bringing in outside help. He awarded a sizable contract to a rapidly growing U.S. company known for its expertise in electrical systems. Although it was still fairly small and not known internationally, the General Electric (GE) Company had started to attract worldwide attention by the time the Panama Canal opened. This was a huge contract for GE, and it was the company’s first large government contract. Such a large-scale collaboration of private and public organizations was unknown prior to this project. The relationship used by Goethals and GE served as the model for the Manhattan Project during World War II and for countless other modern projects in the United States and elsewhere. For good or ill, the modern military-industrial complex began in Panama.

Despite the project’s success in dealing with most risks, explosives remained a significant problem throughout construction. As in many contemporary projects, loss of life and limbs while handling explosives was common. Although stringent safety precautions helped, the single largest cause of death on the second Panama Canal project was TNT, not disease. For this risk, the builders found no solutions or viable alternatives, so throughout the project they were quite literally “playing with dynamite.”
