8
GOAL CONFLICTS

STRATEGIC FACTORS

A third set of factors at work in distributed cognitive systems is strategic in nature. People have to make tradeoffs between different but interacting or conflicting goals, between values or costs placed on different possible outcomes or courses of action, or between the risks of different errors (Brown, 2005a, 2005b; Woods, 2006; Hollnagel, 2009). They must make these tradeoffs while facing uncertainty, risk, and the pressure of limited resources (e.g., time pressure; opportunity costs).

CASE 8.1 BUSY WEEKEND OPERATING SCHEDULE

On a weekend in a large tertiary care hospital, the anesthesiology team (consisting of four physicians, three of whom are residents in training) was called on to perform anesthetics for an in vitro fertilization, a perforated viscus, reconstruction of an artery of the leg, and an appendectomy, in one building, and one exploratory laparotomy in another building. Each of these cases was an emergency, that is, a case that cannot be delayed for the regular daily operating room schedule. The exact sequence in which the cases were done depended on multiple factors. The situation was complicated by a demanding nurse who insisted that the exploratory laparotomy be done ahead of other cases. The nurse was responsible only for that single case; the operating room nurses and technicians for that case could not leave the hospital until the case had been completed.

The surgeons complained that they were being delayed and their cases were increasing in urgency because of the passage of time. There were also some delays in preoperative preparation of some of the patients for surgery. In the primary operating room suites, the staff of nurses and technicians were only able to run two operating rooms simultaneously. The anesthesiologist in charge was under pressure to attempt to overlap portions of procedures by starting one case as another was finishing so as to use the available resources maximally. The hospital also served as a major trauma center, which meant that the team needed to be able to start a large emergency case with minimal (less than 10 minutes) notice. In committing all of the residents to doing the waiting cases, the anesthesiologist in charge produced a situation in which there were no anesthetists available to start a major trauma case. There were no trauma cases and all the surgeries were accomplished. The situation was so common in the institution that it was regarded by many as typical rather than exceptional.

In this incident, the anesthesiologist in charge committed all of his available resources, including himself, to doing anesthesia. This effectively eliminated the in-charge person’s ability to act as a buffer or extra resource for handling an additional trauma case or a request from the floor. In the institution where the incident occurred, the anesthetist in charge on evenings and weekends determines which cases will start and which ones will wait. Being in charge also entails handling a variety of emergent situations in the hospital. These include calls to intubate patients on the floors, requests for pain control, and handling new trauma cases. The anesthesiologist in charge also serves as a backup resource for the operations in progress, sometimes described as an “extra pair of hands”. For a brief period of time, the system was saturated with work. There were no excess resources to apply to a new emergency. The anesthesiologist in charge resolved the conflict between the demand for use of all the resources at his command and the demand for preservation of resources for use in some emergency in favor of using all the resources to get urgent work completed. This was a gamble, a bet that the work at hand would continue to be manageable until enough was completed to restore some resource excess that would provide a margin for handling unforeseen (and unforeseeable) emergencies. These factors were not severe or particularly unusual. Rather, they represented the normal functioning of a large urban hospital. The decision to proceed along one pathway rather than another reflects the way that the practitioner handled the strategic factors associated with his work that day.

One remarkable aspect of this incident is that it was regarded as unremarkable by the participants. These kinds of scheduling issues recur and are considered by many to be simply part of the job. There were strong incentives to commit the resources, but also incentives to avoid that commitment. Factors that played a role in the anesthetist’s decision to commit all available resources included the relatively high urgency of the cases, the absence of a trauma alert (indication that a trauma patient was en route to the hospital), the time of day (fairly early; most trauma is seen in the late evening or early morning hours), and the pressure from surgeons and nurses.

Another reason for committing the resources seems almost paradoxical. The practitioner needed to commit resources in order to free them for work. By completing cases as early as possible in the day, he was providing resources for use in the late evening, when trauma operations seemed more likely. The resources at hand were available but evanescent; there was no way to ‘bank’ them against some future need. Refusing to use them now did not assure that they would be available in the future. Rather, this would actually reduce the likelihood that they would become available. Indeed, it seemed clear to the practitioner that it was desirable to get things done in order to be prepared for the future. The closer circumstances came to saturating the group’s ability to handle cases, the greater the urgency to complete cases already in progress or pending in the near future.

One can argue (and there were discussions amongst practitioners at the time) that committing all the resources was a risky course. It is not the ultimate resolution of this debate that matters here. Rather, what this incident shows is the way that the domain of practice confronts practitioners with the need to act decisively in the midst of competing demands and in an environment rife with uncertainty. The incident does not demonstrate that the practitioner was either a risk taker or risk averse. There is generally no risk-free way to proceed in these sorts of domains. Rather, all the possible ways of proceeding involve exposure to different sets of risks. The decision to proceed in one way or some other way is the result of coping with multiple, conflicting goals and demands, trading off aspects and elements of one against elements and aspects of the other.

This incident typifies the goal interactions and dilemmas that arise in cognitive work. These are the ways in which people accommodate a variety of problematic issues: the different values of possible outcomes; the implications of the costs of the different courses of action; the consequences of various kinds of possible failure. Coping with these issues is fundamentally a strategic process, one that involves assessments and comparisons of different possible actions. It is especially relevant that these assessments and comparisons take place under time pressure and in the face of uncertainty. The central theme of this chapter is the role of dilemmas and goal conflicts in complex system failures. Focusing on the multiple goals that play out in sharp end practice inevitably forces us to look outward to the larger organizational contexts that shape the nature of practice.

MULTIPLE, CONFLICTING GOALS

Multiple, simultaneously active goals are the rule rather than the exception for virtually all domains in which expertise is involved. Practitioners must cope with the presence of multiple goals, shifting between them, weighing them, choosing to pursue some rather than others, abandoning one, embracing another. Many of the goals encountered in practice are implicit and unstated. Goals often conflict. Sometimes these conflicts are easily resolved in favor of one or another goal, sometimes they are not. Sometimes the conflicts are direct and irreducible, for example when achieving one goal necessarily precludes achieving another one. But there are also intermediate situations, where several goals may be partially satisfied simultaneously. An adequate analysis of real world incidents requires explicit description of the interacting goals and the ways in which practitioners assessed and compared them.

Perhaps the most common hazard in the analysis of incidents is the naïve assessment of the strategic issues that confront practitioners. It is easy, especially after accidents, to simplify the situation facing practitioners in ways that ignore the real pressures and demands placed on them, that is, to attempt to resolve the goal conflicts that actually exist by the convenient fiat of discarding some of them as insignificant or immaterial.

Some goal conflicts are inherently technical; they arise from the intrinsic technical nature of a domain. In the case of the anesthetized patient, a high blood pressure helps push blood through the coronary arteries and improves oxygen supply to the heart muscle, but, because increased blood pressure adds to cardiac work, a lower blood pressure reduces the heart’s workload. The appropriate blood pressure target adopted by the anesthetist depends in part on the practitioner’s strategy, the nature of the patient, the kind of surgical procedure, the circumstances within the case that may change (e.g., the risk of major bleeding), and the negotiations between different people in the operating room team (e.g., the surgeon who would like the blood pressure kept low to limit the blood loss at the surgical site). This is a simple example because the practitioner is faced with a continuum of possible blood pressures. It is easy to imagine a momentary “optimal” blood pressure in between the extremes in the attempt to manage both goals.
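The continuum nature of this tradeoff can be sketched as a toy weighted-cost minimization. The sketch below is illustrative only: the cost curves, thresholds, and weights are invented for the example (they are not clinical values), but it shows how a “best” value emerges between the extremes and how re-weighting the goals, for instance to reflect the surgeon’s concern about bleeding, shifts that value.

# Illustrative sketch only: toy cost curves, not a clinical model. The
# functional forms, thresholds, and weights are invented for this example.

def perfusion_penalty(map_mmhg):
    """Penalty for inadequate coronary perfusion; grows as pressure falls below 90."""
    return max(0.0, 90.0 - map_mmhg) ** 2

def high_pressure_penalty(map_mmhg):
    """Penalty for goals served by lower pressure (cardiac work, surgical bleeding)."""
    return max(0.0, map_mmhg - 60.0) ** 2

def total_cost(map_mmhg, w_perfusion=1.0, w_high=1.0):
    """Weighted sum of the two competing penalties."""
    return (w_perfusion * perfusion_penalty(map_mmhg)
            + w_high * high_pressure_penalty(map_mmhg))

candidates = range(50, 111)  # candidate mean arterial pressures, in mmHg

# With equal weights, the lowest-cost pressure sits between the two extremes.
best = min(candidates, key=total_cost)
print("balanced weighting ->", best, "mmHg")

# Weighting the low-pressure goals more heavily (e.g., the surgeon's bleeding
# concern) moves the target downward.
best_low = min(candidates, key=lambda bp: total_cost(bp, w_high=3.0))
print("bleeding concern weighted ->", best_low, "mmHg")

The point of the sketch is not the numbers but the structure: the “optimum” is not a property of the patient alone, it depends on how the competing goals are weighted at that moment.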

But there are parallel examples of conflicting goals that are not so clearly technical in character (Brown, 2005a). For example, a very high level goal (and the one most often explicitly acknowledged) in anesthesiology is to preserve the patient’s life. If this were the only active goal, then we might expect the practitioner to behave in a certain way, studiously avoiding all risks, refraining from anything that might constitute, even in hindsight, exposure to hazard. But this is not the only goal; there are others. These include reducing costs, avoiding actions that would lead to lawsuits, maintaining good relations with the surgical service, maintaining resource elasticity to allow for handling unexpected emergencies, providing pain relief, and many others. Incidents and accidents have a way of making some goals appear to have been crucial and others irrelevant or trivial, but this hindsight actually blinds us to the relevance of goals before the event. Our knowledge of outcome makes it impossible for us to weigh the various goals as if we were the practitioners embedded in the situation. But in order to understand the nature of behavior, it is essential to bring these various goals into focus. The incident that begins this chapter shows how sterile and uninformative it is to cast the situation confronting a practitioner as dominated by a single, simple goal. Preserving the patient’s life is indeed a goal, and a high level one, but this practitioner is dealing with multiple patients – even including hypothetical ones, such as the potential trauma patient. How is he to assess the relative acuity and hazard of the different situations with which he must cope?

Individual goals are not the source of the conflicts; conflicts arise from the relationships between different goals. In the daily routine, for example, the goal of discovering important medical conditions with anesthetic implications before the day of surgery may drive the practitioner to seek more information about the patient. Each hint of a potentially problematic condition provides an incentive for further tests that incur costs (e.g., the dollar cost of the tests, the lost opportunity cost when a surgical procedure is canceled and the operating room goes unused for that time, the social costs of disgruntled surgeons). The goal of minimizing costs, in contrast, provides an incentive for minimal preoperative testing and the use of same-day surgery. The external pressures for highly efficient performance are in favor of limiting the preoperative evaluation (Woods, 2006). But failing to acquire information may reduce the ability of the anesthesiologist to be prepared for events during surgery and contribute to the evolution of a disaster.

To take another example of conflicting goals, consider the in-flight modification of flight plans in commercial aviation. At first glance it seems a simple thing to conclude that flight plans should be modified in the interests of “safety” whenever the weather is bad on the path ahead. There are, however, some goals that need to be included in the decision to modify the plan. Avoiding passenger discomfort and risk of injury from turbulence are goals that are, in this case, synonymous with “safety”. But there are other goals active at that same time, including the need to minimize fuel expenditure and the need to minimize the difference between the scheduled arrival time and actual arrival time. These latter goals are in conflict with the former ones (at least in some situations) because achieving one set risks failing to achieve the other. The effect of these competing goals is complex and forces pilots and ground controllers to make tradeoffs between them.


Figure 8.1 Conflicting goals in anesthesia. Some goals, for example acquiring knowledge of underlying medical conditions and avoiding successful lawsuits, create an incentive for preoperative testing. The goal of reduced costs creates an incentive for the use of outpatient surgery and limited preoperative evaluation

Because goal conflicts become easier to see at the sharp end of the system, the tendency is to regard practitioners themselves as the sources of conflict. It should be clear, however, that organizational factors at the blunt end of systems shape the world in which practitioners work. The conflicting or inherently contradictory nature of the goals that practitioners confront is usually derived from organizational factors such as management policies, the need to anticipate legal liability, regulatory guidelines, and economic factors. Competition between goals generated at the organizational level was an important factor in the breakdown of safety barriers in the system for transporting oil through Prince William Sound that preceded the Exxon Valdez disaster (National Transportation Safety Board, 1990). Even when the practitioners seem to be more actively the “cause” of accidents, the conflicted nature of the goals they confront is a critical factor. In cases where the investigations are more thorough and deep, this pattern can be seen more clearly. Aviation accidents, with their high public profile, are examples. There have been several crashes where, in hindsight, crews encountered a complex web of conflicted goals that led to attempted takeoffs with ice on the wings (National Transportation Safety Board, 1993). In one such case, the initial “investigation” of the accident focused narrowly on practitioner error and the second story became apparent only after a national judicial inquiry (Moshansky, 1992).

The goals that drive practitioner behavior are not necessarily those of written policies and procedures. Indeed, the messages received by practitioners about the nature of the institution’s goals may be quite different from those that management acknowledges. Many goals are implicit and unstated. This is especially true of the organizational guidance given to practitioners. For example, the Navy sent an implicit but very clear message to its commanders by the differential treatment it accorded to the commander of the Stark following that incident (U.S. House of Representatives Committee on Armed Services, 1987) as opposed to the Vincennes following that incident (U.S. Department of Defense, 1988; Rochlin, 1991). These covert factors are especially insidious because they shape and constrain behavior and, in politicized and risky settings, because they are difficult to acknowledge. After accidents we can readily obtain the formal policies and written procedures but it is more difficult to capture the web of incentives and imperatives communicated to workers by management in the period that precedes accidents (Stephenson et al., 2000; CAIB, 2003; Woods, 2005).

Coping with Multiple Goals

The dilemmas that inevitably arise out of goal conflicts may be resolved in a variety of ways. In some cases, practitioners may search deliberately for balance between the competing demands. In doing so their strategies may be strong and robust, brittle (work well under some conditions but are vulnerable given other circumstances), or weak (highly vulnerable to breakdown). They may also not deliberate at all but simply apply standard routines. In any case, they either make or accept tradeoffs between competing goals. In the main, practitioners are successful in the effort to strike a balance between the multiple goals present in the domain of practice.

In general, outsiders pay attention to practitioners’ coping strategies only after failure, when such processes seem awkward, flawed, and fallible. It is easy for post-incident evaluations to say that a human error occurred. It is obvious after accidents that practitioners should have delayed or chosen some other means of proceeding that would have avoided (what we now know to be) the complicating factors that precipitated an accident. The role of goal conflicts arising from multiple, simultaneously active goals may never be noted. More likely, the goal conflicts may be discounted as requiring no effort to resolve, where the weighting of goals appears in retrospect to “have been obvious” in ways that require no discussion.

NASA’s “Faster, Better, Cheaper” organizational philosophy in the late 1990s epitomized how multiple, contradictory goals are simultaneously present and active in complex systems (Woods, 2006). The losses of the Mars Climate Orbiter and the Mars Polar Lander in 1999 were ascribed in large part to the irreconcilability of the three goals (faster and better and cheaper), which drove down the cost of launches, made for shorter, aggressive mission schedules, eroded personnel skills and peer interaction, limited time, reduced the workforce, and lowered the level of checks and balances normally found (Stephenson, 2000). People argued that NASA should pick any two from the three goals. Faster and cheaper would not mean better. Better and cheaper would mean slower. Faster and better would be more expensive. Such reduction, however, obscures the actual reality facing operational personnel in safety-critical settings. These people are there to pursue all three goals simultaneously – fine-tuning their operation, as Starbuck and Milliken (1988) said, to “render it less redundant, more efficient, more profitable, cheaper, or more versatile” (p. 323).

CASE 8.2 SPACE SHUTTLE COLUMBIA EXTERNAL TANK MAINTENANCE

The 2003 space shuttle Columbia accident focused attention on the maintenance work that was done on the Shuttle’s external fuel tank, once again revealing the differential pressures of having to be safe and getting the job done (better, but also faster and cheaper). A mechanic working for the contractor, whose task it was to apply the insulating foam to the external fuel tank, testified that it took just a couple of weeks to learn how to get the job done, thereby pleasing upper management and meeting production schedules. An older worker soon showed him how he could mix the base chemicals of the foam in a cup and brush it over scratches and gouges in the insulation, without reporting the repair. The mechanic soon found himself doing this hundreds of times, each time without filling out the required paperwork. Scratches and gouges that were brushed over with the mixture from the cup basically did not exist as far as the organization was concerned. And those that did not exist could not hold up the production schedule for the external fuel tanks. Inspectors often did not check. A company program that once had paid workers hundreds of dollars for finding defects had been watered down, virtually inverted by incentives for getting the job done now.

Goal conflicts between safer, better, and cheaper were reconciled by doing the work more cheaply, superficially better (brushing over gouges), and apparently without cost to safety. As in most operational work, the distance between formal, externally dictated logics of action and actual work is bridged with the help of those who have been there before, who have learned how to get the job done (without apparent safety consequences), and who are proud to share their professional experience with younger, newer workers. Actual practice settles at a distance from the formal description of the job. Informal networks may characterize work, including informal hierarchies of teachers and apprentices and informal documentation of how to actually get work done. The notion of an incident, of something that was worthy of reporting (a defect), becomes blurred against a background of routine nonconformity. What was normal versus what was a problem was no longer clear.

RESPONSIBILITY-AUTHORITY DOUBLE BINDS

A crucial dilemma that plays a role in incidents and accidents involves the relationship between authority and responsibility. Responsibility-authority double binds are situations in which practitioners have the responsibility for the outcome but lack the authority to take the actions they see as necessary. In these situations practitioners’ authority to act is constrained while they remain vulnerable to penalties for bad outcomes. This may arise from vesting authority in external agents via control at a distance (e.g., via the regimentation “just follow the procedures”) or the introduction of machine-cognitive agents that automatically diagnose situations and plan responses.

People working cooperatively (and effectively) tend to pass authority and responsibility together in advisory interactions; that is, in most cases where small groups work together, responsibility and authority stay together (Hoffman et al., 2009). The research results regarding the consequences of dividing responsibility from authority are limited but consistent. Splitting authority and responsibility appears to have poor consequences for the ability of operational systems to handle variability and surprises that go beyond pre-planned routines (Roth et al., 1987; Hirschhorn, 1993). Billings (1991) uses this idea as the fundamental premise of his approach to develop a human-centered automation philosophy – if people are to remain responsible for safe operation, then they must retain effective authority. Automation that supplants rather than assists practitioners violates this fundamental premise.

The paradox here is that attribution of accident cause to human (operator) error often leads to the introduction of organizational change that worsens authority-responsibility double binds. Seeing the operators as the source of failure provides the incentive to defend the system against these acts, usually through regimentation. This effectively leads to a loss of authority. But the complexity of the worlds of practice means that attempts at complete regimentation produce more brittle work systems (even if they make it easier to diagnose human error in terms of failures to follow rules).

One consequence of the Three Mile Island nuclear reactor accident was a push by the Nuclear Regulatory Commission for utility companies to develop more detailed and comprehensive work procedures and to ensure that operators followed these procedures exactly. This policy appeared to be a reasonable approach to increase safety. However, for the people at the sharp end of the system who actually did things, strictly following the procedures posed great difficulties. The procedures were inevitably incomplete, and sometimes contradictory (dilemmas about what it meant to follow procedures in complex dynamic abnormal situations arose in a variety of incidents; see Roth et al., 1992, for a study of a simulated emergency where the procedure was incomplete). Then too, novel circumstances arose that were not anticipated in the work procedures. The policy inevitably led to “double bind” situations: people would be wrong if they deviated from a procedure, even one that turned out to be inadequate, and they would be wrong if they followed a procedure that turned out to be inadequate.

In some situations, if they followed the standard procedures strictly the job would not be accomplished adequately; if they always waited for formal permission to deviate from standard procedures, throughput and productivity would be degraded substantially. If they deviated and it later turned out that there was a problem with what they did (e.g., they did not adapt adequately), it could create re-work or safety or economic problems. The double bind arises because the workers are held responsible for the outcome (the poor job, the lost productivity, or the erroneous adaptation); yet they did not have authority over the work practices because they were expected to comply exactly with the written procedures. As Hirschhorn says:

They had much responsibility, indeed as licensed professionals many could be personally fined for errors, but were uncertain of their authority. What freedom of action did they have, what were they responsible for? This gap between responsibility and authority meant that operators and their supervisors felt accountable for events and actions they could neither influence nor control. (Hirschhorn, 1993)

Workers coped with the double bind by developing a “covert work system” that involved, as one worker put it, “doing what the boss wanted, not what he said” (Hirschhorn, 1993). There were channels for requesting changes to problems in the procedures, but the process was cumbersome and time-consuming. This is not surprising since, if modifications are easy and liberally granted, then it may be seen as undermining the policy of strict procedure-following. Notice how the description of this case may fit many different domains (e.g., the evolving nature of medical practice).

The design of computer-based systems has also been shown to be a factor that can create authority-responsibility double binds (Roth et al., 1987). Consider a traditional artificial intelligence based expert system that solves problems on its own, communicating with the operator via a question and answer dialogue. In this approach to assistance, the machine is in control of the problem; the system is built on the premise that the expert system can solve the problem on its own if given the correct data. The human’s role is to serve as the system’s interface to the environment by providing it with the data to solve the problem. If the human practitioners are to do any problem solving, it is carried out in parallel, independent of the interaction with the intelligent system. Results indicate that this prosthesis form of interaction between human and intelligent system is very brittle in the face of complicating factors (Roth et al., 1987). Again, the need to cope with novel situations, adapt to special conditions or contexts, recover from errors in following the instructions, or cope with bugs in the intelligent system itself requires a robust cognitive system that can detect and recover from error.

The crux of the problem in this form of cooperation is that the practitioner has responsibility for the outcome of the diagnosis, but the machine expert has taken over effective authority through control of the problem-solving process. Note the double bind practitioners are left in, even if the machine’s solution is disguised as only “advice” (Roth et al., 1987; Woods, 1991). In hindsight, practitioners would be wrong if they failed to follow the machine’s solution and it turned out to be correct, even though the machine can err in some cases. They would be wrong if they followed the machine’s “advice” in those cases where it turned out the machine’s solution was inadequate. They also would be wrong if they were correctly suspicious of the machine’s proposed solution, but failed to handle the situation successfully through their own diagnosis or planning efforts (see Part V on how knowledge of outcome biases evaluation of process). The practitioners in the evolving problem do not have the advantage of knowledge of the eventual outcome when they must evaluate the data at hand including the uncertainties and risks.

Instructions, however elaborate, regardless of medium (paper- or computer-based), and regardless of whether the guidance is completely pre-packaged or partially generated “on-the-fly” by an expert system, are inherently brittle when followed rotely. Brittleness means that it is difficult to build in mechanisms that cope with novel situations, adapt to special conditions or contexts, or recover from errors in following the instructions or bugs in the instructions themselves (e.g., Brown, Moran, and Williams, 1982; Woods et al., 1987; Herry, 1987; Patterson et al., 2010). As Suchman (1987) has put it, “plans are [only] resources for action.”

When people use guidance to solve problems, erroneous actions fall into one of two general categories (Woods et al., 1987; Woods and Shattuck, 2000):

1. rote-rule following persists in the face of changing circumstances that demand adaptation,

2. the people correctly recognize that standard responses are inadequate to meet operational goals given the actual circumstances, but fail to adapt the pre-planned guidance effectively (e.g., missing a side effect).

For example, studies of nuclear power plant operators responding to simulated and to actual accident conditions with paper-based instructions found that operator performance problems fell into one or the other of the above categories (Woods et al., 1987). If practitioners (those who must do something) are held accountable for both kinds of “error” – those where they continue to rotely follow the rules in situations that demand adaptation and those where they erroneously adapt – then the practitioners are trapped in a double bind.

Following instructions requires actively filling in gaps based on an understanding of the goals to be achieved and the structural and functional relationships between objects referred to in the instructions. For example, Smith and Goodman (1984) found that more execution errors arose in assembling an electrical circuit when the instructions consisted exclusively of a linear sequence of steps to be executed, than when explanatory material related the instruction steps to the structure and function of the device. Successful problem solving requires more than rote instruction following; it requires understanding how the various instructions work together to produce intended effects in the evolving problem context.

While some of the problems in instruction following can be eliminated by more carefully worded, detailed, and explicit descriptions of requests, this approach has limitations. Even if, in principle, it were possible to identify all sources of ambiguity and craft detailed wording to avoid them, in practice the resources required for such extensive fine tuning are rarely available. Furthermore, the kinds of literal elaborate statements that would need to be developed to deal with exceptional situations are likely to obstruct the comprehension and execution of instructions in the more typical and straightforward cases (for example, in one aviation incident the crew used about 26 different procedures; see Part IV for more on this incident).

Attempts to eliminate all sources of ambiguity are fundamentally misguided. Examination of language use in human-human communication reveals that language is inherently underspecified and requires the listener (or reader) to fill in gaps based on world knowledge and to assess and act on the speaker’s (writer’s) intended goals rather than his literal requests (Suchman, 1987). Second, a fundamental competency in human-human communication is the detection and repair of communication breakdowns (Suchman, 1987; Klein, Feltovich, et al., 2005). Again, error recovery is a key process. In part, this occurs because people build up a shared frame of reference about the state of the world and about what are meaningful activities in the current context.

Whenever organizational change or technology change occurs, it is important to recognize that these changes can sharpen or lessen the strategic dilemmas that arise in operations and change how practitioners negotiate tradeoffs in context. In designing high reliability systems for fields of activity with high variability and potential for surprise (Woods and Hollnagel, 2006), one cannot rely just on rotely followed pre-planned routines (even with a tremendous investment in the system for producing and changing the routines). Nor can one rely just on the adaptive intelligence of people (even with a tremendous investment in the people in the system). Distributed cognitive system design should instead focus on how to coordinate pre-planned routines with the demands for adaptation inherent in complex fields of activity (Woods, 1990a). The history of mission control is a good illustration of the coordination of these two types of activity in pace with the varying rhythms of the field of practice (e.g., Murray and Cox, 1989; Watts-Perotti and Woods, 2009).

It is tempting to oversimplify dilemmas as a tradeoff between safety and economy. But the dilemmas and goal conflicts that confront practitioners are more intimately connected to the details of sharp end practice. During the management of faults and failures, for example, there is a tradeoff with respect to when to commit to a course of action (Woods and Hollnagel, 2006). Practitioners have to decide when to decide. Should they take corrective action early in the course of an incident with limited information? Should they instead delay their response and wait for more data to come in or ponder additional alternative hypotheses? Act too early, when the diagnosis of the failure is uncertain, and there is a risk of making the situation worse through the wrong action. Act too late and the failure may have progressed to the point where the consequences have increased in scope or changed in kind or even become irremediable.
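One way to see the structure of this “deciding when to decide” tradeoff is as a toy expected-cost comparison. The framing and every number below are illustrative assumptions, not figures from any incident in this book: they simply show that when diagnostic confidence is low, waiting can be worth the progression penalty it incurs, while at high confidence early commitment wins.

# Toy expected-cost sketch of "deciding when to decide".
# All probabilities and costs are invented for illustration.

def expected_cost_act_now(p_correct, cost_if_right=1.0, cost_if_wrong=25.0):
    """Act early: cheap if the diagnosis is right, expensive if the action is wrong."""
    return p_correct * cost_if_right + (1.0 - p_correct) * cost_if_wrong

def expected_cost_wait(p_correct_later, progression_penalty=3.0):
    """Wait for more data: the diagnosis improves, but the fault progresses meanwhile."""
    return progression_penalty + expected_cost_act_now(p_correct_later)

for p_now in (0.5, 0.7, 0.9):
    p_later = min(1.0, p_now + 0.2)      # assume waiting buys better evidence
    act, wait = expected_cost_act_now(p_now), expected_cost_wait(p_later)
    choice = "act now" if act < wait else "wait"
    print(f"p(correct now)={p_now:.1f}  act={act:.1f}  wait={wait:.1f}  -> {choice}")

The practitioner, of course, does not have these numbers: the probabilities, the progression rate, and the cost of a wrong action all have to be judged in the moment, under time pressure.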

A dramatic example of these concerns occurred during the Apollo 13 mission. An explosion in the cryogenics systems led to the loss of many critical spacecraft functions and threatened the loss of the spacecraft and crew (see Murray and Cox, 1989).

Lunney [the Flight Director] was persistent because the next step they were contemplating was shutting off the reactant valve in Fuel Cell 1, as they had done already in Fuel Cell 3. If they shut it off and then came up with a … solution that suddenly got the O2 pressures back up, the door would still be closed on two-thirds of the C.S.M’s power supply. It was like shooting a lame horse if you were stranded in the middle of a desert. It might be the smart thing to do, but it was awfully final. Lunney, like Kranz before him, had no way of knowing that the explosion had instantaneously closed the reactant valves on both fuel cells 1 and 3. At ten minutes into his shift, seventy-nine minutes after the explosion, Lunney was close to exhausting the alternatives.

“You’re ready for that now, sure, absolutely, EECOM [the abbreviation for one of the flight controller positions]?”

“That’s it, Flight.”

“It [the oxygen pressure] is still going down and it’s not possible that the thing is sorta bottoming out, is it?”

“Well, the rate is slower, but we have less pressure too, so we would expect it to be a bit slower.”

“You are sure then, you want to close it?”

“Seems to me we have no choice, Flight.”

“Well …”

Burton, under this onslaught, polled his back room one last time. They all agreed.

“We’re go on that, Flight.”

“Okay, that’s your best judgment, we think we ought to close that off, huh?”

“That’s affirmative.”

Lunney finally acquiesced. “Okay. Fuel Cell 1 reactants coming off.”

It was uncharacteristic behavior by Lunney – “stalling,” he would later call it. “Just to be sure. Because it was clear that we were at the ragged edge of being able to get this thing back. … That whole night, I had a sense of containing events as best we could so as not to make a serious mistake and let it get worse.”

Both formal rules and rules “of thumb” are centrally concerned with the tradeoffs made by practitioners. Practitioners frequently trade off between acting based on operational rules or based on reasoning about the case itself (cf. Woods et al., 1987). The issue, often non-trivial, is whether the standard rules apply to a particular situation. When some additional factor is present that complicates the textbook scenario, the practitioner must decide whether to use the standard plan, adapt the standard plan in some way, or abandon the standard plan and formulate a new one (Woods and Shattuck, 2000; Woods and Hollnagel, 2006, Chapter 8; Watts-Perotti and Woods, 2009).

The precise criteria for evaluating these different tradeoffs may not be set by a conscious process or an overt decision made by individuals. It is more likely that they are established as emergent properties of either small groups or larger organizations. The criteria may be fairly labile and susceptible to influence, or they may be relatively stable and difficult to change.

The discussion of rules can be extended to a discussion of the coordination among agents in the distributed cognitive system (Woods and Hollnagel, 2006, Chapter 12). Such agents can be people working together or, increasingly, can be people working with information technology (Roth, Bennett, and Woods, 1987). What if the practitioner’s evaluation is different from that made by a computer agent? When should the machine’s guidance be sufficient? What is enough evidence that the machine is wrong to justify disregarding the machine’s evaluation and proceeding along other lines?

In hindsight, practitioners’ choices or actions can often seem to be simple blunders. Indeed, most of the media reports of human error in aviation, transportation, and medicine are tailored to emphasize the extreme nature of the participants’ behavior. But a more careful assessment of the distributed system may reveal that goals conflicted or other forms of dilemmas arose. Behavior in the specific incident derives from how the practitioners set their tradeoff criteria across different kinds of risks from different kinds of incidents that could occur. Because incidents usually are evaluated as isolated events, such tradeoffs can appear in hindsight to be unwise or even bizarre. This is mitigated when sets of incidents are used as the basis for examining the larger system (see the discussion of hindsight bias in Part V).

When dilemmas are involved in an incident, changing the behavior of the operational system requires a larger analysis of how one should make the tradeoff. It also involves meaningfully and consistently communicating this policy to the operational system so that practitioners adopt it as their criterion. This may implicitly or explicitly involve the commitment of a different system (an organization’s management, an entire industry, a regulatory process). Lanir, Fischhoff, and Johnson (1988) provide an excellent example through their formal analysis of criteria setting for risk taking within a distributed cognitive system. The danger in missing the role of strategic tradeoffs in producing the observed behavior of operational systems is that the changes made or the messages received by the practitioners can exacerbate dilemmas.

DID THE PRACTITIONERS COMMIT ERRORS?

Given the discussion of cognitive factors (knowledge, mindset, and dilemmas) and of local rationality, let us go back to the three exemplar incidents described earlier in Part III and re-examine them from the perspective of the question: What is human error?

These three incidents are not remarkable or unusual in their own field of activity (urban, tertiary care hospitals) or in other complex domains. In each incident, human performance is closely tied to system performance and to eventual outcome, although the performance of the practitioners is not the sole determinant of outcome. For example, the myocardial infarction following the events of case 6.1 may well have happened irrespective of any actions taken by practitioners. That patient was likely to have an infarction, and it is not possible to say if the anesthetist’s actions caused the infarction. The incidents and the analysis of human performance that they prompt (including the role of latent failures in incidents) may make us change our notion of what constitutes a human error.

Arguably, the performance in each exemplar incident is flawed. In retrospect, things can be identified that might have been done differently and which would have forestalled or minimized the incident or its effect. In the myocardial infarction incident, intravascular volume was misassessed and treatment for several simultaneous problems was poorly coordinated. In the hypotension case (case 7.1.), the device setup by practitioners contributed to the initial fault. The practitioners were also unable to diagnose the fault until well after its effects had cascaded into a near crisis. In the scheduling incident (case 8.1.), a practitioner violated policy. He chose one path to meet certain demands, but simultaneously exposed the larger system to a rare but important variety of failure. In some sense, each of the exemplar incidents constitutes an example of human error. Note, however, that each incident also demonstrates the complexity of the situations confronting practitioners and the way in which practitioners adjust their behavior to adapt to the unusual, difficult, and novel aspects of individual situations.

The hypotension incident (case 7.1.) particularly demonstrates the resilience of human performance in an evolving incident. During this incident the physicians engaged successfully in disturbance management to cope with the consequences of a fault (Woods and Hollnagel, 2006). The physicians were unable to identify the exact source of the incident until after the consequences of the fault had ended. However, they were able to characterize the kind of disturbance present and to respond constructively in the face of time pressure. They successfully treated the consequences of the fault to preserve the patient’s life. They were able to avoid becoming fixated on pursuing what was the “cause” of the trouble. In contrast, another study of anesthesiologist cognitive activities, this time in simulated difficult cases (Schwid and O’Donnell, 1992), found problems in disturbance management where about one-third of the physicians undertreated a significant disturbance in patient physiology (hypotension) while they over-focused on diagnostic search for the source of the disturbance.

The practitioner was also busy during the myocardial infarction incident, although in this instance the focus was primarily on producing better oxygenation of the blood and control of the blood pressure and not on correcting the intravascular volume. These efforts were significant and, in part, successful. In both of these incidents, attention is drawn to the practitioners’ performance by the outcome.

In retrospect some would describe aspects of these incidents as human error. The high urine output, together with the high blood glucose and prior administration of furosemide, should have prompted the consideration of low (rather than high) intravascular volume. The infusion devices should have been set up correctly, despite the complicated set of steps involved. The diagnosis of hypotension should have included a closer examination of the infusion devices and their associated bags of fluid, despite the extremely poor device feedback. Each of these conclusions, however, depends on knowledge of the outcome; each conclusion suffers from hindsight bias. To say that something should have been obvious, when it manifestly was not, may reveal more about our ignorance of the demands and activities of this complex world than it does about the performance of its practitioners. It is possible to generate lists of “shoulds” for practitioners in large systems but these lists quickly become unwieldy and, in any case, will tend to focus only on the most salient failures from the most recent accident.

The scheduling incident is somewhat different. In that incident it is clear how knowledge of the outcome biases evaluations of the practitioner performance. Is there an error in case 8.1? If a trauma case had occurred in this interval where all the resources had been committed to other cases, would his decision then be considered an error? On the other hand, if he had delayed the start of some other case to be prepared for a possible trauma case that never happened and the delay contributed to some complication for that patient, would his decision then be considered an error?

Uncovering what is behind each of these incidents reveals the label “human error” as a judgment made in hindsight. As these incidents suggest, human performance is as complex and varied as the domain in which it is exercised. Credible evaluations of human performance must be able to account for all of the complexity that confronts practitioners and the strategies they adopt to cope with that complexity. The term human error should not represent the concluding point but rather the starting point for studies of accident evolution in complex systems.

THE N-TUPLE BIND

The Implications of Local Rationality for Studying Error

One implication of local rationality is that normative procedures based on an ideal or perfect rationality do not make sense in evaluating cognitive systems. Rather, we need to find out what are robust, effective strategies given the resources of the problem solvers (i.e., their strategies, the nature of their working memory and attention, long-term memory organization, and retrieval processes, and so on), and the demands of the problem-solving situation (time pressure, conflicting goals, uncertainty, and so on). Error analyses should be based on investigating demand-resource relationships and mismatches (Rasmussen, 1986). As Simon (1969) points out, “It is wrong, in short, in ignoring the principle of bounded rationality, in seeking to erect a theory of human choice on the unrealistic assumptions of virtual omniscience and unlimited computational power” (p. 202).

Human decision makers generally choose strategies that are relatively efficient in terms of effort and accuracy as task and context demands are varied (Payne et al., 1988; 1990). Procedures that seem “normative” for one situation (non-time constrained) may be severely limited in another problem context (time constrained). In developing standards by which to judge what are effective cognitive processes, one must understand problem solving in context, not in “the abstract.” For example, if one were designing a decision aid that incorporated Bayesian inference, one would need to understand the context in which the joint human-machine system functions including such factors as noisy data or time pressure. Fischhoff and Beyth-Marom (1983) point out that applying Bayesian inference in actuality (as opposed to theory) has the following error possibilities: formulation of wrong hypotheses, not correctly eliciting the beliefs and values that need to be incorporated into the decision analysis, estimating or observing prior probabilities and likelihood functions incorrectly, using a wrong aggregation rule or applying the right one incorrectly.
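A minimal numerical sketch (with hypotheses, priors, and likelihoods invented for illustration) makes this point concrete: the same piece of evidence yields very different posteriors when a prior or a likelihood is misestimated, and no amount of correct arithmetic helps if the actual fault is not among the hypotheses being considered in the first place.

# Sketch of the Fischhoff and Beyth-Marom point: a Bayesian decision aid is only
# as good as its inputs. The two hypotheses and all numbers are invented.

def posterior_a(prior_a, p_evidence_given_a, p_evidence_given_b):
    """Posterior probability of hypothesis A after observing one piece of evidence."""
    prior_b = 1.0 - prior_a
    joint_a = prior_a * p_evidence_given_a
    joint_b = prior_b * p_evidence_given_b
    return joint_a / (joint_a + joint_b)

# Same evidence, three versions of the inputs.
print("well-calibrated inputs :", round(posterior_a(0.10, 0.80, 0.20), 3))   # about 0.31
print("prior misestimated     :", round(posterior_a(0.50, 0.80, 0.20), 3))   # 0.80
print("likelihood misestimated:", round(posterior_a(0.10, 0.80, 0.60), 3))   # about 0.13

# And if the true fault is neither A nor B (a wrongly formulated hypothesis set),
# the aggregation can be flawless and the conclusion still wrong.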

In other words, cognitive strategies represent tradeoffs across a variety of dimensions including accuracy, effort, robustness, risks of different bad outcomes, or the chances for gain from different possible good outcomes. Effective problem-solving strategies are situation specific to some extent; what works well in one case will not necessarily be successful in another. Furthermore, appropriate strategies may change as an incident evolves; for example effective monitoring strategies to detect the initial occurrence of a fault (given normal operations as a background) may be very different from search strategies during a diagnostic phase (Moray, 1984). In understanding these tradeoffs relative to problem demands we can begin to see the idea that expertise and error spring from the same sources.

The assumption of local rationality (Woods and Hollnagel, 2006) – people are doing reasonable things given their knowledge, their objectives, their point of view and limited resources, for example time or workload – points toward a form of error analysis that consists of tracing the problem-solving process to identify points where limited knowledge and limited processing lead to breakdowns. This perspective implies that one must consider what features of domain incidents and situations increase problem demands (Patterson et al., in press).

The incidents described in Part III are exemplars for the different cognitive demands encountered by practitioners who work at the sharp end of large, complex systems, including anesthetists, aircraft pilots, nuclear power plant operators, and others. Each category of cognitive issue (knowledge in context, mindset, strategic factors, and local rationality) plays a role in the conduct of practitioners and hence plays a role in the genesis and evolution of incidents. The division of cognitive issues into these categories provides a tool for analysis of human performance in complex domains. The categories are united, however, in their emphasis on the conflicts present in the domain. The conflicts exist at different levels and have different implications, but the analysis of incidents depends in large part on developing an explicit description of the conflicts and the way in which the practitioners deal with them.

Together the conflicts produce a situation for the practitioner that appears to be a maze of potential pitfalls. This combination of pressures and goals in the work environment is what can be called the n-tuple bind (Cook and Woods, 1994). This term derives from the mathematical concept of a series of numbers required to define an arbitrary point in an n-dimensional space. The metaphor here is one of a collection of factors that occur simultaneously within a large range of dimensions, that is, an extension of the notion of a double bind. The practitioner is confronted with the need to choose a single course of action from myriad possible courses. How to proceed is constrained by both the technical characteristics of the domain and the need to satisfy the “correct” set of goals at a given moment, chosen from the many potentially relevant ones. This is an example of an over-constrained problem, one in which it is impossible to maximize the function or work product on all dimensions simultaneously. Unlike simple laboratory worlds with a best choice, real complex systems intrinsically contain conflicts that must be resolved by the practitioners at the sharp end. Retrospective critiques of the choices made in system operation will always be informed by hindsight. For example, if the choice is between obtaining more information about cardiac function or proceeding directly to surgery with a patient who has soft signs of cardiac disease, the outcome will be a potent determinant of the “correctness” of the decision. Proceeding with undetected cardiac disease may lead to a bad outcome (although this is by no means certain), but obtaining the data may yield normal results, cost money, “waste” time, and incur the ire of the surgeon (Woods, 2006). Possessing knowledge of the outcome trivializes the situation confronting the practitioner and makes the “correct” choice seem crystal clear.
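The over-constrained character of the bind can be sketched abstractly. In the toy example below, the options, goal dimensions, and scores are hypothetical, loosely patterned on the scheduling incident; no course of action dominates the others on every goal, so a choice emerges only once the goals are weighted, and that weighting is precisely the tradeoff the practitioner at the sharp end is forced to make.

# Toy illustration of an over-constrained choice: no option is best on every
# goal dimension, so any selection implies a weighting of goals.
# The options, dimensions, and scores are hypothetical.

options = {                      # higher score = better on that goal
    "start all cases now":      {"throughput": 0.9, "trauma_readiness": 0.2, "surgeon_relations": 0.9},
    "hold one team in reserve": {"throughput": 0.6, "trauma_readiness": 0.8, "surgeon_relations": 0.5},
    "delay elective cases":     {"throughput": 0.3, "trauma_readiness": 0.9, "surgeon_relations": 0.2},
}

def dominates(a, b):
    """True if a is at least as good as b on every goal and strictly better on one."""
    return all(a[g] >= b[g] for g in a) and any(a[g] > b[g] for g in a)

for name, scores in options.items():
    beaten_by = [other for other, o_scores in options.items()
                 if other != name and dominates(o_scores, scores)]
    print(f"{name!r} dominated by: {beaten_by or 'none'}")

def choose(weights):
    """A single 'best' option appears only after the goals are weighted."""
    return max(options, key=lambda n: sum(weights[g] * options[n][g] for g in weights))

print(choose({"throughput": 0.5, "trauma_readiness": 0.2, "surgeon_relations": 0.3}))
print(choose({"throughput": 0.2, "trauma_readiness": 0.6, "surgeon_relations": 0.2}))

Running the sketch shows that none of the options is dominated and that the "best" choice flips as the weights shift; the hindsight reviewer, knowing the outcome, in effect re-weights the goals after the fact.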

This n-tuple bind is most easily seen in case 8.1 where strategic factors dominate. The practitioner has limited resources and multiple demands for them. There are many sources of uncertainty. How long will the in-vitro fertilization take? It should be a short case but may not be. The exploratory laparotomy may be either simple or complex. With anesthetists of different skill levels, who should be sent to the remote location where that case will take place? Arterial reconstruction patients usually have associated heart disease and the case can be demanding. Should he commit the most senior anesthetist to that case? Such cases are also usually long, and committing the most experienced anesthetist will tie up that resource for a long time. What is the likelihood that a trauma case will come during the time when all the cases will be going simultaneously (about an hour)? There are demands from several surgeons for their case to be the next to start. Which case is the most medically important one? The general rule is that an anesthetist has to be available for a trauma; he is himself an anesthetist and could step in but this would leave no qualified individual to go to cardiac arrests in the hospital or to the emergency room. Is it desirable to commit all the resources now and get all of the pending cases completed so as to free the people for other cases that are likely to follow?

It is not possible to measure accurately the likelihood of the various possible events that he considers. As in many such situations in medicine and elsewhere, he is attempting to strike a balance between common but lower consequence problems and rare but higher consequence ones. Ex post facto observers may view his actions as either positive or negative. On the one hand his actions are decisive and result in rapid completion of the urgent cases. On the other, he has produced a situation where emergent cases may be delayed. The outcome influences how the situation is viewed in retrospect.

A critique often advanced in such situations is that “safety” should outweigh all other factors and be used to differentiate between options. Such a critique is usually made by those very far removed from the situations that can arise at the sharp end. Safety is not a concrete entity and the argument that one should always choose the safest path misrepresents the dilemmas that confront the practitioner. The safest anesthetic is the one that is not given; the safest airplane is the one that never leaves the ground. All large, complex systems have intrinsic risks and hazards that must be incurred in order to perform their functions, and all such systems have had failures. The investigation of such failures and the attribution of cause by retrospective reviewers are discussed in Part V of this book.

Another aviation example involves de-icing of the wings of planes before winter weather takeoffs at busy airports. The goal of de-icing is to avoid takeoff with ice on the wings. After an aircraft is de-iced, it enters the queue of planes waiting for takeoff. Because the effectiveness of the de-icing agent degrades with time, delays while in the queue raise the risk of new ice accumulation and provide an incentive to go back to repeat the de-icing. Leaving the queue so that the plane can be de-iced again will cause additional delays because the aircraft will have to re-enter the takeoff queue. The organization of activities, notably the timing of de-icing and the impact of repeated de-icing on the plane’s place in the queue, can create conflicts. For individual cases, practitioners resolve these conflicts through action, that is, by deciding to return to the de-icing station or to remain in line for takeoff.

Conventional human factors task analyses do not pick up such tradeoffs – task analyses operate at too microscopic a grain of analysis, and how to resolve these conflicts is rarely part of formal job descriptions. The strategic dilemmas may not arise as an explicit conscious decision by an individual, so knowledge acquisition sessions with an expert may not reveal their presence.

To evaluate the behavior of the practitioners involved in an incident, it is important to elucidate the relevant goals, the interactions among these goals, and the factors that influenced criterion setting on how to make tradeoffs in particular situations. The role of these factors is often missed in evaluations of the behavior of practitioners. As a result, it is easy for organizations to produce what appear to be solutions that in fact exacerbate conflict between goals rather than help practitioners handle goal conflicts in context. In part, this occurs because it is difficult for many organizations (particularly in regulated industries) to admit that goal conflicts and tradeoff decisions arise. However distasteful to admit or whatever public relations problems it creates, denying the existence of goal interactions does not make such conflicts disappear and is likely to make them even tougher to handle when they are relevant to a particular situation. As Feynman remarked regarding the Challenger disaster, “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled” (Rogers et al., 1986, Appendix F, p. 5). The difference is that, in human-machine systems, one can sweep the consequences of attempting to fool nature under the rug by labeling the outcome as the consequence of “human error.”

CONCLUSION

When you feel you have to ask, “How could people have missed …?” or “How could they not have known?”, remind yourself to go back and trace knowledge, mindset, and goal conflicts as the situation evolved. Try to understand how knowledge was brought to bear in context by people trying to solve an operational problem. Try to see how the cues and indications about their world, evolving and coming in over time, influenced what people understood their situation to be at the time and where they reasonably decided they should direct their attention next. Try to grasp how multiple interacting goals, many of them conflicting, some expressed more subtly than others, influenced people’s trade-offs, preferences, and priorities. Taken together, this is the recipe for escaping the hindsight bias. This technique, or approach, is laid out in much more detail in The Field Guide to Understanding Human Error (Dekker, 2002).

Also, when your organization considers making changes to the system, the three topics of this section can help you map out the potential cognitive system consequences or reverberations of change. How will changes to your system affect people’s ability to smoothly, fluidly move their attentional resources as operational situations develop around them? How will changes to your system alter the way knowledge is packaged, delivered, transmitted, stored, and organized across the various parts of your system (human and technological)? And how, by extension, does this impact people’s ability to bring knowledge to bear in actual settings where it is needed? Finally, how will changes to your system influence the weighting and prominence of certain goals over others, in turn shifting operational people’s trade-off points or sensitivity to particular strategic directions? These are the systemic reverberations that will influence your system’s ability to create success and forestall failure.
