13
HINDSIGHT BIAS

ATTRIBUTING SYSTEM FAILURES TO PRACTITIONERS

System failures, near failures, and critical incidents are the usual triggers for investigations of human performance. When critical incidents do occur, human error is often seen as a cause of the poor outcome. In fact, large complex systems can be readily identified by the percentage of critical incidents that are considered to have been “caused” by “human error;” the rate for these systems is typically over 70 percent. The repeated finding of about three-quarters of incidents arising from “human error” has built confidence in the notion that there is a human error problem in these domains. Indeed, the belief that fallible humans are responsible for large system failures has led many system designers to use more and more technology to try to eliminate the human operator from the system or to reduce the operator’s possible actions so as to forestall these incidents.

Attributing system failure to the human operators nearest temporally and spatially to the outcome ultimately depends on the judgment by someone that the processes in which the operator engaged were faulty and that these faulty processes led to the bad outcome. Deciding which of the many factors surrounding an incident are important and what level or grain of analysis to apply to those factors is the product of social and psychological processes of causal attribution. What we identify as the cause of an incident depends on what we ourselves have learned previously, on where we look, on whom we communicate with, on the assumed contrast cases or causal background for that exchange, and on the purposes of the inquiry.

For at least four reasons it is not surprising that human operators are blamed for bad outcomes. First, operators are available to blame. Large and intrinsically dangerous systems have a few, well-identified humans at the sharp end. Those humans are closely identified with the system function so that it is unlikely that a bad outcome will occur without having them present. Moreover, these individuals are charged, often formally and institutionally, with ensuring the safe operation as well as the efficient functioning of the system. For any large system failure there will be a human in close temporal and physical relationship to the outcome (e.g., a ship’s captain, pilot, air traffic controller, physician, nurse).

The second reason that “human error” is often the verdict after accidents is that it is so difficult to trace backwards through the causal chain of multiple contributors that are involved in system failure (Rasmussen, 1986). It is particularly difficult to construct a sequence that “passes through” humans in the chain, as opposed to stopping at the sharp-end human(s). To construct such a sequence requires the ability to reconstruct, in detail, the cognitive processing of practitioners during the events that preceded the bad outcome. The environment of the large system makes these sorts of reconstructions extremely difficult. Indeed, a major area of research is development of tools to help investigators trace the cognitive processing of operators as they deal with normal situations, with situations at the edges of normality, and with system faults and failures. The incidents described in Part III are unusual in that substantial detail about what happened, what the participants saw and did, was available to researchers. In general, most traces of causality will begin with the outcome and work backwards in time until they encounter a human whose actions seem to be, in hindsight, inappropriate or sub-optimal. Because so little is known about how human operators actually deal with the multiple conflicting demands of large, complex systems, incident analyses rarely demonstrate the ways in which the actions of the operator made sense at the time.

The third reason that “human error” is often the verdict is paradoxical: “human error” is attributed as the cause of large system accidents because human performance in these complex systems is so good. Failures of these systems are, by almost any measure, rare and unusual events. Most of the system operations go smoothly; incidents that occur do not usually lead to bad outcomes. These systems have come to be regarded as “safe” by design rather than by control. Those closely studying human operations in these complex systems are usually impressed by the fact that the opportunity for large-scale system failures is present all the time and that expert human performance is able to prevent these failures. As the performance of human operators improves and failure rates fall, there is a tendency to attribute the improvement to some underlying quality of the system itself, rather than to the honing of skills and expertise within the distributed operational system to a fine edge. The studies of aircraft carrier flight operations by Rochlin et al. (1987) point out that the qualities of human operators are crucial to maintaining system performance goals and that, by most measures, failures should be occurring much more often than they do. As consumers of the products from large complex systems such as health care, transportation, and defense, society is lulled by success into the belief that these systems are intrinsically low-risk and that the expected failure rate should be zero. Only catastrophic failures receive public attention and scrutiny. The remainder of the system operation is generally regarded as unflawed because of the low overt failure rate, even though there are many incidents that could become overt failures. Thorough accident analyses often indicate that there were precursor events or “dress rehearsals” that preceded an accident.

This ability to trace backwards with the advantage of hindsight is the fourth major reason that human error is so often the verdict after accidents. Studies have consistently shown that people have a tendency to judge the quality of a process by its outcome. Information about the outcome biases their evaluation of the process that was followed. Also, people have a tendency to “consistently exaggerate what could have been anticipated in foresight” (Fischhoff, 1975). Typically, hindsight bias in evaluations makes it seem that participants failed to account for information or conditions that “should have been obvious” (when someone claims that something “should have been obvious,” hindsight bias is virtually always present) or behaved in ways that were inconsistent with the (now known to be) significant information. Thus, knowledge of a poor outcome biases the reviewer towards attributing failures to system operators. But to decide what would be “obvious” to practitioners in the unfolding problem requires investigating many factors about the evolving incident, the operational system, and its organizational context, such as the background of normal occurrences, routine practices, knowledge factors, attentional demands, strategic dilemmas, and other factors.

The psychological and social processes involved in judging whether or not a human error occurred are critically dependent on knowledge of the outcome, something that is impossible before the fact. Indeed, it is clear from the studies of large system failures that hindsight bias is the greatest obstacle to evaluating the performance of humans in complex systems.

THE BIASING EFFECT OF OUTCOME KNOWLEDGE

Outcome knowledge influences our assessments and judgments of past events. These hindsight or outcome biases have strong implications for how we study and evaluate accidents, incidents, and human performance.

Whenever one discusses “human error,” one should distinguish between outcome failures and defects in the problem-solving process. Outcome failures are defined in terms of a categorical shift in consequences on some performance dimension. Generally, these consequences are directly observable. Outcome failures necessarily are defined in terms of the language of the domain – for example, in anesthesiology, sequelae such as neurological deficit, reintubation, myocardial infarction within 48 hours, or unplanned ICU admission. Military aviation examples of outcome failures include an unfulfilled mission goal, a failure to prevent or mitigate the consequences of some system failure on the aircraft, or a failure to survive the mission. An outcome failure provides the impetus for an accident investigation.

Process defects, on the other hand, are departures from some standard about how problems should be solved. Generally, the process defect, if uncorrected, would lead to, or increase the risk of, some type of outcome failure. Process defects can be defined in domain terms. For example, in anesthesiology, process defects may include insufficient intravenous access, insufficient monitoring, the choice of regional versus general anesthetic, and decisions about canceling a case. They may also be defined psychologically in terms of deficiencies in some cognitive function: for example, activation of knowledge in context, mode errors, situation awareness, diagnostic search, and goal tradeoffs.

People have a tendency to judge a process by its outcome. In the typical study, two groups are asked to evaluate human performance in cases with the same descriptive facts but with the outcomes randomly assigned to be either bad or neutral. Those with knowledge of a poor outcome judge the same decision or action more severely. This is referred to as the outcome bias (Baron and Hershey, 1988) and has been demonstrated with practitioners in different domains. For example, Caplan, Posner, and Cheney (1991) found an inverse relationship between the severity of outcome and anesthesiologists’ judgments of the appropriateness of care. The judges consistently rated the care in cases with bad outcomes as substandard while viewing the same behaviors with neutral outcomes as being up to standard, even though the care (that is, the preceding human acts) was identical. Similarly, Lipshitz (1989) found the outcome bias when middle-rank officers evaluated the decisions made by a hypothetical officer. Lipshitz (1989) points out that “judgment by outcomes is a fact of life for decision makers in politics and organizations.” In other words, the label “error” tends to be associated with negative outcomes.

It may seem reasonable to assume that a bad outcome stemmed from a bad decision, but information about the outcome is actually irrelevant to the judgment of the quality of the process that led to that outcome (Baron and Hershey, 1988). The people in the problem do not intend to produce a bad outcome (Rasmussen et al., 1987). Practitioners at the sharp end are responsible for action when the outcome is in doubt and consequences associated with poor outcomes are highly negative. If they, like their evaluators, possessed the knowledge that their process would lead to a bad outcome, then they would use this information to modify how they handled the problem. Ultimately, the distinction between the evaluation of a decision process and evaluation of an outcome is important to maintain because good decision processes can lead to bad outcomes and good outcomes may still occur despite poor decisions.

Other research has shown that once people have knowledge of an outcome, they tend to view the outcome as having been more probable than other possible outcomes. Moreover, people tend to be largely unaware of the modifying effect of outcome information on what they believe they could have known in foresight. These two tendencies collectively have been termed the hindsight bias. Fischhoff (1975) originally demonstrated the hindsight bias in a set of experiments that compared foresight and hindsight judgments concerning the likelihood of particular socio-historical events. Basically, the bias has been demonstrated in the following way: participants are told about some event, and some are provided with outcome information. At least two different outcomes are used in order to control for one particular outcome being a priori more likely. Participants are then asked to estimate the probabilities associated with the several possible outcomes. Participants given the outcome information are told to ignore it in coming up with their estimates, that is, “to respond as if they had not known the actual outcome,” or in some cases are told to respond as they think others without outcome knowledge would respond. Those participants with the outcome knowledge judge the outcomes they had knowledge about as more likely than the participants without the outcome knowledge.
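
To make the paradigm concrete, the following is a minimal simulation sketch (in Python; it is not from the original text, and the group sizes, rating scale, and 0.15 bias term are illustrative assumptions rather than data from any actual study). It only mimics the core comparison the experiments make: probability estimates from participants who were given outcome knowledge versus estimates from those who were not.

    import random
    import statistics

    random.seed(1)

    def foresight_estimate():
        # A participant WITHOUT outcome knowledge rates the probability of outcome A.
        return min(max(random.gauss(0.50, 0.10), 0.0), 1.0)

    def hindsight_estimate(bias=0.15):
        # A participant told that outcome A occurred, then asked to respond as if
        # they had not known it; the assumed bias term inflates the estimate.
        return min(max(random.gauss(0.50 + bias, 0.10), 0.0), 1.0)

    foresight = [foresight_estimate() for _ in range(100)]
    hindsight = [hindsight_estimate() for _ in range(100)]

    print("mean foresight estimate of outcome A: %.2f" % statistics.mean(foresight))
    print("mean hindsight estimate of outcome A: %.2f" % statistics.mean(hindsight))
    # The gap between the two means is the "knew-it-all-along" effect the studies
    # measure; presenting a second outcome to other groups controls for one
    # outcome being a priori more likely.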

The hindsight bias has proven to be robust; it has been demonstrated for different types of knowledge, such as episodes and world facts (e.g., Wood, 1978; Fischhoff, 1977), and in some real-world settings. For example, several researchers have found that medical practitioners exhibited a hindsight bias when rating the likelihood of various diagnoses (cf. Fraser, Smith, and Smith, 1992).

Experiments on the hindsight bias have shown that: (a) people overestimate what they would have known in foresight, (b) they also overestimate what others knew in foresight (Fischhoff, 1975), and (c) they actually misremember what they themselves knew in foresight (Fischhoff and Beyth, 1975). This misremembering may be linked to the work on reconstructive memory, in which a person’s memories can be changed by subsequent information, for example, leading questions may change eyewitnesses’ memories (Loftus, 1979).

Fischhoff (1975) postulated that outcome knowledge is immediately assimilated with what is already known about the event. A process of retrospective sense-making may be at work in which the whole event, including outcome, is constructed into a coherent whole. This process could result in information that is consistent with the outcome being given more weight than information inconsistent with it.

It appears that when we receive outcome knowledge, we immediately make sense out of it by integrating it into what we already know about the subject. Having made this reinterpretation, the reported outcome now seems a more or less inevitable outgrowth of the reinterpreted situation. “Making sense” out of what we are told about the past is, in turn, so natural that we may be unaware that outcome knowledge has had any effect on us. … In trying to reconstruct our foresightful state of mind, we will remain anchored in our hindsightful perspective, leaving the reported outcome too likely looking. (Fischhoff, 1982, p. 343)

It may be that retrospective outsiders (people who observe and judge practitioners’ performance from the outside and from hindsight) rewrite the story so that the information is causally connected to the outcome. A study by Wasserman, Lempert, and Hastie (1991) supports this idea. They found that people exhibit more of a hindsight bias when they are given a causal explanation for the outcome than when the outcome provided is due to a chance event (but see Hasher, Attig, and Alba, 1981, for an alternative explanation; see Hawkins and Hastie, 1990, for a summary).

Taken together, the outcome and hindsight biases have strong implications for error analyses.

• Decisions and actions having a negative outcome will be judged more harshly than if the same process had resulted in a neutral or positive outcome. We can expect this result even when judges are warned about the phenomenon and have been advised to guard against it (Fischhoff, 1975, 1982).

• Retrospectively, outsiders will tend to believe that people involved in some incident knew more about their situation than they actually did. Judges will tend to think that people should have seen how their actions would lead to the outcome failure. Typical questions a person exhibiting the hindsight bias might ask are: “Why didn’t they see what was going to happen? It was so obvious!” or “How could they have done X? It was clear it would lead to Y!”

Hence it is easy for observers after the fact to miss or underemphasize the role of cognitive, design, and organizational factors in incident evolution. For example, a mode error was probably an important contributor to the Strasbourg crash of an Airbus A-320. As we have seen, this error form is a human-machine system breakdown that is tied to design problems. Yet people rationalize that mode error does not imply the need for design modifications:

While you can incorporate all the human engineering you want in an aircraft, it’s not going to work if the human does not want to read what is presented to him, and verify that he hasn’t made an error. (Remarks by Y. Benoist, Director of Flight Safety, Airbus Industrie, 1992)

Similarly, in the aftermath of AT&T’s Thomas Street telecommunications outage in 1991, it was easy to focus on individuals at the sharp end and ignore the larger organizational factors.

It’s terrible the incident in New York was (pause) all avoidable. The alarms were disarmed; no one paid attention to the alarms that weren’t disarmed; that doesn’t have anything to do with technology, that doesn’t have anything to do with competition, it has to do with common sense and attention to detail. (Remarks by Richard Liebhaber of MCI commenting on AT&T’s Thomas Street outage, which occurred on September 17, 1991; from the MacNeil-Lehrer Report, PBS)

In this case, as in others, hindsight biases the judgment of the commentator. A detailed examination of the events leading up to the Thomas Street outage shows that the alarm issue is, in part, a red herring and clearly implicates failures in the organization and management of the facility (see FCC, 1991).

In effect, judges will tend to simplify the problem-solving situation that was actually faced by the practitioner. The dilemmas facing the practitioner in situ, the uncertainties, the tradeoffs, the attentional demands, and the double binds, all may be under-emphasized when an incident is viewed in hindsight. A consideration of practitioners’ resources and the contextual and task demands that impinge on them is crucial for understanding the process involved in the incident and for uncovering process defects.

In summary, these biases play a role in how practitioners’ actions and decisions are judged after the fact. The biases illustrate that attributing human error or other causes (e.g., software error) for outcomes is a psychological and social process of judgment. These biases can lead us to summarize the complex interplay of multiple contributors with simple labels such as “lack of attention” or “willful disregard.” They can make us miss the underlying factors that could be changed to improve the system for the future, for example, lack of knowledge or double binds induced by competing goals. Furthermore, the biases illustrate that the situation of an after-the-fact evaluator, who faces no uncertainty or risk and who possesses knowledge of the outcome, is fundamentally different from that of a practitioner in an evolving problem.

So whenever you hear someone say (or feel yourself tempted to say) something like “Why didn’t they see what was going to happen? It was so obvious!” or “How could they have done X? It was clear it would lead to Y!”, remember that error is the starting point of an investigation, and that the error investigator builds a model of how the participants behaved in a locally rational way given the knowledge, attentional demands, and strategic factors at work in that particular field of activity. This is the case regardless of whether one is attributing error to operators, designers, or managers. In other words, it is the responsibility of the error investigator to explore how it could have been hard to see what was going to happen or hard to project the consequences of an action. This does not mean that some assessments or actions are not clearly erroneous. But adoption of the local rationality perspective is important to finding out how and why the erroneous action could have occurred and, therefore, is essential for developing effective countermeasures rather than the usual window dressing of “blame and train,” “a little more technology will be enough,” or “only follow the rules” recommendations.

Some research has addressed ways to “debias” judges. Simply telling people to ignore outcome information is not effective (Fischhoff, 1975). In addition, telling people about the hindsight bias and to be on guard for it does not seem to be effective (Fischhoff, 1977; Wood, 1978). Strongly discrediting the outcome information can be effective (Hawkins and Hastie, 1990), although this may be impractical for conducting accident analyses.

The method that seems to have had the most success is for judges to consider alternatives to the actual outcome. For example, the hindsight bias may be reduced by asking subjects to explain how each of the possible outcomes might have occurred (Hoch and Loewenstein, 1989). Another relatively successful variant of this method is to ask people to list reasons both for and against each of the possible outcomes (von Winterfeldt and Edwards, 1986; Fraser et al., 1992). This technique is in the vein of a Devil’s Advocate approach, which may be one way to guard against a variety of breakdowns in cognitive systems (Schwenk and Cosier, 1980).

This is an example of the general problem solving strategy of considering alternatives to avoid premature closure (Patterson et al., 2001; Zelik et al., 2010).

This work has implications for debiasing judges in accident analysis. But first we need to ask the basic question: What standard of comparison should we use to judge processes (decisions and actions) rather than outcomes?

STANDARDS FOR ASSESSING PROCESSES RATHER THAN OUTCOMES

We have tried to make clear that one of the recurring problems in studying error is a confusion over whether the label is being used to indicate that an outcome failure occurred or that the process used is somehow deficient. The previous section showed that outcome knowledge biases judgments about the processes that led to that outcome. But it seems common sense that some processes are better than others for maximizing the chances of achieving good outcomes regardless of the presence of irreducible uncertainties and risks. And it seems self-evident that some processes are deficient with respect to achieving good outcomes – e.g., relevant evidence may not be considered, meaningful options may not be entertained, contingencies may not have been thought through. But how do we evaluate processes without employing outcome information? How do we know that a contingency should have been thought through except through experience? This is especially difficult given the infinite variety of the real world, and the fact that all systems are resource-constrained. Not all possible evidence, all possible hypotheses, or all possible contingencies can be entertained by limited resource systems. So the question is: what standards can be used to determine when a process is deficient?

There is a loose coupling between process and outcome – not all process defects are associated with bad outcomes, and good process cannot guarantee success given irreducible uncertainties, time pressure, and limited resources. But poor outcomes are relatively easy to spot and to aggregate in terms of the goals of that field of activity (e.g., lives lost, radiation exposure, hull losses, reduced throughput, costs, lost hours due to injuries). Reducing bad outcomes generally is seen as the ultimate criterion for assessing the effectiveness of changes to a complex system. However, measuring the reliability of a complex, highly coupled system in terms of outcomes has serious limitations. One has to wait for bad outcomes (thus one has to experience the consequences). Bad outcomes may be rare (which is fortunate, but it also means that epidemiological approaches will be inappropriate). It is easy to focus on the unique and local aspects of each bad outcome, obscuring larger trends or risks. Bad outcomes involve very many features, factors, and facets: Which were critical? Which should be changed?

If we try to measure the processes that lead to outcomes, we need to define some standard about how to achieve or how to maximize the chances for successful outcomes given the risks, uncertainties, tradeoffs, and resource limitations present in that field of activity. The rate of process defects may be much more frequent than the incidence of overt system failures. This is so because the redundant nature of complex systems protects against many defects. It is also because the systems employ human operators whose function is, in part, to detect such process flaws and adjust for them before they produce bad outcomes.

Process defects can be specified locally in terms of the specific field of activity (e.g., these two switches are confusable). But they also can be abstracted relative to models of error and system breakdown (this erroneous action or system failure is an instance of a larger pattern or syndrome – mode error, latent failures, and so on). This allows one to use individual cases of erroneous actions or system breakdown, not as mere anecdotes or case studies, but rather as individual observations that can be compared, contrasted, and combined to look for, explore, or test larger concepts. It also allows for transfer from one specific setting to another to escape the overwhelming particularity of cases.

STANDARDS FOR EVALUATING GOOD PROCESS

But specifying a process as defective in some way requires an act of judgment about the likelihood of particular processes leading to successful outcomes given different features of the field of activity. What dimensions of performance should guide the evaluation – efficiency or robustness, safety or throughput? This loose coupling between process and outcome leaves us with a continuing, nagging problem. Defining human error as a form of process defect implies that there exists some criterion or standard against which the activities of the agents in the system have been measured and deemed inadequate. However, what standard should be used to mark a process as deficient? Depending on the standard a reviewer adopts, very different views of error result.

We do not think that there can be a single and simple answer to this question. Given this, we must be very clear about what standards are being used to define “error” in particular studies or incidents; otherwise, we greatly retard our ability to engage in a constructive and empirically grounded debate about error. All claims about when an action or assessment is erroneous in a process sense should be accompanied with an explicit statement of the standard used for defining departures from good process.

One kind of standard about how problems should be handled is a normative model of task performance. This method requires detailed knowledge about precisely how problems should be solved, that is, nearly complete and exhaustive knowledge of the way in which the system works. Such knowledge is, in practice, rare. At best, a few components of the larger system can be characterized in this exhaustive way. As a result, normative models rarely exist for complex fields of activity where bad outcomes have large consequences. There are serious questions about how to transfer normative models developed for much simpler situations to these more complex fields of activity (Klein et al., 1993). For example, laboratory-based normative models may ignore the role of time or may assume resource-unlimited cognitive processing.

Another standard is the comparison of actual behavior to standard operating procedures or other norms deemed relevant to a profession (e.g., standards of care, policies). These practices are mostly compilations of rules and procedures that are acceptable behaviors for a variety of situations. They include various protocols (e.g., the Advanced Cardiac Life Support protocol for cardiac arrest), policies (e.g., it is the policy of the hospital to have informed consent from all patients prior to beginning an anesthetic), and procedures (e.g., the chief resident calls the attending anesthesiologist to the room before beginning the anesthetic, but after all necessary preparations have been made).

Using standard procedures as a criterion may be of limited value because they are codified in ways that ignore the real nature of the domain. It is not unusual, for example, to have a large body of rules and procedures that are not followed because to do so would make the system intolerably inefficient. The “work to rule” method used by unions to produce an unacceptable slowdown of operations is an example of the way in which reference to standards is unrealistic. In this technique, the workers perform their tasks to an exact standard of the existing rules, and the system performance is so degraded by the extra steps required to conform to all the rules that it becomes non-functional (e.g., see Hirschhorn, 1993).

Standard procedures are severely limited as a criterion because procedures are underspecified and therefore too vague to use for evaluation. For example, one senior anesthesiologist replied, when asked about the policy of the institution regarding the care for emergent caesarean sections, “our policy is to do the right thing.” This seemingly curious phrase in fact sums up the problem confronting those at the sharp end of large, complex systems. It recognizes that it is impossible to comprehensively list all possible situations and appropriate responses because the world is too complex and fluid. Thus the person in the situation is required to account for the many factors that are unique to that situation. What sounds like a nonsense phrase is, in fact, an expression of the limitations that apply to all structures of rules, regulations and policies (cf. for example, Suchman, 1987; Roth et al., 1987; Woods and Hollnagel, 2006).

One part of this is that standard procedures underspecify many of the activities and the concomitant knowledge and cognitive factors required to go from a formal statement of a plan to a series of temporally structured activities in the physical world (e.g., Roth et al., 1987; Suchman, 1987). As Suchman puts it, plans are resources for action – an abstraction or representation of physical activity; they cannot, for both theoretical and practical reasons, completely specify all activity.

In general, procedural rules are underspecified and too vague to be used for evaluation if one cannot determine the adequacy of performance before the fact. Thus, procedural rules such as “the anesthetic shall not begin until the patient has been properly prepared for surgery” or “stop all unnecessary pumps” are underspecified. The practitioner on the scene must use contextual information to define when this patient is “properly prepared” or what pumps are “unnecessary” at this stage of a particular nuclear power plant incident. Ultimately, it is the role of the human at the sharp end to resolve incompleteness, apparent contradictions, and conflicts in order to satisfy the goals of the system.

A second reason for the gap between formal descriptions of work and the actual work practices is that the formal descriptions underestimate the dilemmas, interactions between constraints, goal conflicts, and tradeoffs present in the actual workplace (e.g., Cook et al., 1991a; Hirschhorn, 1993). In these cases, following the rules may, in fact, require complex judgments as illustrated in the section on double binds. Using standard procedures as a criterion for error may hide the larger dilemma created by organizational factors while providing the administrative hierarchy the opportunity to assign blame to operators after accidents (e.g., see Lauber, 1993 and the report on the aircraft accident at Dryden, Ontario; Moshansky, 1992).

Third, formal descriptions tend to focus on only one agent or one role within the distributed cognitive system. The operator’s tasks in a nuclear power plant are described in terms of the assessments and actions prescribed in the written procedures for handling emergencies. But this focuses attention only on how the board operators (those who manipulate the controls) act during “textbook” incidents. Woods has shown through several converging studies of actual and simulated operator decision-making in emergencies that the operational system for handling emergencies involves many decisions, dilemmas, and other cognitive tasks that are not explicitly represented in the procedures (see Woods et al., 1987, for a summary). Emergency operations involve many people in different roles in different facilities beyond the control room. For example, operators confront decisions about whether the formal plans are indeed relevant to the actual situation they are facing, and decisions about bringing additional knowledge sources to bear on a problem.

All these factors are wonderfully illustrated by almost any cognitive analysis of a real incident that goes beyond textbook cases. One of these is captured by a study of one type of incident in nuclear power plants (see Roth et al., 1992). In this case, in hindsight, there is a procedure that identifies the kind of problem and specifies the responses to this particular class of faults. However, handling the incident is actually quite difficult. First, as the situation unfolds in time, the symptoms are similar to another kind of problem with its associated procedures (i.e., the incident has a garden path quality; there is a plausible but erroneous initial assessment). The relationship between what is seen, the practitioners’ expectations, and other possible trajectories is critical to understanding the cognitive demands, tasks, and activities in that situation. Second, the timing of events and the dynamic inter-relationships among various processes contain key information for assessing the situation. This temporally contingent data is not well represented within a static plan even if its significance is recognized by the procedure writers. Ultimately, to handle this incident, the operators must step outside the closed world defined by the procedure system.

Standard practices and operating procedures may also miss the fact that for realistically complex problems there is often no one best method. Rather, there is an envelope containing multiple paths, each of which can lead to a satisfactory outcome (Rouse et al., 1984; Woods et al., 1987). Consider the example of an incident scenario used in a simulation study of cognition on the flight deck in commercial aviation (Sarter and Woods, 1993; note that the simulated scenario was based, in part, on an actual incident). To pose a diagnostic problem with certain characteristics (e.g., the need to integrate diverse data, the need to recall and re-interpret past data in light of new developments, and so on), the investigators set up a series of events that would lead to the loss of one engine and two hydraulic systems (a combination that requires the crew to land the aircraft as soon as possible). A fuel tank is underfuelled at the departure airport, but the crew does not realize this, as the fuel gauge for that tank has been declared inoperative by maintenance. For aircraft at that time, there were standards for fuel management, that is, how to feed fuel from the different fuel tanks to the engines. The investigators expected the crews to follow the standard procedures, which in this context would lead to the engine loss, the loss of one of the hydraulic systems, and the associated cognitive demands. And this is indeed what happened, except for one crew. This one flight engineer, upon learning that one of his fuel tank gauges would be inoperative throughout the flight, decided to use a non-standard fuel management configuration to ensure that, just in case of any other troubles, he would not lose an engine or risk a hydraulic overheat. In other words, he anticipated some of the potential interactions between the lost indication and other kinds of problems that could arise and then shifted from the standard fuel management practices. Through this non-standard behavior, he prevented all of the later problems that the investigators had set up for the crews in the study.

Did this crew member commit an error? If one’s criterion is departure from standard practices, then his behavior was “erroneous.” If one focuses on the loss of indication, the pilot’s adaptation anticipated troubles that might occur and that might be more difficult to recognize given the missing indication. By this criterion, it is a successful adaptation. But what if the pilot had mishandled the non-standard fuel management approach (a possibility since it would be less practiced, less familiar)? What if he had not thought through all of the side effects of the non-standard approach – did the change make him more vulnerable to other kinds of troubles?

Consider another case, this one an actual aviation incident from 1991 (we condensed the following from the Aviation Safety Reporting System’s incident report to reduce aviation jargon and to shorten and simplify the sequence of events):

CASE 13.1 CASCADING AUTOMATED WARNINGS

Climbout was normal, following a night heavy weight departure under poor weather conditions, until approximately 24,000 ft when numerous caution/warning messages began to appear on the cockpit’s electronic caution and warning system (CRT-based information displays and alarms about the aircraft’s mechanical, electrical, and engine systems). The first of these warning messages was OVHT ENG 1 NAC, closely followed by BLEED DUCT LEAK L, ENG 1 OIL PRESSURE, FLAPS PRIMARY, FMC L, STARTER CUTOUT 1, and others. Additionally, the #1 engine generator tripped off the line (generating various messages), and the #1 engine amber “REV” indication appeared (indicating a #1 engine reverse). In general, the messages indicated a deteriorating mechanical condition of the aircraft. At approximately 26,000 ft, the captain initiated an emergency descent and turnback to the departing airport.

The crew, supported by two augmented crew pilots (i.e., a total of four pilots), began to perform numerous (over 20) emergency checklists (related to the various warning messages, the need to dump fuel, the need to follow alternate descent procedures, and many others). In fact, the aircraft had experienced a serious pylon/wing fire. Significantly, there was no indication of fire in the cockpit information systems, and the crew did not realize that the aircraft was on fire until informed of this by ATC during the landing roll out. The crew received and had to sort out 54 warning messages on the electronic displays, repeated stick shaker activation, and abnormal speed reference data on the primary flight display. Many of these indications were conflicting, leading the crew to suspect number one engine problems when that engine was actually functioning normally. Superior airmanship and timely use of all available resources enabled this crew to land the aircraft and safely evacuate all passengers and crew from the burning aircraft.

The crew successfully handled the incident – the aircraft landed safely. Therefore, one might say that no errors occurred. On the other hand, the crew did not correctly assess the source of the problems, they did not realize that there was a fire until after touchdown, and they suspected number one engine problems when that engine was actually functioning normally. Should these be counted as erroneous assessments? Recall, though, that the display and warning systems presented “an electronic system nightmare” as the crew had to try to sort out an avalanche of low-level and conflicting indications in a very high-workload and highly critical situation. The incident occurred on a flight with two extra pilots aboard (the nominal crew is two). They had to manage many tasks in order to make an emergency descent in very poor weather and with an aircraft in deteriorating mechanical condition. Note the large number of procedures which had to be coordinated and executed correctly. How did the extra crew contribute to the outcome? Would a standard-sized crew have handled the incident as well? These would be interesting questions to pursue using the neutral-practitioner criteria (see the next section).

The above incidents help to exemplify several points. Assessing good or bad process is extremely complex; there are no simple answers or criteria. Standard practices and procedures provide very limited and very weak criteria for defining errors as bad process. What can one do then? It would be easy to point to other examples of cases where commentators would generally agree that the cognitive process involved was deficient on some score. One implication is to try to develop other methods for studying cognitive processes that provide better insights about why systems fail and how they may be changed to produce higher reliability human-machine systems (Rochlin et al., 1987).

NEUTRAL-PRACTITIONER CRITERIA

The practitioners at the “sharp end” are embedded in an evolving context. They experience the consequences of their actions directly or indirectly. They must act under irreducible uncertainty and the ever-present possibility that in hindsight their responses may turn out wrong. As one critical care physician put it when explaining his field of medicine: “We’re the ones who have to do something.” It is their job to interpret situations that cannot be completely specified in detail ahead of time. Indeed, it is part of practitioners’ tacit job description to negotiate the tradeoffs of the moment.

Blessed with the luxury of hindsight, it is easy to lose the perspective of someone embedded in an evolving situation who experiences the full set of interacting constraints that they must act under. But this is the perspective that we must capture if we are to understand how an incident evolved towards disaster. One technique for understanding the situated practitioner represents a third approach to developing a standard of comparison. One could use an empirical approach, one that asks: “What would other similar practitioners have thought or done in this situation?” De Keyser and Woods (1990) called this kind of empirically based comparison the neutral-practitioner criterion. To develop a neutral-practitioner criterion, one collects data to compare practitioner behavior during the incident in question with the behavior of similar practitioners at various points in the evolving incident and in similar or contrasting cases. In practice, the comparison is usually accomplished by using the judgment of similar practitioners about how they would behave under similar circumstances. Neutral-practitioners make judgments or interpretations about the state of the world, relevant possible future event sequences, and relevant courses of action. The question is whether the path taken by the actual problem-solver is one that is plausible to the neutral-practitioners. One key is to avoid contamination by the hindsight bias; knowledge about the later outcome may alter the neutral-practitioners’ judgment about the propriety of earlier responses. One function of neutral-practitioners is to help define the envelope of appropriate responses given the information available to the practitioner at each point in the incident. Another function is to capture the real dilemmas, goal conflicts, and tradeoffs present in the actual workplace. In other words, the purpose is to capture the ways that formal policies and procedures underspecify the demands of the field of practice.

An example occurred in regard to the Strasbourg aircraft crash (Monnier, 1992). Mode error in pilot interaction with cockpit automation seems to have been a contributor to this accident. Following the accident, several people in the aviation industry noted a variety of precursor incidents for the crash where similar mode errors had occurred, although the incidents did not evolve as far towards negative consequences. These data provide us with information about what other similar practitioners have done, or would have done, when embedded in the context of commercial air transport. It indicates that a systemic vulnerability rooted in the design existed, rather than a simple case of “human error.”

Our research, and that of others, is based on the development of neutral-practitioner criteria for actions in complex systems. This method involves comparing actions that were taken by individuals to those of other similar practitioners placed in a similar situation. Note that this is a strong criterion for comparison, and it requires that the evaluators possess or gather the same sort of expertise and experience as was employed during the incident. It does not rely on comparing practitioner behaviors with theory, rules, or policies. It is particularly effective for situations where the real demands of the system are poorly understood and where the pace of system activity is fast or events can cascade (i.e., in large, complex systems).

ERROR ANALYSIS AS CAUSAL JUDGMENT

Error and accident analysis is one case where people – lay people, scientists, engineers, managers, or regulators – make causal judgments or attributions. Causal attribution is a psychological and social judgment process that involves isolating one factor from among many contributing factors as a “cause” for the event to be explained. Strictly speaking, there are almost always several necessary and sufficient conditions for an event. But people distinguish among these necessary and sufficient conditions focusing on some as causes and relegating others to a background status as enabling conditions. In part, what is perceived as cause or enabling condition will depend on the context or causal background adopted (see Hart and Honore, 1959; also see Cheng and Novick, 1992). Consider a classic example used to illustrate this point. Oxygen is typically considered an enabling condition in an accident involving fire, as in the case of a dropped cigarette. However, people would generally consider oxygen as a cause if a fire broke out in a laboratory where oxygen was deliberately excluded as part of an experiment.

Current models of causal attribution processes hold that people attempt to explain the difference between the event in question and some contrasting case (or set of cases). Rather than explaining an event per se, one explains why the event occurs in the target case and not in some counterfactual contrast case (Hilton, 1990). Some relevant factors for establishing a causal background or contrast case are the dimensions originally proposed by Kelley (1973): “consensus, distinctiveness, and consistency”. Consensus refers to the agreement between the responses of other people and the response of a particular person regarding a particular stimulus on a particular occasion; distinctiveness refers to the disagreement between the particular person’s responses to some particular stimulus and other stimuli on the particular occasion; and consistency refers to the agreement between the way a particular person responds to a particular stimulus on different occasions (Cheng and Novick, 1992). The critical point is that there are degrees of freedom in how an event, such as an accident, is explained, and the explanation chosen depends, in part, on the contrasting case or cases adopted. Thus, in a neutral-practitioner approach, the investigator tries to obtain data on different kinds of contrast cases, each of which may throw into relief different aspects of the dynamics of the incident in question.
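
Kelley’s three dimensions can also be read as simple comparisons over a set of observed responses. The sketch below is a hypothetical illustration in Python; the record layout and function names are our own and do not come from the original text. It merely counts agreement and disagreement in the way the definitions above describe.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Obs:
        person: str
        stimulus: str
        occasion: str
        response: str

    def _target(data, person, stimulus, occasion):
        # The response whose causal background we are trying to establish.
        return next(o.response for o in data
                    if o.person == person and o.stimulus == stimulus and o.occasion == occasion)

    def consensus(data, person, stimulus, occasion):
        # Agreement between other people's responses and this person's response
        # to the same stimulus on the same occasion.
        t = _target(data, person, stimulus, occasion)
        others = [o for o in data
                  if o.person != person and o.stimulus == stimulus and o.occasion == occasion]
        return sum(o.response == t for o in others) / len(others) if others else None

    def distinctiveness(data, person, stimulus, occasion):
        # Disagreement between this person's response to the target stimulus and
        # their responses to other stimuli on the same occasion.
        t = _target(data, person, stimulus, occasion)
        others = [o for o in data
                  if o.person == person and o.stimulus != stimulus and o.occasion == occasion]
        return sum(o.response != t for o in others) / len(others) if others else None

    def consistency(data, person, stimulus, occasion):
        # Agreement between this person's responses to the same stimulus on
        # different occasions.
        t = _target(data, person, stimulus, occasion)
        others = [o for o in data
                  if o.person == person and o.stimulus == stimulus and o.occasion != occasion]
        return sum(o.response == t for o in others) / len(others) if others else None

In Kelley’s account, high values on all three dimensions push the explanation towards the stimulus or situation rather than the person – the same shift made when precursor-incident data, as in the Strasbourg example above, point towards a systemic vulnerability rather than an individual “human error.”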

Note that interactional or contrast case models of causal attribution help us to understand the diversity of approaches and attitudes towards “human error” and disasters. If someone asks another person why a particular incident occurred, and if the shared background between these people is that “causes” of accidents are generally major equipment failures, environmental stresses, or misoperation, then it becomes sensible to respond that the incident was due to human error. If one asks why a particular incident occurred when the shared background concerns identifying who is financially responsible (e.g., a legal perspective), then it becomes sensible to expect an answer that specifies the person or organization that erred. If questioner and respondent only appear to have a shared background (because both use the words “human error”) when they, in fact, have different frames of reference for the question, then it is not surprising to find confusion.

In some sense, one could see the research of the 1980s on error as framing a different background for the question: Why did this incident occur? The causal background for the researchers involved in this intensive and cross-disciplinary examination of error and disaster was: How do we develop higher reliability complex human-machine systems? This causal background helped to point these researchers towards system-level factors in the management and design of the complex processes. In addition, when this question is posed by social and behavioral scientists, they (not so surprisingly) find socio-technical contributors (as opposed to reliability engineers who pointed to a different set of factors; Hollnagel, 1993). The benefit of the socio-technical background as a frame of reference for causal attribution is that it heightens our ability to go beyond the attribution of “human error” in analysis of risk and in measures to enhance safety. It seems to us that psychological processes of causal attribution apply as well to researchers on human error as they do to non-behavioral scientists. One could imagine a corollary for accident investigators to William James’ Psychologist’s Fallacy in which psychologists suppose that they are immune from the psychological processes that they study (Woods and Hollnagel, 2006).

The background for a neutral-practitioner approach to analyzing cognitive process and error comes from the local rationality assumption; that is, people do reasonable things, given their knowledge, objectives, point of view, and limited resources. However, an accident is by definition unintentional – people do not intend to act in ways that produce negative consequences (excepting sabotage). Error analysis traces the problem-solving process to identify points at which limited knowledge and processing lead to breakdowns. Process-tracing methods are used to map out how the incident unfolded over time, what the available cues were, which cues were actually noticed by participants, and how they were interpreted. Process tracing attempts to understand why the particular decisions/actions were taken; that is, how did it “make sense” to the practitioners embedded in the situation (Woods, 1993; Woods and Hollnagel, 2006; Dekker, 2006).

The relativistic notion of causal attribution suggests that we should seek out and rely on a broad set of contrast cases in explaining the sequence of events that led to an outcome. We explain why the practitioners did what they did by suggesting how that behavior could have been locally rational. To do this we need to understand behavior in the case in question relative to a variety of different contrast cases – what other practitioners have done in the situation or in similar situations. What we should not do, particularly when there is a demand to hold people accountable for their actions, is rely on putatively objective external evaluations of human performance such as those of court cases or other formal hearings. Such processes in fact institutionalize and legitimate the hindsight bias in the evaluation of human performance, easily leading to blame and a focus on individual actors at the expense of a system view.
