Chapter 8. A Human-Oriented Perspective

To err is human.

SENECA

8.1 The Human Element

Experience is what you get when you were expecting something else.

In the earlier chapters, we discuss various system-related disasters and their causes, both accidental and intentional. In almost all cases, it is possible to allocate to people—directly or indirectly—those difficulties allegedly attributed to “computer problems.” But too much effort seems directed at placing blame and identifying scapegoats, and not enough on learning from experiences and avoiding such problems. Besides, the real causes may implicitly or explicitly involve a multiplicity of developers, customers, users, operators, administrators, others involved with computer and communication systems, and sometimes even unsuspecting bystanders. In a few cases, the physical environment also contributes—for example, power outages, floods, extreme weather, lightning, earthquakes, and animals. Even in those cases, there may have been system people who failed to anticipate the possible effects. In principle, at least, we can design redundantly distributed systems that are able to withstand certain hardware faults, component unavailabilities, extreme delays, human errors, malicious misuse, and even “acts of God”—at least within limits. Nevertheless, in surprisingly many systems (including systems designed to provide continuous availability), an entire system can be brought to a screeching halt by a simple event just as easily as by a complex one.

Many system-related problems are attributed—at least in part, whether justifiably or not, and albeit differently by different analyses—to errors or misjudgments by people in various capacities or to just being in the wrong place at the wrong time. Examples include the following:

Requirements definition: The Stark’s missile defense system, the Sheffield communication system

System design: The 1980 self-propagating ARPAnet contamination

Human-interface design: Therac-25, Mariner 1, and the Aegis shoot-down of the Iranian Airbus

Implementation: The first launch of the Shuttle Columbia, the DIVAD (“Sergeant York”), the Patriot missile system

Modeling and simulation: Northwest Flight 255 (in which the aircraft and the MD-80 flight-training simulator behaved differently from one another), the Hartford Civic Center roof cave-in (in which an erroneous computer model was used)

Testing: The Hubble space telescope and the untested code patches that resulted in serious telephone outages

Users, operators, and management: For example, the Chernobyl shutdown-recovery experiments, the Exxon Valdez and KAL 007 running on autopilots, the Black Hawks shot down by friendly fire in the absence of proper identification, the French and Indian Airbus A320 crashes, operation at low temperature despite the known limitations of the Challenger O-rings, use of nonsecure systems in sensitive environments, operation under stress of battle as in the Iran Air 655 and the Patriot defense system, operation in the presence of degraded human abilities as in the Exxon Valdez case

Maintenance and system upgrades: Phobos 1 (lost after a botched remote patch), Viking

Innocent bystanders: Victims of Trojan horses, of personal-computer viruses, and of privacy violations

This book describes numerous instances of each type. The human element also transcends the systems themselves. The seemingly epidemic flurry of work crews downing telephone systems and computer connections by inadvertently severing telephone cables (White Plains, New York; Chicago, Illinois; Newark, New Jersey; Annandale, Virginia; and San Francisco’s East Bay) is noted in Section 2.1. The lessons of the past have taught us not to put multiple critical links in the same conduit (which was the 1986 problem in White Plains when New England was separated from the ARPAnet, all seven supposedly independent links being cut in one swell foop). Nevertheless, in June 1991 the Associated Press lost both its primary and backup links in the Annandale area, despite their being in separate (but adjacent) cables! (See Sections 2.1 and 4.1.)

Human fallibility abounds. The Aeromexico case (noted in Section 2.4) is particularly worth examining, because the blame can be distributed among the Piper pilot who crashed into the airliner in restricted airspace, the controller for not noticing him on radar, the Grumman Yankee pilot who was also in restricted airspace (which distracted the controller), the developers of the radar system, which was unable to detect altitude, the U.S. government for not having required collision-avoidance systems earlier, perhaps even the Aeromexico pilot, and so on.

There are also Heisenbergian effects in which the mere fact that people are being observed affects their behavior. Bell Laboratories had a home-grown operating system for IBM 709x computers in the early 1960s. The system tended to crash mysteriously, but never when the system-programming wizards were in the computer room. It took months to discover an input-output timing glitch that was triggered only because the tape-handling operators were just a little sloppy in punching buttons, but only when they were not being observed.

Henry Petroski [129] has illustrated that we often learn little, if anything, from our successes, but that we have an opportunity to learn much from our failures. (Petroski’s observation is highly relevant to us in the computer profession, even though it is based largely on experiences gained in the engineering of bridges.) Better understanding of the notable failures that have been ongoing grist for the Risks Forum is fundamental to our hopes for building better systems in the future. However, it appears that the risks are even greater in the computer profession than in more traditional engineering fields. It is clear that we need systems that are not only fault tolerant, secure against malicious misuse, and more predictable in their behavior, but also less sensitive to the mistakes of people involved in system development and use.1

8.2 Trust in Computer-Related Systems and in People

It is incumbent upon us to devise methodologies for designing verifiable systems that meet . . . stringent criteria, and to demand that they be implemented where necessary. “Trust us” should not be the bottom line for computer scientists.

REBECCA MERCURI2

Many of the risks of using computers in critical environments stem from people trusting computer systems blindly—not realizing the possibilities that the underlying models are wrong or incomplete; the software designs basically flawed; the compilers buggy; the hardware unreliable; and the programmers and operators inadequately trained, untrustworthy, negligent, or merely marginally competent.

It has long been common to find inordinate trust placed in those who develop, operate, use, administer, and regulate the technology, or in the technologies themselves. However, there are a few signs that this situation is changing. In connection with an article3 on the 46 U.S. senators who were seeking to cut back the proposed SDI budget, Senator William Proxmire was quoted as follows:

Challenger and Chernobyl have stripped some of the mystique away from technology.

Concern is temporarily elevated after such incidents as Chernobyl and the sequence of space failures beginning with the Challenger; soon afterward, that concern seems to dwindle. Indeed, there may be cases where the risks are so great that neither the people nor the computers should be trusted, or that the system should not be built at all.

Furthermore, politically motivated positions persist that are totally oblivious to the harsh technological realities of intrinsic vulnerabilities. Anthony Lewis4 made this statement:

Now President Reagan has shown us that he has failed to learn a fundamental lesson of Chernobyl: the folly of relying on the perfectibility of dangerous high technology and the human beings who run it. In the teeth of Chernobyl and the American space rocket failures, he has renewed his insistence that we can have a shield against nuclear weapons in space. He has demanded that Congress vote all the funds for his Strategic Defense Initiative, a vision so dependent on perfectly functioning rockets and computers and software [and people!] that to believe in it any longer is close to irrational.

Recall the scene near the end of the movie WarGames,5 when the computer system (WOPR) comes to a rather poignant conclusion regarding the game strategies involved with the apparent escalation of nuclear attacks and counter-attacks, in a context associated with WOPR learning to play tic-tac-toe:

A strange game. The only winning move is not to play.

There are many attributes with respect to which computer systems, networks, and applications are trusted to behave properly (explicitly or implicitly), whether or not the systems are adequately dependable:6

• Data confidentiality

• System integrity

• External data integrity (consistency with the real world)

• Internal data integrity (for example, consistency of various internal representations, as in file-system pointers and double-entry bookkeeping ledgers)

• Availability of resources (including systems, programs, and data)

• Protection of human safety

• Protection against loss of financial assets or other resources

• Functional correctness of applications, database-management systems, operating systems, and hardware

The desired properties interact in many subtle ways. In some cases, they are mutually antagonistic; in other cases, apparent antagonism can be reduced or even eliminated by constructive designs that address the requirements explicitly. The stringency of the requirements involving each of these properties may vary widely from one property to another, and from one system to another.

8.2.1 Misplaced Trust in Computer-Based Systems

People often tend to anthropomorphize computer systems, endowing the machines and software with humanlike traits such as intelligence, rational and intuitive powers, and (occasionally) social conscience, and even with some superhuman traits such as infallibility. Here are a few highly oversimplified types of misplaced trust, exemplified by the cited cases. The first example involved extensive wildlife deaths and long-term environmental damage; examples of each of the subsequent eight types resulted in human lives being lost.

Blind faith. “Just leave it to the autopilot computer.” The Exxon Valdez oil spill noted in Section 2.6 was alternately blamed on the captain’s alcohol problem and the third mate’s “severe fatigue”—according to a report of the National Commission on Sleep Disorders Research.7 The third mate and helmsman were both unaware that their rudder controls were being ignored, because they did not realize that the ship was running on autopilot. The Coast Guard had radar tracking abilities, but it had not been using them—because there had never been a need. In addition, computer records were destroyed after the accident—in spite of a federal court order that they be preserved.

Trust in technology. “With all these checks and balances, nothing can go wrong.” The “friendly-fire” shootdown of the Black Hawk helicopters over northern Iraq suggests that even the most stringent preventive measures may not be enough, especially when a prevailing assumption becomes invalid.

Trust in foolproofness. “But the system did something it was not supposed to do.” The Therac-25 software permitted the therapeutic radiation device to be configured unsafely in X-ray mode, without its protective filter in place, as discussed in Section 2.9.1.

False sense of safety. “But the system did not do something it was supposed to do.” A DC-10 crash was attributed to a stall alarm that failed to alert the pilot; the power for the alarm was expected to come from the missing engine.

Reliance on fragmentary data. “The big picture was not readily available.” The Aegis system’s main user displays did not indicate the speed, range, and altitude of the Iranian Airbus; determination of those attributes required explicit action to bring up an auxiliary small display on which they were included among many other data items—but not including rates of change. The design of the Aegis’ human interface thus had a significant role in the Vincennes’ shootdown of the Airbus.

Complacency. “We know that this [program, data item, configuration] is wrong, but it probably won’t matter anyway.” An Air New Zealand plane crashed into Mt. Erebus in Antarctica because erroneous instrument-flight data had been discovered by controllers but not reported to the pilots, who had never had to depend on those data in their previous visual-only flights.

Overconfidence. “The controls are too constraining. We need to turn them off while we experiment.” The folks at Chernobyl did just that.

Impatience or annoyance. “That alarm system is a nuisance; let’s turn it off.” Prior to the September 1991 New York telephone outage discussed in Section 2.1, alarms had been disabled because they had been going off frequently during construction, and thus there was no indication that the standby battery was being depleted. There were unverified reports that the pilots of the Northwest 255 flight noted in Sections 2.4 and 7.5 had disabled an alarm similar to the one that had failed on a different MD-80 two days before the crash.

Trust in emergency administrative procedures and system logic. “The emergency procedures were never tested adequately under live circumstances.” The British Midland Airways 737 shutdown of the wrong engine points to difficulties in responding to real emergencies. Deployment of the Strategic Defense Initiative’s Star Wars concept would have this difficulty as well, because thorough testing under live attacks is essentially impossible.

Trust in recovery to overcome system failures. “If something goes wrong, our backup procedures will save us.” The $32-billion overdraft at the Bank of New York noted in Section 5.7 resulted accidentally from an unchecked counter overflow in a program. The effects were compounded significantly by a recovery procedure that overwrote the backup by mistake. But it’s only money—one day’s interest, $5 million.

Credulity. “The computer will do it right. It never failed before.” An F-16 whose computer system was designed to prevent the aircraft from stalling stalled—because a novice pilot managed to find a flight configuration unanticipated by the program. He bailed out, but the plane crashed.

Incredulity. “The data or the program must be wrong. That’s never happened before.” Because the system analyzing ozone depletion had been programmed to reject far-out results, the correct data values were deemed too anomalous, and the first evidence showing a dramatic depletion in the Antarctic ozone layer was rejected for 8 years.

Confusion. “The outputs [data values or exception conditions] don’t make any sense. What do we do now?” In the Three Mile Island accident, the control indicators did not show the actual positions of the valves but instead the intended positions. The resulting ambiguity as to the actual settings caused enormous confusion.

Confusion in redundancy. “The computer results are mutually inconsistent. Which of them is correct?” When redundant controls or computations disagree, there may be no basis for deciding which result to believe. Section 2.3 notes the case of two incorrect programs that outvoted the correct program in a 3-version fly-by-wire programming experiment; a minimal voting sketch follows these examples.

Wishful thinking. “Distributed systems are more reliable than centralized systems, and could not possibly fail totally.” The 1980 ARPAnet collapse and the 1990 AT&T collapse noted in Section 2.1 both illustrated global effects that can result from local phenomena.

Oversimplification. “Our computer system is completely secure.” Numerous penetrations and internal frauds have shown that fundamental system flaws and sloppy practice are pervasive.

Chutzpah. “We verified everything formally, and tested everything; therefore it will work correctly the first time we try it.” Fortunately, this has not yet been a serious problem—although that sounds a little like the fantasy of Star Wars.

Loss of human control and initiative. “The computer is down. Sorry, we can’t do anything for you.” This problem seems altogether too frequent. The RISKS archives include cases in banking, airline reservations, electric power, and supermarkets (for example).
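
To make the redundancy pitfall concrete, here is a minimal sketch of majority voting over three independently written program versions. It is purely illustrative: the function names, the computation, and the shared mistake are invented, and are not taken from the experiment cited in Section 2.3. The point is only that a voter cannot distinguish agreement from correctness; two versions that embody the same misunderstanding outvote the lone correct one.

    # Hypothetical 3-version voting sketch; all names and values are invented.
    def version_a(angle):      # assume this one implements the specification correctly
        return angle * 0.5

    def version_b(angle):      # shares a misreading of the specification
        return angle * 0.25

    def version_c(angle):      # shares the same misreading
        return angle * 0.25

    def vote(a, b, c, tolerance=1e-6):
        """Return a value on which at least two versions agree,
        or None when all three disagree (a noncomparison)."""
        if abs(a - b) <= tolerance or abs(a - c) <= tolerance:
            return a
        if abs(b - c) <= tolerance:
            return b
        return None

    results = [f(10.0) for f in (version_a, version_b, version_c)]
    print(vote(*results))      # prints 2.5: the two wrong versions outvote the correct 5.0

The same structure also exhibits the noncomparison ambiguity: if all three outputs differ, the voter has no principled way to pick one.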

In the case of the Midland 737 crash, one engine would have been sufficient to fly the plane—but not after the one still-working engine was shut down by mistake in an engine emergency. (The pilot may have misinterpreted smoke in the cabin and increased vibration as signs that it was the right engine that was failing. Normally, 70 percent of the air conditioning comes from the right engine, but apparently there was a misconception at the time that 100 percent came from that engine.) Originally, crosswiring of the alarm systems was suspected, leading to inspections of other aircraft—and to the discovery of undetected crosswiring problems in the alarm systems of at least 30 other commercial airliners. In general, it is extremely difficult to ensure that subsystems to be invoked only in true emergencies will actually work as planned when they are needed, especially if they have never before been invoked.

In the Air France Airbus A-320 crash, initial reports implicated pilot error in turning off some of the automatic safety controls and flying too low. Subsequent unverified reports suggested that the fly-by-wire Airbus computer system had been suspect on previous occasions, with false altimeter readings, sudden full-throttling during landing, and sudden losses of power.

The Iranian Airbus shootdown may be attributable to a combination of computer-interface problems (incompleteness of information in the Aegis system displays and inconvenience of the multiple displays—the altitude of the Airbus was not displayed directly, and the aircraft was mistaken for a military plane still on the runway) and human frailty—real-time pressures under fire, confusion, inexperience, and inability to question the incorrect initial assumption that was contradicted by the information that could have been displayed on the auxiliary screen.

In the case of the Exxon Valdez, although the autopilot may have worked as intended (apart from the attempt by the third mate to override it?), all of the safety measures and contingency plans appear to have been of minimal value. Indeed, in contrast with systems that are designed to minimize the effects of weak links, the Exxon Valdez situation consisted of little but weak links.

With respect to any of the requirements noted, trust is often misplaced. It is clear that computer systems (1) cannot ensure completely adequate system behavior, and (2) cannot enforce completely adequate human behavior. Unfortunately, these two limiting factors are usually discarded as “theoretical” and “of no practical significance”—after all, perfection is admittedly impossible. In its place, however, people are often willing to accept mediocre or incomplete systems. The desire for quick and easy answers leads to short-cut solutions that are fundamentally inadequate, but that still may persist for years—until a catastrophe illuminates shortcomings that had hitherto been systematically ignored. Ironically, such misuses of technology are generally exposed only in response to disasters (as noted by Henry Petroski). Subsequently, after some improvements are made, the previously “theoretical” arguments retrospectively become “of historical interest”—with the added emphasis, “Of course, it couldn’t happen again.”

It is difficult to raise the issue of misplaced trust in computer systems without being accused of being a Luddite—that is, a technology basher. Sound technological and social arguments are often emotionally countered by narrowly based economic arguments—such as “We can’t afford to change anything” or “Why bother with defensive measures? There’s never been a problem.” The common strategy of expending the least possible effort for (short-term) economic reasons leaves uncomfortably narrow margins for error; experience with marginally engineered systems is not reassuring. On the other hand, consideration of longer-term goals and costs often provides more than adequate justification for better systems, better administration of them, and higher levels of awareness.

8.2.2 Misplaced Trust in People

Let us also examine some of the ways in which misplaced trust in computer systems interacts with misplaced trust in people, whether or not those people are adequately dependable. In some cases, computer systems become untrustworthy because of what people do—for example, errors in design, implementation, operation, and use. In other cases, people may behave undependably because of what computer systems do—for example, because the computer systems demand too much of them (especially in real-time applications). Another common problem involves placing trust in other people, who, in turn, place excessive trust in technology—oblivious to the limitations.

In essentially every computer system, there are privileged individuals who in some sense have to be more trustworthy than others—for example, system programmers, database administrators, and operators. Most system designs do not permit or encourage carefully compartmentalized privileges, and instead provoke the use of omnipotent mechanisms (for example, superusers). Such mechanisms are intrinsically dangerous, even if used with good intent—particularly if they can be subverted by someone masquerading as a privileged user—or if they are misused by privileged users.

8.2.3 “Trust” Must Be Trustworthy

Several principles of good software engineering, such as abstraction and information hiding, are helpful in reducing the excessive assignment of trust. In particular, the principle of separation of duties (see, for example, Clark and Wilson [25]) and the principle of least privilege (Section 7.8) together aid in designing systems and applications so that only critical portions of the system need be trustworthy and so that privileges may indeed be partitioned. In a well-structured system that observes these principles, it is possible to reduce the extent to which successful operation must depend on the proper behavior of both ordinary users and partially privileged people. That is, the system should be capable of protecting itself against both intentional and accidental misuse. In exactly the same way that computer systems can be made fault tolerant through appropriate use of redundancy in hardware and software, there is a challenge in design and administration to make system use human-error tolerant.
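
As a concrete, if simplified, illustration of least privilege and separation of duties, consider the following sketch. The privilege names, users, and policy are hypothetical; a real system would enforce such checks in the operating system, database, or access-control infrastructure rather than in application code.

    # Hypothetical sketch: narrowly scoped privileges instead of one superuser,
    # plus a separation-of-duties rule for a sensitive operation.
    PRIVILEGES = {
        "alice": {"enter_payment"},
        "bob":   {"approve_payment"},
        "carol": {"administer_accounts"},
    }

    def require(user, privilege):
        if privilege not in PRIVILEGES.get(user, set()):
            raise PermissionError(f"{user} lacks privilege: {privilege}")

    def issue_payment(entered_by, approved_by, amount):
        """Entry and approval must come from different users, each holding
        only the specific privilege that its step requires."""
        require(entered_by, "enter_payment")
        require(approved_by, "approve_payment")
        if entered_by == approved_by:
            raise PermissionError("entry and approval must be performed by different users")
        return f"payment of {amount} issued"

    print(issue_payment("alice", "bob", 100.00))   # succeeds
    # issue_payment("carol", "carol", 100.00)      # would fail: no all-powerful account

Because no account holds every privilege, compromising any single account, or bribing any single insider, is not by itself sufficient to complete the sensitive operation.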

In some applications, we mistakenly trust systems—as in systems that fail to operate acceptably because of bad design and implementation, or whose security controls are implemented on weak computer systems that are actually easy to subvert. In other applications, we mistakenly trust people—to develop systems whose use is much more critical than generally realized, or to use systems that can easily be compromised. In the worst case, we can trust neither the systems nor the users and therefore must go to great pains to design systems and administrative controls that constrain and monitor system use. Unfortunately, any system necessitates that certain system components and certain people be trusted—whether or not they are trustworthy—even in the presence of fault-tolerance techniques (including Byzantine algorithms, which make few if any assumptions about what cannot happen). Intentional or accidental subversion of those system components can in many cases be devastating. In particular, both trusted people and interlopers have the ability to subvert the system and its applications. But systems that of their own accord simply fail to do what is required or expected can also be disastrous.

8.2.4 What and Whom Should We Trust?

As we see from the foregoing examples, risks come from many sources—not just from design and implementation flaws, human maliciousness, and accidents, but also from unforeseen combinations of problems and difficulties in responding to them. Any computer systems in which trust can justifiably be placed should be able to anticipate all such risks.

A few generalizations are in order.

Predictability

No system can ever be guaranteed to work acceptably all of the time. In a complex system, it is essentially impossible to predict all the sources of catastrophic failure. This caveat is true even in well-engineered systems, where the sources of failure may be subtle. Risks may come from unexpected sources. A system that has run without serious failure for years may suddenly fail. Hardware may fail. Lurking software flaws may surface. The software—and indeed the entire system—may be impossible to test fully under live conditions (as in Star Wars), especially in systems involving life-critical responses to real-time events. Software may fail because of changes external to it—for example, as a result of reconfiguration or updates to other programs on which that software depends. Classical quantitative risk assessment is superficially attractive but sorely limited in practice, especially if based on false assumptions—which often can have devastating consequences. Experience with past disasters is valuable, but only partially useful in anticipating future disasters.

System structure and complexity

It is important to understand the ways in which trust is placed in different components. It is usually a fantasy to believe that critical concerns such as security and reliability can be confined to a small portion of a computer system or a replicated portion of a distributed system, particularly with conventionally designed computer systems. A realistic, generalized, trusted computing base—on which the adequacy of system operation can depend—tends to be large, especially if the totality of requirements is encompassed and the requirements and their interactions with one another are explicitly recognized. This difficulty is particularly relevant for human safety. Nevertheless, hierarchical design, encapsulation, and careful distributed implementation can help to confine bad effects. Thus, a fundamental design goal is to partition a system so that different properties are maintained with high integrity by corresponding components, and so that the interactions among different components can be controlled carefully.

Defensive design

Complex systems must be designed conservatively and defensively, especially when they operate under extreme circumstances. In critical applications, weak links are particularly dangerous; great care should be taken to avoid them. Assume the worst; then you can be thankful if things go well. Systems should identify different levels of trustworthiness and prevent dependence on less trustworthy subjects (for example, users) and objects (programs and data). Various notions of integrity (for example, system integrity, application integrity, and data integrity) are vital. Responses should be anticipated for the widest range of unexpected behavior. Furthermore, systems and applications should observe the many principles of good design (of which separation of duties and least privilege are cited as examples—for reliability and for security as well). Sound software-engineering practice provides no easy answers, but even riskier are the archaic techniques often found in lowest-bidder or overly simplistic efforts.
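
A small sketch may help to convey the flavor of defensive design. The scenario, names, and limits below are invented; the point is only the policy of bounding untrusted input and falling back to a safe state rather than propagating an implausible value.

    # Hypothetical defensive-design sketch: assume the worst about every input.
    SAFE_POWER_LEVEL = 0.0        # invented fail-safe default for this example

    def set_power(requested, minimum=0.0, maximum=100.0):
        """Accept a power setting only if it is plausibly valid; otherwise
        fall back to the safe default instead of trusting the caller."""
        if not isinstance(requested, (int, float)):
            return SAFE_POWER_LEVEL               # malformed input: fail safe
        if requested != requested:                # rejects NaN
            return SAFE_POWER_LEVEL
        if requested < minimum or requested > maximum:
            return SAFE_POWER_LEVEL               # out of range: assume the worst
        return float(requested)

    print(set_power(42.0))      # 42.0
    print(set_power(1e9))       # 0.0 -- implausible request refused
    print(set_power("high"))    # 0.0 -- malformed request refused

In a real system, the rejected cases would also be logged and alarmed, so that the fail-safe fallback does not silently mask a deeper problem.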

Distributed systems

The notion that distributed control solves problems not easily solved with central control is also largely a myth. Problems of updating, synchronization, concurrency, backup, and verifiability (for example) may simply appear in different guises, and some manifestations may be much harder to analyze than others.

The environment

Vagaries of the operating environment (such as power interruptions, extreme weather conditions, interference, and lightning strikes) may defeat sound design and implementation—irrespective of how well the computer systems are engineered. Thorough awareness of all risks can lead to systems with substantially fewer and less critical risks.

People

People in the loop may make the risks worse rather than better—especially if the people must operate under tight real-time constraints. Complex human interfaces are inherently risky, but the importance of sound interface design is generally underestimated. Emergency preparedness is difficult at best, and often is hampered by people taking inappropriate emergency actions. Beware of people who think they are more reliable than the computer system. But also beware of anyone in authority who has an inordinate trust—and lack of suspicion—in computers and those who employ them. Such people constitute a serious risk, especially when they give up their own responsibility to the technology.

To reduce the serious risks associated with excessive or inappropriate trust being placed in computer technology, we can take several steps:

• The many different senses in which trust is conferred must be made explicit—identifying potentially all assumptions about technology, people, and the environment.

• Systems must be designed and implemented defensively with respect to all senses of trust, and operated with continual awareness of the risks. Despite significant advances in system and software engineering, there is still too much ad-hoc-ery.

• All possible modes of human behavior must be considered, and those that are plausible must be anticipated in the system design, implementation, and operation.

• The myth of technological infallibility must be thoroughly debunked, repeatedly. This theme is revisited in Sections 9.5 and 9.8.1.

8.3 Computers, Ethics, and the Law

As high-risk uses of computer-related systems continue to increase, it is important to consider some of the critical roles that people play relative to those systems, and, in particular, the potential roles of ethics and values.

We have considered problems such as system penetrations, abuses of authority, tampering, and other system misuses, spoofed E-mail, and risks in ballot recording and tabulating. Relevant cases of misuse in the past have also included financial fraud, antisocial surveillance, and telephone phreaking.

There has been extensive discussion regarding whether access requiring no authorization violates any laws. Irrespective of the laws, Gene Spafford [158] concludes that computer breakins are categorically unethical. But what good are computer ethics in stopping misuse if computer security techniques and computer fraud laws are deficient? Relating back to Section 3.1, techniques to narrow the sociotechnical gap are not particularly effective if the technological gap and the social gap are both wide open.

In [110]8 I wrote the following:

Some Risks Forum contributors have suggested that, because attacks on computer systems are immoral, unethical, and (hopefully) even illegal, promulgation of ethics, exertion of peer pressures, and enforcement of the laws should be major deterrents to compromises of security and integrity. But others observe that such efforts will not stop the determined attacker, motivated by espionage, terrorism, sabotage, curiosity, greed, or whatever. . . . It is a widely articulated opinion that, sooner or later, a serious collapse of our infrastructure—telephone systems, nuclear power, air traffic control, financial, and so on—will be caused intentionally.

Certainly, better teaching and greater observance of ethics are needed to discourage computer misuse. However, we must try harder not to configure computer systems in critical applications (whether proprietary or government sensitive but unclassified, life-critical, financially critical, or otherwise depended on) when those systems have fundamental vulnerabilities. In such cases, we must not assume that everyone involved will be perfectly behaved, wholly without malevolence and errors; ethics and good practices address only a part of the problem—but are nevertheless important.

Superficially, it might seem that computer security would be unnecessary in an open society. Unfortunately, even if all data and programs were freely accessible, integrity of data, programs, and systems would be necessary to provide defenses against tampering, faults, and errors.

A natural question is whether the value-related issues raised by the use of computer systems are substantively different from those that arise in other areas. The advent of computer technology has brought us two new ethical dilemmas.

• People seem naturally predisposed to depersonalize complex systems; computers are not people, the rationalization goes, and therefore need not be treated humanely. Remote computer access intensifies this predisposition, especially if access can be attained anonymously or through masquerading. General ambivalence, a resulting sublimation of ethics, values, and personal roles, and a background of increasingly loose corporate moralities (for example, savings and loan and other insider manipulations, and ecological abuses) seem to encourage in some people a rationalization that unethical behavior is the norm, or somehow justifiable.

• Computers have opened up radically new opportunities, such as distributed and multipartner fraud, high-speed crosslinking, global searching and matching of enormous databases, junk E-mail, undetectable surveillance, and so on. These capabilities were previously impossible, inconceivable, or at least extremely difficult.

Most professional organizations have ethical codes. Various nations and industries have codes of fair information practice. Teaching and reinforcement of computer-related values are vitally important, alerting system purveyors, users, and would-be misusers to community standards, and providing guidelines for handling abusers. But we still need sound computer systems and sound laws.

Each community has its own suggestions for what to do about these problems.

• System technologists typically see the need for better systems and networks—with increased security, reliability, and safety. For example, improved operating systems, user-to-system and system-to-system authentication, network encryption, and privacy-enhanced mail (PEM)9 can significantly increase the security attainable. The evidence of this book suggests that we must do better in developing and using life-critical systems—and indeed some progress is being made in reliability and human safety.

• Some legislators and lawyers see a need for laws that are more clearly enforceable and in some cases more technology specific. Issues raised include familiar topics such as liability, malpractice, intellectual property, financial crimes, and whistle-blowing. In these and other areas, difficulties arise in applying the existing laws—which often were not written with all of the idiosyncrasies of the computer era in mind. Examples include remote access from another country with different laws, and definitions of what constitutes authorization and misuse. Law-enforcement communities typically seek more arrests, more prosecutions, and more jails—which might become less relevant if the technology were doing its job better. Insurers also can play a constructive role, particularly if they encourage the development of systems that help to reduce the risks—not just the risks to insurers but also the risks to the insured.

• Social scientists see many needs that transcend technology and the law. Examples include restructuring our societies, providing better education, encouraging greater human interaction and cooperation generally, reducing inequities between people with access to the emerging information superhighway and everyone else, and pervasively reinforcing and encouraging ethical behavior from cradle to grave.

Such a diversity of perspectives is typical when a new technology emerges. At any one time, certain interest groups may seek economic, political, ideological, or emotional leverage; each group may view its goals as predominant, and may ignore the other groups. It is dangerous to believe that one approach is more correct or has a higher priority than another. Each approach can contribute positively—whereas its absence can lead (and has led) to serious consequences. Each of the three perspectives must be respected, within a coordinated effort that unifies them. Consequently, these perspectives and others must evolve further so that the technology, the laws, and the social norms all become much more compatible with one another than they are now.10

8.4 Mixed Signals on Social Responsibility

Section 8.4 originally appeared as an Inside Risks column, CACM, 34, 8, 146, August 1991, and was written by Ronni Rosenberg, manager of the Documentation Department at Kendall Square Research, Waltham, MA. She is a member of the ACM Committee on Computers and Public Policy.

What is the appropriate role of computer professionals in determining how their work is used? Should they consider the societal implications of their work?

Computer scientists receive mixed answers to these questions. On the one hand, there has been increasing talk about the importance of scientists and engineers playing a major role in shaping the social environment in which their work is used. Statements to this effect issue regularly from some of our most prominent universities. In the January 1991 issue of Computing Research News, Rick Weingarten, Executive Director of the Computing Research Association, stressed the need for computer scientists to play an active role in science and technology policy. The ACM issued a strongly worded statement calling on its members to adhere to the privacy guidelines in the ACM Code of Professional Conduct. The code advises members to consider the influence of their work on individual privacy, to express their professional opinions to employers or clients “regarding any adverse consequences to the public which might result from work proposed,” and to “consider the health, privacy, and general welfare of the public” as part of their work.

More generally, National Science Foundation director Walter Massey urged all scientists to devote more time to the public-policy aspects of their work: “Members of the science and engineering communities should play a more significant role in our representative democracy. Being a member of these professions does not preclude participation in the political process.” In 1989, a National Academy of Sciences report said, “science and technology have become such integral parts of society that scientists can no longer abstract themselves from societal concerns. . . . [D]ealing with the public is a fundamental responsibility for the scientific community. Concern and involvement with the broader issues of scientific knowledge are essential if scientists are to retain the public’s trust.”

Having seen what these scientific institutions say about the context in which technologists should conduct their work, let’s look at what the computing profession does.

The computing profession encourages computer scientists to be narrow technocrats. Most computer-science curricula pay little or no attention to “social impacts.” This shortcoming reflects a widespread view that a computer-science degree is a (purely) technical degree. Where computers-and-society courses are available, they often are offered by departments other than computer science (for example, philosophy or sociology), and are taught by people other than computer scientists. In this way, computer-science students are taught that social effects of computing are topics for other disciplines, rather than their own, to consider.

Few senior computer scientists devote time to public service or consider the social implications related to their work. Through the example of their professional role models and the policies of their schools, computer-science students learn these lessons well: The “best” work is what extends the technical state of the art, and computer scientists should not care about how (or whether) the fruits of their work are used.

Outside of school, whether a computer scientist is employed in academia or in industry, an interest in social implications of computing must be satisfied outside of working hours, after the real work is done, if it is to be satisfied at all. Businesses may enable employees to attend technical conferences on the company’s time, but such flexibility is not likely to be extended to testifying before Congress. Expressing an opinion about the effects of a person’s work on the public is not a recipe for professional advancement. Professional rewards in computer science—tenure, promotion, salary increase, publication opportunities—are too often proportional to the single-minded devotion to the technical aspects of one’s job.

In short, the profession demonstrates, as clearly as it knows how, that the relationship between computing and society is not a valid part of computer science. Of course, not all computer-science departments, businesses, and computer scientists adhere to this view! Nonetheless, this broadly painted picture accurately captures the spirit of many participants in the field.11

Why should we care? Because Rick Weingarten, the ACM Code, Walter Massey, and the National Academy of Sciences are right. The context in which computer systems exist—who pays for them, who participates in their design, how they are used, and how they are viewed by policymakers—is at least as significant an indicator of the systems’ impact and value as are technical factors. Computer scientists are not the only ones who should consider the context of computer-system use, but their expertise imposes a special obligation to participate actively in the discussion.

8.5 Group Dynamics

Above all, by formally recognizing the existence of fuzziness, we will realize why management isn’t the way it’s supposed to be—why it probably shouldn’t be the way it’s supposed to be. We’ll learn that the problems caused by fuzziness are not to be avoided at all costs, but instead are problems to be worked on and lived with. Perhaps most important of all, we will come to find that the fuzzy side of management not only poses serious problems but opens up unusual opportunities as well—and only then may we claim to fully understand what properly unbusinesslike management is all about.

ROGER GOLDE [51]

In this section, we review some of the influences governing collaborative and other multiperson interactions in the development and use of computers and communications, from the perspectives of the risks involved.

8.5.1 Large Programming Projects

“Too many cooks spoil the broth.” This proverb certainly identifies a risk in programming projects. Large teams often cause more serious problems than they solve. However, many systems are intrinsically so complex that they cannot be concocted by a few superchefs—and so large teams are used. Controlling a multiperson development project that encompasses multiple system versions can become complicated. This book includes discussion of various systems and system developments (for example, see Section 7.4) that failed at least partly because of the unmanageability of their complexity—for which system engineering and software engineering can provide only limited help. We must increasingly question the viability of huge computer-system developments, especially if the use of those systems is life-critical or otherwise societally risky.12

8.5.2 Fast, Far-Reaching Interactions

Computer and communication technologies are radically changing the ways in which people can collaborate, both locally and worldwide. Faxes and E-mail messages are able to reach out to multitudes at arbitrary distances, economically and almost instantaneously. The enormous potential influence of such networking is evident from the 30,000 people who asked within a very short time to have themselves removed from the Lotus Marketplace Households database (Section 6.1). The influence is also evident from the electronic dissemination of news after Tiananmen Square and during the failed Soviet coup attempt against Gorbachev. These potentially unifying technologies can play a major role in the establishment and maintenance of democratic institutions and the spread of information. But they can be used repressively as well. For example, there was a report from the Northwest Information Agency via Aldis Ozols in Sydney, Australia, about high-powered electronic retaliation during the Soviet coup: “All fax machines and computers at publishing houses of democratic newspapers Smena and Nevskoye were burnt by strong electric impulses.”

Easy communications and rapid interactions entail some risks. Proprietary rights can be flagrantly abused as software and text migrate on their merry ways. Trojan horses and personal-computer viruses can propagate more easily. Messages may be spoofed. The ability for people to vote instantaneously from their homes or offices in referenda on important issues would also present risks, such as emotional, simplistic, or knee-jerk responses to complex issues, sometimes with irreversible consequences.

On-line newsgroups have proliferated wildly, covering every imaginable topic. The more carefully moderated newsgroups are providing serious international educational and cultural benefits. Other newsgroups act as sandboxes for newsgroupies. There is considerable potential for overzealous flaming and rapid spread of false information.

8.5.3 Collaborative Attacks on Security

One of the security notions aimed at defeating single weak-link security attacks is separation of duties—for example, splitting administrative and technical duties, or partitioning a superuser facility into distinct subprivileges. Another is requiring two persons to authorize a particularly sensitive action—for example, two-key systems. Perhaps it is only twice as difficult to bribe two people as it is to bribe one, unless they happen to be working together. The concept of fault tolerance could be generalized to security in the sense that an authorization system (for example) could be made n-person tolerant. However, the vagaries of human behavior suggest that Byzantine fault tolerance would be a more appropriate model but might merely encourage further collaborative attacks.
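
The idea of an n-person-tolerant authorization mechanism can be sketched as a simple threshold rule. The signer names and the threshold below are invented for illustration; real mechanisms range from dual physical keys to cryptographic threshold schemes.

    # Hypothetical k-of-n authorization sketch: a sensitive action proceeds only
    # when at least k distinct authorized principals concur.
    AUTHORIZED_SIGNERS = {"ops1", "ops2", "ops3", "auditor"}

    def authorize(action, approvals, k=2):
        """Require approvals from at least k distinct authorized signers.
        Subverting one insider is no longer enough; k colluding insiders
        remain a residual risk."""
        distinct = set(approvals) & AUTHORIZED_SIGNERS
        if len(distinct) < k:
            raise PermissionError(
                f"{action}: needs {k} distinct approvals, got {len(distinct)}")
        return f"{action}: authorized by {sorted(distinct)}"

    print(authorize("release funds", ["ops1", "ops3"]))       # succeeds
    # authorize("release funds", ["ops1", "ops1", "mallory"]) # would fail

Raising k raises the cost of collusion, but, as noted above, it cannot eliminate that risk.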

8.5.4 International Perspectives

National boundaries are rapidly fading in the electronic world. However, there are still isolationist or nationalistic self-interest movements that militate against increased collaboration, in spite of a changing world.

Marketplace competition. Computer-system vendors seeking proprietary advantages may hinder development of open systems and networks (which themselves, rather ironically, require data confidentiality and system integrity to protect against abuses of openness).

National security, law enforcement, and international privacy. Encryption-based systems have raised many concerns regarding policy, standards, and export controls, as noted in Clark Weissman’s Inside Risks column, “A National Debate on Encryption Exportability,” CACM 34, 10, 162, October 1991. One concern is that government actions in the name of national security can have a chilling effect on international cooperation and on the domestic marketplace, as well as counterproductively stimulating competing international efforts. For example, there has been considerable discussion in the on-line Risks Forum regarding the proposed digital-signature standard (DSS).13 The Escrowed Encryption Standard (EES) involving the classified SKIPJACK encryption algorithm and its implementation in the Clipper Chip for secure telephony, and the Capstone chip for secure data communications (Section 6.2), have added further fuel to the fire. The U.S. government sees the EES as providing increased privacy while at the same time not defeating the purposes of national security and law enforcement. The privacy community sees the EES as a further threat to privacy, especially if other uses of encryption were to become illegal.

Good ideas and good software (especially if free!) tend to propagate, irrespective of attempted controls that impede the construction of new means of communication.

We are now entering an era of much greater international effort, particularly in Europe and Japan. There are various stumbling blocks—economic and governmental more than technological. However, open collaboration that constructively uses the emerging computer and communication technologies can significantly reshape our world.

8.6 Certification of Computer Professionals

The Risks Forum has covered numerous cases in which software developers were at least partially responsible for disasters involving computer systems. We summarize here an on-line discussion on whether software developers should undergo professional certification, as in engineering disciplines.14

John H. Whitehouse made various arguments in favor of certification. There are not enough qualified people. Managers are not sufficiently knowledgeable about technical issues. Many practitioners survive despite poor performance, whereas many excellent people do not receive adequate credit. “Hiring is expensive and usually done pretty much in the blind. Firing is risk-laden in our litigious society. . . . It is my contention that the vast majority of software defects are the product of people who lack understanding of what they are doing. These defects present a risk to the public, and the public is not prepared to assess the relative skill level of software professionals.” Fear of failing may cause some people to oppose voluntary certification. “Furthermore, academics have not joined in the debate, because they are generally immune from the problem.”

Theodore Ts’o presented an opposing view. He sees no valid way to measure software “competence.” “There are many different software methodologies, all with their own adherents; trying to figure out which ones of them are ‘correct’ usually results in a religious war.” He also expressed serious concern that, under a certification system, the software profession might become a guild, protecting mediocrity and excluding really qualified people.

Martyn Thomas noted that certification does not necessarily help. Also, creating a closed shop is inherently risky because it enhances the status and incomes of those admitted at the expense of those excluded, and can easily become a conspiracy to protect the position of the members. However, on balance, some certification is desirable, “for staff who hold key positions of responsibility on projects that have significance for society.” He added that many countries already have mandatory certification for other engineers. The United Kingdom defense standards for developing safety-critical software, DEFSTAN 00-55 and 56, are noted in Section 7.8.1, and have significant implications for the competence and experience required for developers of safety-critical systems.

Gary Fostel noted the problem of scale: There are significant differences between small systems and large ones.

Large, complex software systems have problems that are not readily visible in the small-scale applications. In my software development courses, I commonly tell students that the methods that will be required of them are not necessarily the most efficient methods for the class project required of them. For the trivial sort of work I can require of students in a semester, there is really no need for comments . . . requirements analysis . . . and formal design, and so on for most of the techniques of software engineering. On the other hand, as the size of the problem grows, and the customer becomes distinct from the development, and the development staff becomes fluid, and the effort expands in numerous other dimensions toward bewildering complexity, the methods . . . are in fact necessary.

Paul Tomblin noted that the “Ritual of the Calling of an Engineer” (the Iron Ring) was created by Rudyard Kipling before there was a legal status for engineers; Kipling’s “Obligation” included this statement: “For my assured failures and derelictions, I ask pardon beforehand of my betters and my equals in my calling . . . .” Paul added, “So we admit that everyone fails at some time, and we aren’t going to crucify you if you screw up, providing you did so honestly, and not because you were lazy or unprofessional.”

Russell Sorber noted the voluntary certification provided by the Institute for Certification of Computer Professionals in Park Ridge, Illinois. Nurses, physicians, pilots, civil engineers (even hair stylists) are licensed; he reiterated the thought that he would like life-critical systems to be built by licensed or certified professionals. John Whitehouse added that the ICCP takes great pains to prevent development of a guild mentality—for example, with continual review and updating of the certification process.

There was also some discussion of whether certification would stifle creativity, originality, and excellence; in summary, it might, but not necessarily.

This debate is an old one. In this exchange of views, the sentiments generally favored certification, with various caveats. There is need for a balanced position in which there is some certification of both individuals and institutions involved in the development of high-risk computer systems, but in which the certification process itself is carefully circumscribed. Certification of the systems produced is also important. Teaching and systematic use of modern development techniques are also important pieces of the puzzle, as is the reinforcement of ethical behavior. Martyn Thomas noted that certification is only a mechanism for control; it has to be exercised in the right direction if there is to be an improvement.

8.7 Summary of the Chapter

Most of the problems exhibited in this chapter are ultimately human problems. Difficulties in system use and operation have strong human components. Problems in conception, requirement specification, design, and implementation are also people intensive. The technological factors and the human factors must be considered together. People involved in system development and operation must be more aware of the critical roles that they play. Significantly better education and training in computer-related activities are needed at all levels, with greater emphasis on system-oriented thinking.15

Challenges

C8.1 Put yourself in the shoes and mindset of the individuals who were involved in a few of the disasters described in the earlier chapters. Try to choose disasters that were induced or directly caused by human behavior. What would you have done differently at the time? What would you do differently now, having read this book up to this point? (After contemplating those questions, you are ready to read my conclusions in Chapter 9.)

C8.2 Choose two particularly disastrous cases from this book. See how widely or narrowly you can spread the blame. What portion of it falls on nonhuman shoulders? Under what circumstances do you think it is reasonable to try to allocate blame? If possible, dig beyond the details presented in the book. (In numerous cases, the identification of the real causes remains either unresolved or closely guarded.)

C8.3 Consider the types of misplaced trust itemized in Section 8.2.1. In particular, address the last type (loss of human control and initiative), and analyze it in detail. Enumerate cases in which this problem has affected you personally. Choose three other types, and find examples among the cases in this book other than those explicitly noted in Section 8.2.1.

C8.4 Can you conceive of a set of standards for certification of professionals that, on one hand, would be attainable by enough system designers and programmers to satisfy the demand for their skills, and that, on the other hand, would be stringent enough that satisfying it would necessarily imply a suitable level of competence for the development of life-critical systems? Explain your answer.

C8.5 Try to obtain a set of professional ethics (with no clues given here as to how to go about it) for some organization of interest to you, such as the ACM, the Institute of Electrical and Electronics Engineers (IEEE), the IEEE Computer Society, the American Bar Association, the American Medical Association, or the Federal Bureau of Investigation. How difficult was it to acquire? What does the level of difficulty suggest? Consider three dissimilar cases of malicious system misuse, such as those in Sections 5.1, 5.4, and 5.6, related to the field of your choice. How might your chosen ethical code have made a difference? What else might have been helpful?

C8.6 Do you believe that computer-related systems raise value-related issues that are substantively different from those in other kinds of technologically based systems? Explain your answer.

C8.7 Under what circumstances might you accept employment in a company whose primary objective ran counter to your own principles? Would you then strive to change that company’s policies? What would you do if you discovered what you considered to be a serious breach—legal, ethical, moral, or otherwise? Discuss your answer. Discuss the implications of your being a whistle-blower—a role that by itself represents some high risks relating to technology.
