Chapter 1. The Nature of Risks

We essay a difficult task; but
there is no merit save in
difficult tasks.

OVID

COMPUTER SYSTEMS ENABLE US to do tasks that we could not even dream of doing otherwise—for example, carrying out extremely complex operations, or searching rapidly through mammoth amounts of information. They can also fail to live up to our expectations in many different ways, sometimes with devastating consequences.

This book analyzes a large collection of problems experienced to date, and provides insights that may be helpful in avoiding such consequences in the future. Hindsight can be valuable when it leads to new foresight. I hope that this book will serve that purpose, helping us to attain a safer, sounder, and more secure future in our interactions with and dependence on computers and related technologies.

1.1 Background on Risks

This book refers to vulnerabilities, threats, and risks in computers and related systems. Strict definitions tend to provoke arguments about subtle nuances and to break down in specific cases. Because the primary emphasis in this book is on the big picture, we seek definitions that are intuitively motivated.

• A vulnerability is a weakness that may lead to undesirable consequences.

• A threat is the danger that a vulnerability can actually lead to undesirable consequences—for example, that it can be exploited intentionally or triggered accidentally.

• A risk is a potential problem, with causes and effects; to some authors, it is the harm that can result if a threat is actualized; to others, it is a measure of the extent of that harm, such as the product of the likelihood and the extent of the consequences. However, explicit measures of risk are themselves risky (as noted in Section 7.10) and not a major concern here. What is important is that avoiding risks is an exceedingly difficult task that poses a pervasive problem.
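
As a purely illustrative sketch of the quantitative view just mentioned (the numbers are hypothetical, and, as noted above, explicit measures of risk are themselves risky), consider a failure mode estimated to occur with a likelihood of 1 in 1000 per year and to cause $1,000,000 in losses when it does. The corresponding expected-loss measure of risk would be

risk = likelihood × consequence = (1/1000 per year) × $1,000,000 = $1,000 per year.

Point estimates of this kind are highly sensitive to the likelihood figure, which for rare events is seldom known with any confidence; that is one reason why such measures are not a major concern in this book.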

1.1.1 Terminology

The following intuitively based English-language terms, applied to computer and communication systems throughout this book, are introduced here. These definitions are generally consistent with common technical usage. (Each term tends to have a conceptual meaning as well as a relative qualitative meaning.) Further definitions are given as they are needed.

Reliability implies that a system performs functionally as expected, and does so consistently over time. Reliability is also a measure of how well a system lives up to its expectations over time, under specified environmental conditions. Hardware reliability relates to how well the hardware resists malfunctions. Software reliability relates to how well the system satisfies certain functional requirements. Reliability in its various forms is the main concern of Chapter 2.
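
One common engineering quantification of reliability over time, offered here only as an illustrative assumption drawn from standard reliability practice and not as a definition used in this book, models a component with a constant failure rate λ:

R(t) = e^(−λt), with mean time between failures MTBF = 1/λ.

For example, a failure rate of λ = 0.0001 failures per hour corresponds to an MTBF of 10,000 hours, and the probability that the component operates without failure for 1000 hours is R(1000) = e^(−0.1), roughly 0.90. Such figures quantify only how often failures occur; they say nothing about how severe the consequences of a failure may be.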

Security implies freedom from danger, or, more specifically, freedom from undesirable events such as malicious and accidental misuse. Security is also a measure of how well a system resists penetrations by outsiders and misuse by insiders. Security in its many manifestations is the main topic of Chapters 3 and 5.

Integrity implies that certain desirable conditions are maintained over time. For example, system integrity relates to the extent to which hardware and software have not been altered inappropriately. Data integrity relates to data items being as they should be. Personnel integrity relates to individuals behaving in an appropriate manner. Integrity encompasses avoidance of both accidental and intentional corruption.

These three terms appear to overlap significantly, and indeed that is the case no matter how carefully definitions are chosen. These concepts are inherently interrelated; uses of technology must necessarily consider them together. Furthermore, there is never an absolute sense in which a system is secure or reliable.

1.1.2 Perspective

Increasingly, we depend on computer systems to behave acceptably in applications with extremely critical requirements, by which we mean that the failure of systems to meet their requirements may result in serious consequences. Examples of critical requirements include the protection of human lives and of resources on which we depend for our well-being, and the attainment, with high assurance, of adequate system reliability, data confidentiality, and timely responsiveness, particularly in high-risk situations. A challenge of many computer-system designs is that the entire application environment (rather than just the computer systems) must satisfy simultaneously a variety of critical requirements, and must continue to do so throughout its operation, maintenance, and long-term evolution. The satisfaction of a single requirement is difficult enough, but the simultaneous and continued satisfaction of diverse and possibly conflicting requirements is typically much more difficult.

The collection of diverse failures presented here is representative of computer-related problems that have arisen in the past. Understanding the reasons for these cases can help us at least to reduce the chances of the same mistakes recurring in the future. However, we see that similar problems continue to arise. Furthermore, major gaps still exist between theory and practice, and between research and development.

We take a broad, system-oriented view of computer-related technologies that includes computer systems, communication systems, control systems, and robots, for example. We examine the role of computers themselves from a broad perspective. Hardware, software, and people are all sources of difficulties. Human safety and personal well-being are of special concern.

We explore various inherent limitations both of the technology and of the people who interact with it. Certain limitations can be overcome—albeit only with significant effort. We must strive to promote the development and systematic use of techniques that can help us to identify the intrinsic limitations, and to reduce those that are not intrinsic—for example, through better systems and better operational practices. We also must remain keenly aware of the limitations.

There are many pitfalls in designing a system to meet critical requirements. Software-engineering techniques provide directions, but no guarantees. Experience shows that even the most carefully designed systems may have significant flaws. Both testing and formal verification have serious deficiencies, such as the intrinsic incompleteness of the former and the considerable care and effort necessary in carrying out the latter. All in all, there are no easy answers.

Unfortunately, even if an ideal system were designed such that its specifications were shown to be consistent with its critical requirements, and then the design were implemented correctly such that the code could be shown to be consistent with the specifications, the system would still not be totally trustworthy. Desired behavior could be undermined by the failure of the underlying assumptions (whether implicit or explicit), even temporarily. Such assumptions are far-reaching, yet often are not even stated—for example, that the requirements are complete and correct, that there are no lurking design flaws, that no harmful malicious code has been inserted, that there are no malicious users (insiders or outsiders), that neither the data nor the system has been altered improperly, and that the hardware behaves predictably enough that the expected worst-case fault coverage is adequate. In addition, human misuse or other unanticipated problems can subvert even the most carefully designed systems.

Thus, there is good news and there is bad news. The good news is that computer system technology is advancing. Given well-defined and reasonably modest requirements, talented and diligent people, enlightened and altruistic management, adequate financial and physical resources, and suitably reliable hardware, systems can be built that are likely to satisfy certain stringent requirements most of the time. There have been significant advances—particularly in the research community—in techniques for building such computer systems.

The bad news is that guaranteed system behavior is impossible to achieve—with or without people in the operational loop. There can always be circumstances beyond anyone’s control, such as floods, lightning strikes, and cosmic radiation, to name a few. Besides, people are fallible. Thus, there are always inherent risks in relying on computer systems operating under critical requirements—especially those that are complex and are necessary for controlling real-time environments, such as in fly-by-wire aircraft that are aerodynamically unstable and cannot be flown without active computer control. Even the most exhausting (but still not exhaustive) testing leaves doubts. Furthermore, the development community tends to be slow in adopting those emerging research and development concepts that are practical, and in discarding many other ideas that are not practical. Even more important, it is inherent in any development effort and in system operation that not all potential disasters can be foreseen—yet it is often the unforeseen circumstances that are the most disastrous, typically because of a combination of circumstances involving both people and computers. A fundamental conclusion is that, even if we are extremely cautious and lucky, we must still anticipate the occurrence of serious catastrophes in using computer systems in critical applications. This concept is also explored by Charles Perrow, who illustrates why accidents must be considered to be normal, rather than exceptional, events [126].

Because most of the examples cited here illustrate what has gone wrong in the past, the casual reader may wonder whether anything has ever gone right. I have always sought to identify true success stories. However, there are few cases in which system developments have met their requirements on budget and on time; even among those, there have been many developmental and operational problems.

The technological aspects of this book consider how we can improve the state of the art and enhance the level of human awareness, to avoid in the future the problems that have plagued us in the past. It is important to consider the collection of examples as representing lessons from which we must learn. We must never assume infallibility of either the technology or the people developing and applying that technology. Indeed, in certain situations, the risks may simply be too great for us to rely on either computers or people, and it would be better not to entrust the application to automation in the first place. For other applications, suitable care in system development and operation may be sufficient to keep the risks within acceptable limits. But all such conclusions depend on an accurate assessment of the risks and their consequences, an assessment that is typically lacking.

1.2 Sources of Risks

Some of the many stages of system development and system use during which risks may arise are listed in Sections 1.2.1 and 1.2.2, along with a few examples of what might go wrong. Indeed, every one of these categories is illustrated throughout the book with problems that have actually occurred. Techniques for overcoming these problems are considered in Chapter 7.

1.2.1 Sources of Problems Arising in System Development

Problems may occur during each stage of system development (often involving people as an underlying cause), including the following:

System conceptualization: For example, inappropriate application of the technology when the risks were actually too great, or avoidance of automation when use of the technology would have been beneficial

Requirements definition: Erroneous, incomplete, or inconsistent requirements

System design: Fundamental misconceptions or flaws in the design or specification of either the hardware or the software

Hardware and software implementation: Errors in chip fabrication or wiring, program bugs, accidentally or intentionally installed malicious code capable of causing unanticipated effects (for example, Trojan horses such as time bombs and logic bombs, or viruses in the floppy-disk shrinkwrap—see Section 3.1)

Support systems: Poor programming languages, faulty compilers and debuggers, misleading development tools, editing mistakes

Analysis of system concepts and design: Analyses based on false assumptions about the physical world, the operating environment, or human behavior; erroneous models, erroneous simulations

Analysis of the implementation—for example, via testing or code verification: Incomplete testing, erroneous verification, mistakes in debugging

Evolution: Sloppy redevelopment or maintenance, misconceived system upgrades, introduction of new flaws in attempts to fix old flaws, incremental escalation applied to inordinate complexity (the straw that breaks the camel’s back)

Decommission: Premature removal of a primary or backup facility before its replacement is fully operational; hidden dependence on an old version that is no longer available but whose existence is necessary (for example, for compatibility); inability to decommission a system before its operation becomes unmanageable (the albatross effect)

1.2.2 Sources of Problems in System Operation and Use

Problems may also arise during system operation and use (typically involving people or external factors), including the following:

Natural environmental factors: Lightning, earthquakes, floods, extreme temperatures (as in the Challenger loss), electromagnetic and other interference including cosmic radiation and sunspot activity, and myriad other natural occurrences

Animals: Sharks, squirrels, monkeys, and birds are included as causative factors in the cases presented here, and—indirectly—a cow; a pig appears in Section 5.3, trapped in an electronic-mail spoof

Infrastructural factors: Loss of electrical power or air conditioning

Hardware malfunction: Equipment malfunctions, due to causes such as physical aging, transient faults, or environmental factors in the general sense, including those noted in the preceding three items

Software misbehavior: Unexpected behavior resulting from undetected problems in the system-development process, or from subsequent system changes, or from faulty maintenance after installation (see Section 1.2.1)

Communication media failures: Transmission outages, natural interference, jamming

Human limitations in system use: System operators, administrators, staff, users, or unsuspecting bystanders may cause problems throughout system operation and use, for example, in the following activities:

Installation: Improper configuration, faulty initialization, incompatible versions, erroneous parameter settings, linkage errors

Misuse of the overall environment or of the computer systems: Such problems include unintentional misuse such as entry of improper inputs, lack of timely human response, misinterpretation of outputs, improper response to warning messages, execution of the wrong function; they also include intentional misuse, such as penetrations by supposedly unauthorized users, misuse by authorized users, malicious insertion of Trojan horses, fraud, and so on. (See Section 3.1.)

1.3 Adverse Effects

There are many areas in which computers affect our lives, and in which risks must be anticipated or accommodated. Several of these areas are listed here, along with just a few types of potential risks associated with the causes listed in Sections 1.2.1 and 1.2.2. All of these areas are represented in the following text.

Computers and communications: Loss or corruption of communication media, nondelivery or corruption of data, insertion of bogus or spoofed electronic-mail messages or commands, loss of privacy through wire taps and computer taps—Section 2.1 and Chapter 5

Computers that are used in space applications: Lost lives, mission failures, launch delays, failed experiments, large financial setbacks—Section 2.2

Computers that are used for defense and warfare: Misidentification of friend or foe, failure to attack, failure to defend, accidental attacks, accidental self-destruction, “friendly fire”—Section 2.3

Computers that are used in transportation: Deaths, delays, fires, sudden acceleration, inability to brake or control a vehicle, inability to escape from an automatically controlled vehicle after power interruption—Section 2.4 for civil aviation, Section 2.5 for railroads, and Section 2.6 for shipping

Computers used in controlling safety-critical applications: Deaths, injuries, delays, inconvenience—Section 2.7 for control systems, Section 2.8 for robots

Computers and related systems used in health care and in otherwise safeguarding health and safety: Deaths, injuries, psychological disabilities, misdiagnosis, mistreatment, incorrect billing—Sections 2.9.1 and 2.9.2; loss of privacy of personal information—Chapter 6

Computers and related systems whose use causes health problems: Physical harm (carpal-tunnel syndrome and other repetitive-strain injuries, radiation effects), computer-related stress, mental anguish—Section 2.9.2

Computers used in electrical power: Deaths, injuries, power outages, long-term health hazards including radiation effects in nuclear-power systems—Section 2.10

Computers that manage money: Fraud, violations of privacy; losses and disruptions of market facilities such as shutdown of stock exchanges and banks, both accidental and intentional—Sections 5.6 and 5.7

Computers that control elections: Accidentally wrong results and election frauds involving computer or external manipulation—Section 5.8

Computers that control jails and prisons: Technology-aided escape attempts and successes, mistakes that accidentally release inmates, failures in computer-controlled locks—Section 5.9

Computers used in law enforcement: False arrests and false imprisonments, failure to apprehend suspects—Section 6.5

Computers that enforce system and data integrity: Insertion of Trojan horses, viruses, denials of service, spoofing attacks, subversions—Chapter 5

Computers that protect data privacy: Invasions of privacy, such as undesired data accessibility or tracking of individuals—Chapter 6

Computers in general: Accidentally false results, delays, and losses of data—for example, Sections 2.11, 5.2, 5.5, and 7.2; victimization by intentional system misuses such as the malicious insertion of Trojan horses and bogus messages—Sections 5.1, 5.3, and 5.4

Computers that otherwise adversely affect people’s lives: Personal aggravation including being falsely declared dead, being billed incorrectly, and other nuisances—Section 6.4; adverse effects of mistaken identities—Section 6.5

1.4 Defensive Measures

Given these potential sources of risks and their consequent adverse effects, appropriate countermeasures are essential. Chapter 7 includes discussion of techniques for increasing reliability, safety, security, and other system properties, and protecting privacy. That chapter also addresses techniques for improving the system-development process, including system engineering and software engineering.

The reader who is technically inclined will find useful discussions throughout, and many pointers to further references—in various sections and in the summaries at the end of each chapter. Reliability and safety techniques are considered in Section 7.7. Security techniques are considered in Sections 3.8 and 6.3, and in several of the sections of Chapter 7 (particularly Section 7.9). Assurance of desired system properties in turn relies heavily on software engineering and system-development approaches considered in Sections 7.6 and 7.8. The reader who is primarily interested in the illustrative material may wish to skip over those sections.

1.5 Guide to Summary Tables

Chapters 5 and 6, and most sections of Chapter 2, conclude with tables that summarize the cases discussed therein. For each case or group of cases, the apparent causative factors are indicated. For each of the summary tables, the abbreviations and symbols used for the column headings and for the table entries are defined in Table 1.1. In particular, the column headings in each of the summary tables correspond to the most prevalent sources of problems itemized in Section 1.2. The summary-table entries exhibit the relevance of each causative factor: the symbol defined in Table 1.1 for a contributing factor indicates a negative causative factor that contributed to the problem, and a • indicates the primary causative factor. When multiple cases are aggregated into a single row of a summary table, the contributing-factor symbol in parentheses indicates that some of those cases exhibit the column attribute. Occasionally, when multiple events involving different factors contributed to a particular case, those events are identified by parenthesized numbers in the text, and those numbers are also referred to in the summary tables (for example, in Table 2.2). In a few cases, positive factors helped to ameliorate a particular problem—for example, through detection and manual recovery; in those instances, the presence of a positive or restorative factor is denoted by a * in the column corresponding to the relevant factor (as in Table 2.2).

Table 1.1 Summary of table abbreviations


1.6 Summary of the Chapter

This introductory chapter provides a brief overview of the types of harmful causes and adverse effects that can arise in connection with the use of computers. It introduces the enormous variety of problems that we confront throughout this book.

Preliminary Challenges for the Reader

C1.1 Examine the sources of risks given in Section 1.2. Identify those sources that have affected your life. Describe how they have done so.

C1.2 Examine the illustrative list of adverse consequences given in Section 1.3. Identify those consequences that have affected your life. Describe them.

C1.3 Describe an incident in which your work with computer systems has been adversely affected, and analyze why that happened and what could have been done to avoid or mitigate the effects.

C1.4 (Essay question) What are your expectations of computer-communication technologies? What effects do you think these technologies might have on civilization, both in the short term and in the long term? What kinds of risks do you think might be acceptable, and under what circumstances? Is the technology making life better or worse? Be specific in your answers. (This challenge is a preliminary one. Your views may well change after you read the rest of the book, at which time you may wish to reassess your answers to this question.)
