This chapter introduces the concept of abstract data types and explains how they are used to improve software reliability while reducing the cost of development and maintenance. An example is presented for a mixed C and assembly implementation of an ADT to count word frequencies. The chapter ends with an ethical case study, highlighting the importance of structured design techniques.
Abstract data types (ADTs); Assembly language; Word frequency counts; Therac-25; Sorting; Indexing; Performance
An Abstract Data Type (ADT) is composed of data and the operations that work on that data. The ADT is one of the cornerstones of structured programming. Proper use of ADTs has many benefits. Most importantly, abstract data types help to support information hiding. An ADT hides information by encapsulating the information into a module or other construct, and providing a well-defined interface. The interface typically consists of the names of data types provided by the ADT and a set of subroutine definitions, or function prototypes, for operating on the data types. The implementation of the ADT is hidden from the client code that uses the ADT.
A common use of information hiding is to hide the physical storage layout for data so that if it is changed, the change is restricted to a small subset of the total program. For example, if a three-dimensional point is represented in a program with three floating point scalar variables and later, the representation is changed to a single array variable of size three, a module designed with information hiding in mind would protect the remainder of the program from such a change.
Information hiding reduces software development risk by shifting the code's dependency on an uncertain implementation onto a well-defined interface. Clients of the interface perform operations purely through the interface, which does not change. If the implementation changes, the client code does not have to change.
Encapsulating software and data structures behind an interface allows the construction of objects that mimic the behavior and interactions of objects in the real world. For example, a simple digital alarm clock is a real-world object that most people can use and understand. They can understand what the alarm clock does, and how to use it through the provided interface (buttons and display) without having to understand every part inside of the clock. If the internal circuitry of the clock were to be replaced with a different implementation, people could continue to use it in the same way, provided that the interface did not change.
As with all other structured programming concepts, ADTs can be implemented in assembly language. In fact, most high level compilers convert structured programming code into assembly during compilation. All that is required is that the programmer define the data structure(s), and the set of operations that can be used on the data. Listing 6.1 gives an example of an ADT interface in C. The type is not fully defined in the interface. This prevents client software from accessing the internal structure of the image data type. Therefore, programmers using the ADT can modify images only by using the provided functions. Other structured programming and object-oriented programming languages such as C++, Java, Pascal, and Modula 2 provide similar protection for data structures so that client code can access the data structure only through the provided interface. Note that only the definition is exposed, indicating to client programs that the red, green, and blue components of a pixel must be a number between 0 and 255. In C, as with other structured programming languages, the implementation of the subroutines can also be hidden by placing them in separate compilation modules. Only the ADT implementation code will have access to the internal structure of the data type.
Assembly language does not have the ability to define a data structure as such, but it does provide the mechanisms needed to specify the location of each field with respect to the beginning of a data structure, as well as the overall size of the data structure. With a little thought and effort, it is possible to implement ADTs in Assembly language.
Listing 6.2 shows the private implementation of the data type, which is included by the C files which implement the abstract data type. A pixel is defined as a C struct with one byte each for the red, blue, and green color components. Since the largest item in the struct is a byte, the C compiler will create a structure that is exactly three bytes long, without any extra bytes for alignment. The image struct contains three integer values, so it will occupy twelve bytes. Listing 6.3 shows how the data structures from the previous listings can be defined in assembly language. With those definitions, any of the functions declared in Listing 6.1 can be written in assembly language.
Counting the frequency of words in written text has several uses. In digital forensics, it can be used to provide evidence as to the author of written communications. Different people have different vocabularies, and use words with differing frequency. Word counts can also be used to classify documents by type. Scientific articles from different fields contain words specific to the field, and historical novels will differ from western novels in word frequency.
Listing 6.4 shows the main function for a simple C program which reads a text file and creates a list of all of the words contained in a file, along with their frequency of occurrence. The program has been divided into two parts: the main program, and an ADT. The ADT is used to keep track the words and their frequencies, and to print a table of word frequencies.
The interface for the ADT is shown in Listing 6.5. There are several ways that the ADT could be implemented. Note that the interface given in the header file does not show the internal fields of the word list data type. Thus, any file which includes this header are allowed to declare pointers to data types, but cannot access or modify any internal fields. The list of words could be stored in an array, a linked list, a binary tree, or some other data structure. Also, the subroutines could be implemented in C or in some other language, including assembly. Listing 6.6 shows an implementation in C using a linked list. Note that the function for printing the word frequency list in numerical order has not been implemented. It will be written in assembly language. Since the program is split into multiple files, it is a good idea to use the utility to build the executable program. A basic makefile is shown in Listing 6.7.
Suppose we wish to implement one of the functions from Listing 6.6 in AArch64 assembly language. We would
The header file and the main program file would not require any changes. The header file provides function prototypes that the C compiler uses to determine how parameters should be passed to the functions. As long as our new assembly function conforms to its C header definition, the program will work correctly.
The linked list is created in alphabetical order, but the function is required to print it sorted in reverse order of number of occurrences. There are several ways in which this could be accomplished, with varying levels of efficiency. Possible approaches include, but are not limited to:
There are many other possibilities for re-ordering the list. Regardless of which method is chosen, the main program and the interface (header file) need not be changed. Different implementations of the sorting function can be substituted without affecting any other code, because the implementation of the ADT is hidden from the code that uses it. This separation of inteface and implementation provides many benefits.
The wl_print_numerical function can be implemented in assembly as shown in Listing 6.8. The function operates by re-ordering the linked list using an insertion sort as described above. Listing 6.9 shows the change that must be made to the makefile. Now, when make is run, it compiles the two C files and the assembly file into object files, then links them all together. The C implementation of in must be deleted or commented out or the linker will emit an error indicating that it found two versions of .
The word frequency counter, as previously implemented, takes several minutes to count the frequency of words in this textbook on a Raspberry Pi. Most of the time is spent building the list of words, and re-sorting the list in order of word frequency. Most of the time for both of these operations is spent in searching for the word in the list before incrementing its count or inserting it in the list. There are much more efficient ways to build ordered lists of data.
Since the code is well modularized using an ADT, the internal mechanism of the list can be modified without affecting the main program. A major improvement can be made by changing the data structure from a linked list to a binary tree. Fig. 6.1 shows an example binary tree storing word frequency counts. The time required to insert into a linked list is , but the time required to insert into a binary tree is . To give some perspective, the author's manuscript for this textbook contains about 125,000 words. Since , we would expect the linked list implementation to require about times as long as a binary tree implementation to process the author's manuscript for this textbook. In reality, there is some overhead to the binary tree implementation. Even with the extra overhead, we should see a significant speedup. Listing 6.10 shows the C implementation using a balanced binary tree instead of a linked list.
With the tree implementation, could build a new tree, sorted on the word frequency counts. However, it may be more efficient to build a separate index, and sort the index by word frequency counts. The assembly code will allocate an array of pointers, and set each pointer to one of the nodes in the tree, as shown in Fig. 6.2. Then, it will use a quicksort to rearrange the pointers into descending order by word frequency count, as shown in Fig. 6.3. This implementation is shown in Listing 6.11.
The tree-based implementation gets most of its speed improvement through using two algorithms to replace algorithms. These examples show how a small part of a program can be implemented in assembly language, and how to access C data structures from assembly language. The functions could just as easily have been written in C rather than assembly, without greatly affecting performance. Later chapters will show examples where the assembly implementation does have significantly better performance than the C implementation.
The Therac-25 was a device designed for radiation treatment of cancer. It was produced by Atomic Energy of Canada Limited (AECL), which had previously produced the Therac-6 and Therac-20 units in partnership with CGR of France. It was capable of treating tumors close to the skin surface using electron beam therapy, but could also be configured for Megavolt X-ray therapy to treat deeper tumors. The X-ray therapy required the use of a tungsten radiation shield to limit the area of the body that was exposed to the potentially lethal radiation produced by the device.
The Therac-25 used a double pass accelerator, which provided more power, in a smaller space, at less cost, compared to its predecessors. The second major innovation was that computer control was a central part of the design, rather than an add-on component as in its predecessors. Most of the hardware safety interlocks that were integral to the designs of the Therac-6 and Therac-20, were seen as unnecessary, because to software would perform those functions. Computer control was intended to allow operators to set up the machine more quickly, allowing them to spend more time communicating with patients and to treat more patients per day. It was also seen as a way to reduce production costs by relying on software, rather than hardware, safety interlocks.
There were design issues with both the software and the hardware. Although this machine was built with the goal of saving lives, between 1985 and 1986, three deaths and other injuries were attributed to the hardware and software design of this machine. Death due to radiation exposure is usually slow and painful, and the problem was not identified until the damage had been done.
AECL was required to obtain US Food and Drug Administration (FDA) approval before releasing the Therac-25 to the US market. They obtained approval quickly by declaring “pre-market equivalence,” effectively claiming that the new machine was not significantly different from its predecessors. This practice was common in 1984, but was overly optimistic, considering that most of the safety features had been changed from hardware to software implementations. With FDA approval, AECL made the Therac-25 commercially available and performed a Fault Tree Analysis to evaluate the safety of the device.
Fault Tree Analysis, as its name implies, requires building a tree to describe every possible fault, and assign probabilities to those faults. After building the tree, the probabilities of hazards, such as overdose, can be calculated. Unfortunately, the engineers assumed that the software (much of which was re-used from the previous Therac models) would operate correctly. This turned out not to be the case, because the hardware interlocks present in the previous models had hidden some of the software faults. The analysts did consider some possible computer faults, such as an error being caused by cosmic rays, but assigned extremely low probabilities to those faults. As a result, the assessment was very inaccurate.
When the first report of an overdose was reported to AECL in 1985, they sent an engineer to the site to investigate. They also filed a report with the FDA and the Canadian Radiation Protection Board (CRPB). AECL also notified all users of the fact that there had been a report and recommended that operators should visually confirm hardware settings before each treatment. The AECL engineers were unable to reproduce the fault, but suspected that it was due to the design and placement of a microswitch. They redesigned the microswitch and modified all of the machines that had been deployed. They also retracted their recommendation that operators should visually confirm hardware settings before each treatment.
Later that year, a second incident occurred. In this case, there is no evidence that AECL took any action. In January of 1986, AECL received another incident report. An employee at AECL responded by denying that the Therac-25 was at fault, and stated that no other similar incidents had been reported. Another incident occurred in March of that year. AECL sent an engineer to investigate. The engineer was unable to determine the cause, and suggested that it was due to an electrical problem, which may have caused an electrical shock. An independent engineering firm was called to examine the machine and reported that it was very unlikely that the machine could have delivered an electrical shock to the patient. In April of 1986, another incident was reported. In this case, the AECL engineers, working with the medical physicist at the hospital, were able to reproduce the sequence of events that lead to the overdose.
As required by law, AECL filed a report with the FDA. The FDA responded by declaring the Therac-25 defective. AECL was ordered to notify all of the sites where the Therac-25 was in use, investigate the problem, and file a corrective action plan. AECL notified all sites, and recommended removing certain keys from the keyboard on the machines. The FDA responded by requiring them to send another notification with more information about the defect and the consequent hazards. Later in 1986, AECL filed a revised corrective action plan.
Another overdose occurred in January 1987, and was attributed to a different software fault. In February, the FDA and CRPB, both ordered that all Therac-25 unites be shut down, pending effective and permanent modifications. AECL spent six months developing a new corrective action plan, which included a major overhaul of the software, the addition of mechanical safety interlocks, and other safety-related modifications.
The Therac-25 was controlled by a DEC PDP-11 computer. The PDP-11 was the most popular minicomputer ever produced. Around 600,000 were produced between 1970 and 1990 and used for a variety of purposes, including embedded systems, education, and general data processing. It was a 16-bit computer and was far less powerful than a Raspberry Pi. The Therac-25 computer was programmed in assembly language by one programmer and the source code was not documented. Documentation for the hardware components was written in French. After the faults were discovered, a commission concluded that the primary problems with the Therac-25 were attributable to poor software design practices, and not due to any one of several specific coding errors. This is probably the best known case where a poor overall software design, and insufficient testing, led to loss of life.
The worst problems in the design and engineering of the machine were:
The operator interface consisted of a keyboard and text-mode monitor, which was common in the early 1980s. The interface had a data entry area in the middle of the screen and a command line at the bottom. The operator was required to enter parameters in the data entry area, then move the cursor to the command line to initiate treatment. When the operator moved the cursor to the command line, internal variables were updated and a flag variable was set to indicate that data entry was complete. That flag was cleared when a command was entered on the command line. If the operator moved the cursor back to the data entry area without entering a command, then the flag was not cleared, and any subsequent changes to the data entry area did not affect the internal variables.
A global variable was used to indicate that the magnets were currently being adjusted. This variable was modified by two functions, and did not always contain the correct value. Adjusting the magnets required about eight seconds, and the flag was correct for only a small period at the beginning of this time period.
Due to the errors in the design and implementation of the software, the following sequence of events could result in the machine causing injury to, or even death of, the patient:
If the above sequence occurred in less than eight seconds, then the operator screen could indicate that the machine was in low power mode, although it was actually set in high power mode. During a final check before initiating the beam, the software would find that the magnets were set for high power mode but the operator setting was for low power mode. It displayed a numeric error code and prevented the machine from starting. The operator could clear the error code by resetting the computer (which only required one key to be pressed on the keyboard). This caused the tungsten shield to withdraw but left the machine in X-ray mode. When the operator entered the command to start the beam, the machine could be in high power mode without having the tungsten shield in place. X-rays were applied to the unprotected patient.
It took some time for this critical flaw to appear. The failure only occurred when the operator initially made a one-keystroke mistake in entering the prescription data, moved to the command area, and then corrected the mistake within eight seconds. Initially, operators were slow to enter data, and spent a lot of time making sure that the prescription was correct before initiating treatment. As they became more familiar with the machine, they were able to enter data, and correct mistakes more quickly. Eventually, operators became familiar enough with the machine that they could enter data, make a correction, and return to the command area within the critical eight-second window. Also, the operators became familiar with the machine reporting numeric error codes, without any indication of the severity of the code. The operators were given a table of codes and their meanings. The code reported was “no dose” and indicated “treatment pause.” There is no reason why the operator should consider that to be a serious problem; they had become accustomed to frequent malfunctions that did not have any consequences to the patient.
Although the code was written in assembly language, that fact was not cited as an important factor. The fundamental problems were poor software design and over-confidence. The re-use of code in an application for which it was not initially designed also may have contributed to the system flaws. A proper design using established software design principles, including structured programming and abstract data types, would almost certainly have avoided these fatalities.
The abstract data type is a structured programming concept which contributes to software reliability, eases maintenance, and allows for major revisions to be performed in as safe way. Many high-level languages enforce, or at least facilitate, the use of ADTs. Assembly language does not. However, the ethical assembly language programmer will give the extra effort to write code that conforms to the standards of structured programming, using abstract data types to help ensure safety, reliability, and maintainability.
ADTs also facilitate the implementation of software modules in more than one language. The interface specifies the components of the ADT, but not the implementation. The implementation can be in any language. As long as assembly programmers and compiler authors generate code that conforms to a well-known standard, their code can be linked to code written in other languages.
Poor coding practices and poor design can lead to dire consequences, including loss of life. It is the responsibility of the programmer, regardless of the language used, to make ethical decisions in the design and implementation of software. Above all, the programmer must be aware of the possible consequences of the decisions they make.
3.145.173.112