5.10 Program Validation and Testing

Complex systems need testing to ensure that they work as they are intended. But bugs can be subtle, particularly in embedded systems, where specialized hardware and real-time responsiveness make programming more challenging. Fortunately, there are many available techniques for software testing that can help us generate a comprehensive set of tests to ensure that our system works properly. We examine the role of validation in the overall design methodology in Section 9.6. In this section, we concentrate on nuts-and-bolts techniques for creating a good set of tests for a given program.

The first question we must ask ourselves is how much testing is enough. Clearly, we cannot test the program for every possible combination of inputs. Because we cannot implement an infinite number of tests, we naturally ask ourselves what a reasonable standard of thoroughness is. One of the major contributions of software testing is to provide us with standards of thoroughness that make sense. Following these standards does not guarantee that we will find all bugs. But by breaking the testing problem into subproblems and analyzing each subproblem, we can identify testing methods that provide reasonable amounts of testing while keeping the testing time within reasonable bounds.

We can use various combinations of two major types of testing strategies:

Black-box methods generate tests without looking at the internal structure of the program.

Clear-box (also known as white-box) methods generate tests based on the program structure.

In this section we cover both types of tests, which complement each other by exercising programs in very different ways.

5.10.1 Clear-Box Testing

The control/data flow graph extracted from a program’s source code is an important tool in developing clear-box tests for the program. To adequately test the program, we must exercise both its control and data operations.

In order to execute and evaluate these tests, we must be able to control variables in the program and observe the results of computations, much as in manufacturing testing. In general, we may need to modify the program to make it more testable. By adding new inputs and outputs, we can usually substantially reduce the effort required to find and execute the test. No matter what we are testing, we must accomplish the following three things in a test:

Provide the program with inputs that exercise the test we are interested in.

Execute the program to perform the test.

Examine the outputs to determine whether the test was successful.

Example 5.11 illustrates the importance of observability and controllability in software testing.

Example 5.11 Controlling and Observing Programs

Let’s first consider controllability by examining the following FIR filter with a limiter:

firout = 0.0; /* initialize filter output */

/* compute buff*c in bottom part of circular buffer */

for (j = curr, k = 0; j < N; j++, k++)

   firout += buff[j] * c[k];

/* compute buff*c in top part of circular buffer */

for (j = 0; j < curr; j++, k++)

   firout += buff[j] * c[k];

/* limit output value */

if (firout > 100.0) firout = 100.0;

if (firout < −100.0) firout = −100.0;

The above code computes the output of an FIR filter from a circular buffer of values and then limits the maximum filter output (much as an overloaded speaker will hit a range limit). If we want to test whether the limiting code works, we must be able to generate two out-of-range values for firout: positive and negative. To do that, we must fill the FIR filter’s circular buffer with N values in the proper range. Although there are many sets of values that will work, it will still take time for us to properly set up the filter output for each test.

This code also illustrates an observability problem. If we want to test the FIR filter itself, we look at the value of firout before the limiting code executes. We could use a debugger to set breakpoints in the code, but this is an awkward way to perform a large number of tests. If we want to test the FIR code independent of the limiting code, we would have to add a mechanism for observing firout independently.

Being able to perform this process for a large number of tests entails some amount of drudgery, but that drudgery can be alleviated with good program design that simplifies controllability and observability.

The next task is to determine the set of tests to be performed. We need to perform many different types of tests to be confident that we have identified a large fraction of the existing bugs. Even if we thoroughly test the program using one criterion, that criterion ignores other aspects of the program. Over the next few pages we will describe several very different criteria for program testing.

Execution paths

The most fundamental concept in clear-box testing is the path of execution through a program. Previously, we considered paths for performance analysis; we are now concerned with making sure that a path is covered and determining how to ensure that the path is in fact executed. We want to test the program by forcing the program to execute along chosen paths. We force the execution of a path by giving it inputs that cause it to take the appropriate branches. Execution of a path exercises both the control and data aspects of the program. The control is exercised as we take branches; both the computations leading up to the branch decision and other computations performed along the path exercise the data aspects.

Is it possible to execute every complete path in an arbitrary program? The answer is no, because the program may contain a while loop that is not guaranteed to terminate. The same is true for any program that operates on a continuous stream of data, because we cannot arbitrarily define the beginning and end of the data stream. If the program always terminates, then there are indeed a finite number of complete paths that can be enumerated from the path graph. This leads us to the next question: Does it make sense to exercise every path? The answer to this question is no for most programs, because the number of paths, especially for any program with a loop, is extremely large. However, the choice of an appropriate subset of paths to test requires some thought.

Example 5.12 illustrates the consequences of two different choices of testing strategies.

Example 5.12 Choosing the Paths to Test

We have at least two reasonable ways to choose a set of paths in a program to test:

execute every statement at least once;

execute every direction of a branch at least once.

These conditions are equivalent for structured programming languages without gotos, but are not the same for unstructured code. Most assembly language is unstructured, and state machines may be coded in high-level languages with gotos.

To understand the difference between statement and branch coverage, consider this CDFG:

image

We can execute every statement at least once by executing the program along two distinct paths. However, this leaves branch a out of the lower conditional uncovered. To ensure that we have executed along every edge in the CDFG, we must execute a third path through the program. This path does not test any new statements, but it does cause a to be exercised.

How do we choose a set of paths that adequately covers the program’s behavior? Intuition tells us that a relatively small number of paths should be able to cover most practical programs. Graph theory helps us get a quantitative handle on the different paths required. In an undirected graph, we can form any path through the graph from combinations of basis paths. (Unfortunately, this property does not strictly hold for directed graphs such as CDFGs, but this formulation still helps us understand the nature of selecting a set of covering paths through a program.) The term “basis set” comes from linear algebra. Figure 5.25 shows how to evaluate the basis set of a graph. The graph is represented as an incidence matrix. Each row and column represents a node; a 1 is entered for each node pair connected by an edge. We can use standard linear algebra techniques to identify the basis set of the graph. Each vector in the basis set represents a primitive path. We can form new paths by adding the vectors modulo 2. Generally, there is more than one basis set for a graph.

image

Figure 5.25 The matrix representation of a graph and its basis set.

The basis set property provides a metric for test coverage. If we cover all the basis paths, we can consider the control flow adequately covered. Although the basis set measure is not entirely accurate because the directed edges of the CDFG may make some combinations of paths infeasible, it does provide a reasonable and justifiable measure of test coverage.

A simple measure, cyclomatic complexity[McC76], allows us to measure the control complexity of a program. Cyclomatic complexity is an upper bound on the size of the basis set. If e is the number of edges in the flow graph, n the number of nodes, and p the number of components in the graph, then the cyclomatic complexity is given by

image (Eq. 5.1)

For a structured program, M can be computed by counting the number of binary decisions in the flow graph and adding 1. If the CDFG has higher-order branch nodes, add b − 1 for each b-way branch. In the example of Figure 5.26, the cyclomatic complexity evaluates to 4. Because there are actually only three distinct paths in the graph, cyclomatic complexity in this case is an overly conservative bound.

image

Figure 5.26 Cyclomatic complexity.

Another way of looking at control flow–oriented testing is to analyze the conditions that control the conditional statements. Consider the following if statement:

if ((a == b) || (c >= d)) { … }

This complex condition can be exercised in several different ways. If we want to truly exercise the paths through this condition, it is prudent to exercise the conditional’s elements in ways related to their own structure, not just the structure of the paths through them. A simple condition testing strategy is known as branch testing[Mye79]. This strategy requires the true and false branches of a conditional and every simple condition in the conditional’s expression to be tested at least once.

Example 5.13 illustrates branch testing.

Example 5.13 Condition Testing with the Branch Testing Strategy

Assume that the code below is what we meant to write.

if (a || (b >= c)) { printf("OK "); }

The code that we mistakenly wrote instead follows:

if (a && (b >= c)) { printf("OK "); }

If we apply branch testing to the code we wrote, one of the tests will use these values: a = 0, b = 3, c = 2 (making a false and b >= c true). In this case, the code should print the OK term [0 || (3 >= 2) is true] but instead doesn’t print [0 && (3 >= 2) evaluates to false]. That test picks up the error.

Let’s consider another more subtle error that is nonetheless all too common in C. The code we meant to write follows:

if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value "); }

Here is the bad code we actually wrote:

if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value "); }

The problem here is that we typed = rather than ==, creating an assignment rather than a test. The code x = good_pointer first assigns the value good_pointer to x and then, because assignments are also expressions in C, returns good_pointer as the result of evaluating this expression.

If we apply the principles of branch testing, one of the tests we want to use will contain x != good_pointer and x->field1 == 3. Whether this test catches the error depends on the state of the record pointed to by good_pointer. If it is equal to 3 at the time of the test, the message will be printed erroneously. Although this test is not guaranteed to uncover the bug, it has a reasonable chance of success. One of the reasons to use many different types of tests is to maximize the chance that supposedly unrelated elements will cooperate to reveal the error in a particular situation.

Another more sophisticated strategy for testing conditionals is known as domain testing [How82], illustrated in Figure 5.27. Domain testing concentrates on linear inequalities. In the figure, the inequality the program should use for the test is j < = i + 1. We test the inequality with three test points—two on the boundary of the valid region and a third outside the region but between the i values of the other two points. When we make some common mistakes in typing the inequality, these three tests are sufficient to uncover them, as shown in the figure.

image

Figure 5.27 Domain testing for a pair of values.

A potential problem with path coverage is that the paths chosen to cover the CDFG may not have any important relationship to the program’s function. Another testing strategy known as data flow testing makes use of def-use analysis (short for definition-use analysis). It selects paths that have some relationship to the program’s function.

The terms def and use come from compilers, which use def-use analysis for optimization [Aho06]. A variable’s value is defined when an assignment is made to the variable; it is used when it appears on the right side of an assignment (sometimes called a C-use for computation use) or in a conditional expression (sometimes called P-use for predicate use). A def-use pair is a definition of a variable’s value and a use of that value. Figure 5.28 shows a code fragment and all the def-use pairs for the first assignment to a. Def-use analysis can be performed on a program using iterative algorithms. Data flow testing chooses tests that exercise chosen def-use pairs. The test first causes a certain value to be assigned at the definition and then observes the result at the use point to be sure that the desired value arrived there. Frankl and Weyuker [Fra88] have defined criteria for choosing which def-use pairs to exercise to satisfy a well-behaved adequacy criterion.

image

Figure 5.28 Definitions and uses of variables.

Testing loops

We can write some specialized tests for loops. Because loops are common and often perform important steps in the program, it is worth developing loop-centric testing methods. If the number of iterations is fixed, then testing is relatively simple. However, many loops have bounds that are executed at run time.

Consider first the case of a single loop:

for (i = 0; i < terminate(); i++)

   proc(i,array);

It would be too expensive to evaluate the above loop for all possible termination conditions. However, there are several important cases that we should try at a minimum:

1. Skipping the loop entirely [if possible, such as when terminate() returns 0 on its first call].

2. One-loop iteration.

3. Two-loop iterations.

4. If there is an upper bound n on the number of loop iterations (which may come from the maximum size of an array), a value that is significantly below that maximum number of iterations.

5. Tests near the upper bound on the number of loop iterations, that is, n − 1, n, and n + 1.

We can also have nested loops like this:

for (i = 0; i < terminate1(); i++)

   for (j = 0; j < terminate2(); j++)

     for (k = 0; k < terminate3(); k++)

       proc(i,j,k,array);

There are many possible strategies for testing nested loops. One thing to keep in mind is which loops have fixed versus variable numbers of iterations. Beizer [Bei90] suggests an inside-out strategy for testing loops with multiple variable iteration bounds. First, concentrate on testing the innermost loop as above—the outer loops should be controlled to their minimum numbers of iterations. After the inner loop has been thoroughly tested, the next outer loop can be tested more thoroughly, with the inner loop executing a typical number of iterations. This strategy can be repeated until the entire loop nest has been tested. Clearly, nested loops can require a large number of tests. It may be worthwhile to insert testing code to allow greater control over the loop nest for testing.

5.10.2 Black-Box Testing

Black-box tests are generated without knowledge of the code being tested. When used alone, black-box tests have a low probability of finding all the bugs in a program. But when used in conjunction with clear-box tests they help provide a well-rounded test set, because black-box tests are likely to uncover errors that are unlikely to be found by tests extracted from the code structure. Black-box tests can really work. For instance, when asked to test an instrument whose front panel was run by a microcontroller, one acquaintance of the author used his hand to depress all the buttons simultaneously. The front panel immediately locked up. This situation could occur in practice if the instrument were placed face-down on a table, but discovery of this bug would be very unlikely via clear-box tests.

One important technique is to take tests directly from the specification for the code under design. The specification should state which outputs are expected for certain inputs. Tests should be created that provide specified outputs and evaluate whether the results also satisfy the inputs.

We can’t test every possible input combination, but some rules of thumb help us select reasonable sets of inputs. When an input can range across a set of values, it is a very good idea to test at the ends of the range. For example, if an input must be between 1 and 10, 0, 1, 10, and 11 are all important values to test. We should be sure to consider tests both within and outside the range, such as, testing values within the range and outside the range. We may want to consider tests well outside the valid range as well as boundary-condition tests.

Random tests

Random tests form one category of black-box test. Random values are generated with a given distribution. The expected values are computed independently of the system, and then the test inputs are applied. A large number of tests must be applied for the results to be statistically significant, but the tests are easy to generate.

Another scenario is to test certain types of data values. For example, integer-valued inputs can be generated at interesting values such as 0, 1, and values near the maximum end of the data range. Illegal values can be tested as well.

Regression tests

Regression tests form an extremely important category of tests. When tests are created during earlier stages in the system design or for previous versions of the system, those tests should be saved to apply to the later versions of the system. Clearly, unless the system specification changed, the new system should be able to pass old tests. In some cases old bugs can creep back into systems, such as when an old version of a software module is inadvertently installed. In other cases regression tests simply exercise the code in different ways than would be done for the current version of the code and therefore possibly exercise different bugs.

Numerical accuracy

Some embedded systems, particularly digital signal processing systems, lend themselves to numerical analysis. Signal processing algorithms are frequently implemented with limited-range arithmetic to save hardware costs. Aggressive data sets can be generated to stress the numerical accuracy of the system. These tests can often be generated from the original formulas without reference to the source code.

5.10.3 Evaluating Functional Tests

How much testing is enough? Horgan and Mathur [Hor96] evaluated the coverage of two well-known programs, TeX and awk. They used functional tests for these programs that had been developed over several years of extensive testing. Upon applying those functional tests to the programs, they obtained the code coverage statistics shown in Figure 5.29. The columns refer to various types of test coverage: block refers to basic blocks, decision to conditionals, P-use to a use of a variable in a predicate (decision), and C-use to variable use in a nonpredicate computation. These results are at least suggestive that functional testing does not fully exercise the code and that techniques that explicitly generate tests for various pieces of code are necessary to obtain adequate levels of code coverage.

image

Figure 5.29 Code coverage of functional tests for TeX and awk (after Horgan and Mathur [Hor96]).

Methodological techniques are important for understanding the quality of your tests. For example, if you keep track of the number of bugs tested each day, the data you collect over time should show you some trends on the number of errors per page of code to expect on the average, how many bugs are caught by certain kinds of tests, and so on. We address methodological approaches to quality control in more detail in Chapter 7.

One interesting method for analyzing the coverage of your tests is error injection. First, take your existing code and add bugs to it, keeping track of where the bugs were added. Then run your existing tests on the modified program. By counting the number of added bugs your tests found, you can get an idea of how effective the tests are in uncovering the bugs you haven’t yet found. This method assumes that you can deliberately inject bugs that are of similar varieties to those created naturally by programming errors. If the bugs are too easy or too difficult to find or simply require different types of tests, then bug injection’s results will not be relevant. Of course, it is essential that you finally use the correct code, not the code with added bugs.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.201.26