Fault Distribution

Large enterprise systems typically have long lifespans, are built by many programmers, and are developed and released to the field at regular intervals. Anecdotal reports hold that after a large system has gone through a few early releases, bugs in later versions tend to be concentrated in relatively small sections of the code. The bug distribution is frequently described as a Pareto distribution, with 80% of the bugs located in just 20% of the code entities, such as files or methods. This concentration can be very helpful to system testers and debuggers: if they can identify which files fall into the fault-prone 20%, they can focus quality assurance efforts such as testing, inspections, and debugging where they will do the most good.

Working at AT&T, we have access to quite a few large systems with extended lifetimes, so we started a project to:

  • Identify the parts of the code most likely to have bugs prior to the system testing phase

  • Design and implement a programming environment tool that identifies the most bug-prone parts of the system and presents the information to developers and testers

The concept of the “most bug-prone parts of the system” is meaningful only if faults really are highly concentrated in certain parts of the code. For these to be feasible goals, we first need evidence that bugs are indeed distributed through the code with a highly skewed, Pareto-like distribution.
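
To make the hypothesis concrete, the check we have in mind can be sketched in a few lines of Python. The per-file fault counts below are invented for illustration; in practice they would come from the modification request (MR) database described later in this chapter.

```python
# A minimal sketch of the Pareto-style concentration check: what share
# of all faults falls in the 20% of files with the most faults?

def fault_share_in_top_files(fault_counts, fraction=0.20):
    """Return the share of all faults found in the `fraction` of files
    with the most faults (0.20 for the classic 80/20 check)."""
    ranked = sorted(fault_counts, reverse=True)   # most faulty files first
    top_n = max(1, int(len(ranked) * fraction))   # size of the top slice
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

# Hypothetical release: most files have no faults, a few have many.
counts = [12, 9, 7, 5, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(f"{fault_share_in_top_files(counts):.0%} of faults in top 20% of files")
```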

We have examined six large systems that are used continuously in various parts of AT&T’s operations, shown in Table 9-1. Their purposes include inventory control, provisioning, maintenance support, and automated voice response processing. These are real industrial systems, with bugs that actually occurred and were identified primarily during in-house testing, but sometimes by customers in the field.

Three of the systems have been developed and maintained by internal software development organizations, and the other three by an outside company. Their sizes range from 300,000 to over a half million lines of source code, and their lifetimes range from 2 years to almost 10 years. Each system includes code in a variety of different programming languages, sometimes as few as 4, but at least one system has code written in over 50 languages. With one exception (described later), the life history of these systems follows a disciplined pattern of regularly spaced versions, which are usually released to users approximately every three months.

The evidence collected from these systems overwhelmingly supports the Pareto hypothesis about bug distribution. A preliminary study on the first system we examined showed its bugs to be concentrated in fewer than 20% of the system’s files for releases 2–9, and fewer than 10% for every release past the ninth.

Table 9-1. Overview of systems

System                 Total     Years in   Avg files    Avg KLOC     Avg bugs     Avg %        % LOC in
                       releases  the field  per release  per release  per release  buggy files  buggy files
Inventory                17        4           1318         363          301          10.4         28.7
Provisioning              9        2           2178         416           34           1.3          5.7
Voice Response            9        2.25        1341         228          165          10.1         16.9
Maintenance Support A    35        9            550         333           44           4.8         17.4
Maintenance Support B    35        9           1237         300           36           1.9         13.7
Maintenance Support C    27        7            437         219           42           4.8         14.3

This convinced us that it made sense to proceed with developing a fault prediction model. Each system we’ve studied subsequently has provided additional evidence that this Pareto distribution occurs, with faults concentrated in surprisingly small portions of the code. For two of the systems, the average percent of buggy files over all releases was less than 10.5%. For two other systems, faults were found in fewer than 5% of the files. And for the two remaining systems, fewer than 2% of the files contained any faults. Stated another way, for the six systems studied, roughly 90% or more of the files had no bugs detected either during pre-release system tests or in the field. Although the buggy files tended to be larger than non-buggy files, for five of the six systems they nevertheless contained less than 18% of the total lines of code in the system.

Table 9-1 has the system-specific values for bug concentration, as well as other key figures about each system. The second and third columns indicate the system’s age in terms of the number of releases and the number of years in the field. The remaining columns are all averages calculated over all the releases for each system. They show the size of the system in terms of the number of files and the number of thousands of lines of code (KLOC), the average number of bugs, the percent of the system’s files that have one or more bugs, and the percent of the system’s code that is included in these buggy files.
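
As a concrete illustration of how the per-release columns of Table 9-1 can be computed, here is a minimal Python sketch. It assumes each release has been reduced to a list of (lines-of-code, fault-count) pairs, one pair per file; the data shown are invented, not drawn from the studied systems.

```python
# Compute, for one release: the percent of files with at least one
# fault, the percent of the release's LOC in those buggy files, and
# the total fault count.

def release_stats(files):
    """files: list of (loc, fault_count) pairs, one pair per file."""
    buggy = [(loc, f) for loc, f in files if f > 0]
    pct_buggy_files = 100.0 * len(buggy) / len(files)
    pct_loc_in_buggy = (100.0 * sum(loc for loc, _ in buggy)
                        / sum(loc for loc, _ in files))
    total_faults = sum(f for _, f in files)
    return pct_buggy_files, pct_loc_in_buggy, total_faults

# Hypothetical release with 8 files; two of them contain all the faults.
release = [(1200, 0), (800, 0), (2500, 4), (400, 0),
           (900, 0), (3100, 7), (650, 0), (450, 0)]
print(release_stats(release))  # -> (25.0, 56.0, 11)
```

Averaging these per-release figures over all of a system’s releases yields the kind of values reported in the last three columns of Table 9-1.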

As a system’s lifetime increases, the number of files typically grows as new functionality is added. The average number of files and the average number of lines of code shown in Table 9-1 are therefore smaller than the corresponding values for releases late in the system’s life. Table 9-2 shows the size of the latest release that we studied for each system, measured in terms of the number of files and the number of lines of code, along with our prediction findings for each of the six systems.

When systems are developed by large organizations with many programmers and testers, it’s crucial to have systematic defect reporting, change management, and version control. The systems that we’ve studied use a single tool that fulfills all those functions. Changes that are recorded by the tool are documented in a modification request, or MR, which starts out as a description of a desired change and is updated with a description of the actual change that is eventually made. An MR can request a change to any aspect of the system, including to the system requirements, to a design document, to the code, or to user documentation.

Table 9-2. Percentage of faults in top 20% of files for previously studied systems

System                 Final release  Final release  % faults in
                       files          KLOC           top 20% files
Inventory                1950           538            83%
Provisioning             2308           438            83%
Voice Response           1888           329            75%
Maintenance Support A     668           442            81%
Maintenance Support B    1413           384            93%
Maintenance Support C     584           329            76%

The submitter of an MR could be, for example, a system designer who wants to change the way some function performs, a programmer who has discovered a more efficient algorithm for part of the code, or a tester who has discovered a flaw in the program’s behavior. The actual change to the system might take place immediately and be performed by the MR submitter (in the case of the programmer), or it might be done hours or days later by a second person who is responsible for the code but did not submit the original MR; this is the typical scenario when an MR is initiated by a tester. The MR documents all aspects of the change, including the dates and development phases at which it is submitted and implemented, the identities of the submitter and the implementor, attribute tags that characterize the MR, and any written description of the change provided by the submitter or the implementor.

The change itself is also recorded as part of the MR, and provides the information used by the version control component of the tool to create a build of the system at any point in its lifetime. The MR information is stored in a database that gives our fault prediction tool all the information we need to analyze past releases and predict future faults.
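
The following sketch shows one way such MR records might be represented for analysis. The field names are our own shorthand for the MR attributes described above, not the actual schema of the change management tool, and the fault-counting helper assumes a hypothetical “fault” attribute tag.

```python
# An illustrative record type for MR data pulled from the database.
# Field names are our own shorthand, not the tool's actual schema.
from dataclasses import dataclass, field

@dataclass
class ModificationRequest:
    mr_id: str                        # identifier assigned by the tool
    submitter: str                    # who requested the change
    implementor: str = ""             # who made the change, if different
    submitted_phase: str = ""         # development phase at submission
    submitted_date: str = ""          # when the MR was opened
    implemented_date: str = ""        # when the change was completed
    tags: list[str] = field(default_factory=list)        # attribute tags
    files_changed: list[str] = field(default_factory=list)
    description: str = ""             # free-text change description

def faults_per_file(mrs):
    """Count fault-fix MRs per file, assuming a hypothetical 'fault' tag."""
    counts: dict[str, int] = {}
    for mr in mrs:
        if "fault" in mr.tags:
            for path in mr.files_changed:
                counts[path] = counts.get(path, 0) + 1
    return counts
```

Per-file fault counts of this kind are exactly the input needed by the concentration check sketched earlier in this section.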

Although all our systems use the same underlying tool to record MRs, they may use it in different ways. The most significant difference is the stage of the development process when a project starts requiring changes to be made formally with MRs. One common practice is for MRs to be required only after the system has reached the stage of system testing, so that the MR database contains no information about changes made or defects found in any phases prior to system test. This practice was followed by four of the six systems we studied. For those systems, our models predict faults that are expected to be detected in system testing or field operation, as the models are based on past faults from those phases.

The Inventory System recorded defects found and corrected in all development phases, starting with unit testing and continuing through system testing and release. Roughly 80% of its identified faults were reported during unit and integration testing, before the system test phase. This explains why the Inventory System has substantially more faults, on average, than any of the other systems; it is not necessarily an indication that it is more problematic than the others.
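
When fault counts are compared across systems with different recording practices, a natural adjustment is to drop pre-system-test MRs first. Here is a one-function sketch, reusing the hypothetical ModificationRequest type above; the phase names are assumptions, not values taken from the tool.

```python
# Keep only fault MRs from system test onward, so a system that also
# records unit- and integration-test faults (like the Inventory System)
# can be compared with systems that do not. Phase names are assumed.
LATE_PHASES = {"system_test", "field"}

def late_phase_fault_mrs(mrs):
    return [mr for mr in mrs
            if "fault" in mr.tags and mr.submitted_phase in LATE_PHASES]
```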

The Voice Response system was a special case that will be discussed in the next section.
