Chapter 7. The Rayleigh Model

Having discussed defect removal effectiveness and the phase-based defect removal model, this chapter discusses a formal model of software reliability: the Rayleigh model. The Rayleigh model is a parametric model in the sense that it is based on a specific statistical distribution. When the parameters of the statistical distribution are estimated based on the data from a software project, projections about the defect rate of the project can be made based on the model.

Reliability Models

Software reliability models are used to assess a software product’s reliability or to estimate the number of latent defects when it is available to the customers. Such an estimate is important for two reasons: (1) as an objective statement of the quality of the product and (2) for resource planning for the software maintenance phase. The criterion variable under study is the number of defects (or defect rate normalized to lines of code or function points) in specified time intervals (weeks, months, etc.), or the time between failures. Reliability models can be broadly classified into two categories: static models and dynamic models (Conte et al., 1986). A static model uses other attributes of the project or program modules to estimate the number of defects in the software. A dynamic model, usually based on statistical distributions, uses the current development defect patterns to estimate end-product reliability. A static model of software quality estimation has the following general form:

y = f(x1, x2, ..., xk) + e

where the dependent variable y is the defect rate or the number of defects, and the independent variables xi are the attributes of the product, the project, or the process through which the product is developed. They could be size, complexity, skill level, count of decisions, and other meaningful measurements. The error term is e (because models don’t completely explain the behavior of the dependent variable).

Estimated coefficients of the independent variables in the formula are based on data from previous products. For the current product or project, the values of the independent variables are measured, then plugged into the formula to derive estimates of the dependent variable—the product defect rate or number of defects.
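As a minimal sketch of this procedure (all numbers are made up for illustration), a one-variable static model can be fitted by ordinary least squares on data from previous projects and then applied to a new project:

```python
# Hypothetical data from five previous projects: a single product
# attribute x (e.g., average McCabe complexity) and defect rate y
# (defects per KLOC).  All numbers are illustrative.
x = [14.2, 9.8, 18.5, 11.0, 16.3]
y = [7.1, 3.2, 11.8, 4.5, 8.9]

# Fit y = b0 + b1*x + e by ordinary least squares.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx

# For a new project, measure its attribute value and plug it into the
# fitted formula to estimate the dependent variable.
new_defect_rate = b0 + b1 * 15.0
```

A real static model would use several independent variables and a much larger sample of past projects; the mechanics are the same.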

Static models are static in the sense that the estimated coefficients of their parameters are based on a number of previous projects. The product or project of interest is treated as an additional observation in the same population of previous projects. In contrast, the parameters of the dynamic models are estimated based on multiple data points gathered to date from the product of interest; therefore, the resulting model is specific to the product for which the projection of reliability is attempted.

Observation and experience show that static models are generally inferior to dynamic models when the unit of analysis is at the product level and the purpose is to estimate product-level reliability. Such modeling is better for hypothesis testing (to show that certain project attributes are related to better quality or reliability) than for estimation of reliability. When the unit of analysis is much more granular, such as at the program module level, static models can be powerful—not for product-level reliability estimates, but for providing clues to software engineers on how to improve the quality of their design and implementation. The complexity metrics and models are good examples of this type of modeling, and in Chapter 11 we discuss this topic in more detail.

Dynamic software reliability models, in turn, can be classified into two categories: those that model the entire development process and those that model the back-end testing phase. The former is represented by the Rayleigh model. The latter is represented by the exponential model and other reliability growth models, which are the subject of Chapter 8. A common denominator of dynamic models is that they are expressed as a function of time in development or its logical equivalent (such as development phase).

The Rayleigh Model

The Rayleigh model is a member of the family of the Weibull distribution. The Weibull distribution has been used for decades in various fields of engineering for reliability analysis, ranging from the fatigue life of deep-groove ball bearings to electron tube failures and the overflow incidence of rivers. It is one of the three known extreme-value distributions (Tobias, 1986). One of its marked characteristics is that the tail of its probability density function approaches zero asymptotically, but never reaches it. Its cumulative distribution function (CDF) and probability density function (PDF) are:

CDF:  F(t) = 1 - exp[-(t/c)^m]

PDF:  f(t) = (m/t)(t/c)^m exp[-(t/c)^m]

where m is the shape parameter, c is the scale parameter, and t is time. When applied to software, the PDF often means the defect density (rate) over time or the defect arrival pattern and the CDF means the cumulative defect arrival pattern.

Figure 7.1 shows several Weibull probability density curves with varying values for the shape parameter m. For reliability applications in an engineering field, the choice of a specific model is not arbitrary. The underlying assumptions must be considered and the model must be supported by empirical data. Of the Weibull family, the two models that have been applied in software reliability are the models with the shape parameter value m = 2 and m = 1.


Figure 7.1. Weibull Probability Density

The Rayleigh model is a special case of the Weibull distribution when m = 2. Its CDF and PDF are:

CDF:  F(t) = 1 - exp[-(t/c)^2]

PDF:  f(t) = (2t/c^2) exp[-(t/c)^2]

The Rayleigh PDF first increases to a peak and then decreases at a decelerating rate. The c parameter is a function of tm, the time at which the curve reaches its peak. By taking the derivative of f(t) with respect to t, setting it to zero and solving the equation, tm can be obtained.

tm = c / sqrt(2)    (equivalently, c^2 = 2*tm^2)

After tm is estimated, the shape of the entire curve can be determined. The area below the curve up to tm is 39.35% of the total area.
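These properties are easy to check numerically. The following sketch (Python, not part of the original SAS implementation; the scale value is arbitrary) confirms that the CDF evaluated at tm is about 0.3935 and that the PDF peaks at tm:

```python
import math

c = 2.0                    # arbitrary scale parameter for illustration
tm = c / math.sqrt(2)      # the time at which the Rayleigh PDF peaks

def cdf(t):
    # F(t) = 1 - exp[-(t/c)^2]
    return 1 - math.exp(-(t / c) ** 2)

def pdf(t):
    # f(t) = (2t/c^2) exp[-(t/c)^2]
    return (2 * t / c**2) * math.exp(-(t / c) ** 2)

# Area under the PDF up to tm: F(tm) = 1 - e^(-1/2), i.e., 39.35%.
area = cdf(tm)

# The PDF peaks at tm: neighboring points on either side are lower.
peaked = pdf(tm - 1e-4) < pdf(tm) > pdf(tm + 1e-4)
```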

The preceding formulas represent a standard distribution; specifically, the total area under the PDF curve is 1. In actual applications, the formulas are multiplied by a constant K (K is the total number of defects or the total cumulative defect rate). If we also substitute

c^2 = 2*tm^2

in the formulas, we get the following. To specify a model from a set of data points, K and tm are the parameters that need to be estimated.

CDF:  F(t) = K{1 - exp[-t^2/(2*tm^2)]}

PDF:  f(t) = K(t/tm^2) exp[-t^2/(2*tm^2)]
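As a sketch of how the K-and-tm form is used in practice (the parameter values here are purely illustrative), the latent-defect estimate is the area of the curve beyond the end of development:

```python
import math

def rayleigh_cdf(t, K, tm):
    # Cumulative defects (or defect rate) removed by time t.
    return K * (1 - math.exp(-t**2 / (2 * tm**2)))

def rayleigh_pdf(t, K, tm):
    # Defect removal rate at time t.
    return K * (t / tm**2) * math.exp(-t**2 / (2 * tm**2))

# Illustrative values only: K = 50 defects per KLOC over the whole
# process, curve peaking at tm = 1.8.
K, tm = 50.0, 1.8

# Latent defects after development ends at t_end are the area of the
# curve beyond t_end: K - F(t_end) = K*exp(-t_end^2/(2*tm^2)).
t_end = 6.0
latent = K * math.exp(-t_end**2 / (2 * tm**2))
```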

It has been empirically well established that software projects follow a lifecycle pattern described by the Rayleigh density curve (Norden, 1963; Putnam, 1978). Early applications of the model in software were mainly for staffing estimation over time for the life cycle of software projects. More recent work demonstrated that the defect removal pattern of software projects also follows the Rayleigh pattern.

Trachtenberg (1982) examined the month-by-month error histories of software projects and found that the composite error pattern of those projects resembled a Rayleigh-like curve. In 1984 Gaffney of the IBM Federal Systems Division reported the development of a model based on defect counts at six phases of the development process commonly used in IBM: high-level design inspections, low-level design inspections, code inspections, unit test, integration test, and system test. Gaffney observed that the defect pattern of his data by the six-phase development process followed a Rayleigh curve. Following the system test phase is the phase of field use (customer use). The number of latent defects in the field is the target for estimation.

By developing a Rayleigh model to fit his data, Gaffney was able to project the expected latent defects in the field. Putnam’s work includes the application of the Rayleigh model in estimating the number of software defects, in addition to his well-known work on software size and resource estimation (Putnam and Myers, 1992). By validating the model with systems for which defect data are available (including the space shuttle development and radar development projects), Putnam and Myers (1992) found that the total actual defects were within 5% to 10% of the defects predicted from the model. Data fits of a few other systems, for which the validity of the data is doubtful, however, were not so good. As in Trachtenberg’s study, the time unit for the Rayleigh model in Putnam and Myers’s application is expressed in terms of months from the project start.

Figure 7.2 shows a Rayleigh curve that models the defect removal pattern of an IBM AS/400 product in relation to a six-step development process, which is very similar to that used by Gaffney. Given the defect removal pattern up through system test (ST), the purpose is to estimate the defect rate when the product is shipped: the post general-availability phase (GA) in the figure. In this example the X-axis is the development phase, which can be regarded as one form of logical equivalent of time. The phases other than ST and GA in the figure are: high-level design review (I0), low-level design review (I1), code inspection (I2), unit test (UT), and component test (CT).

Rayleigh Model

Figure 7.2. Rayleigh Model

Basic Assumptions

Using the Rayleigh curve to model software development quality involves two basic assumptions. The first assumption is that the defect rate observed during the development process is positively correlated with the defect rate in the field, as illustrated in Figure 7.3. In other words, the higher the curve (more area under it), the higher the field defect rate (the GA phase in the figure), and vice versa. This is related to the concept of error injection. Assuming the defect removal effectiveness remains relatively unchanged, the higher defect rates observed during the development process are indicative of higher error injection; therefore, it is likely that the field defect rate will also be higher.

Rayleigh Model Illustration I

Figure 7.3. Rayleigh Model Illustration I

The second assumption is that given the same error injection rate, if more defects are discovered and removed earlier, fewer will remain in later stages. As a result, the field quality will be better. This relationship is illustrated in Figure 7.4, in which the areas under the curves are the same but the curves peak at varying points. Curves that peak earlier have smaller areas at the tail, the GA phase.
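The second assumption can be illustrated numerically. In the following sketch (Python, with illustrative values), two curves share the same total defect rate K and the same development span; the curve that peaks earlier leaves a far smaller tail at GA:

```python
import math

# Same total defect rate K and development span t_end for both curves;
# only the peak time tm differs (illustrative values).
K, t_end = 50.0, 6.0

def tail(tm):
    # Defects remaining beyond t_end: the GA tail of the Rayleigh curve.
    return K * math.exp(-t_end**2 / (2 * tm**2))

early_peak = tail(1.5)   # defect removal concentrated at the front end
late_peak = tail(2.5)    # defect removal shifted toward testing
# The early-peaking curve leaves a much smaller latent tail at GA.
```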

Rayleigh Model Illustration II

Figure 7.4. Rayleigh Model Illustration II

Both assumptions are closely related to the “Do it right the first time” principle. This principle means that if each step of the development process is executed properly with minimum errors, the end product’s quality will be good. It also implies that if errors are injected, they should be removed as early as possible, preferably before the formal testing phases when the costs of finding and fixing the defects are much higher than that at the front end.

To formally examine the assumptions, we conducted a hypothesis-testing study based on component data for an AS/400 product. A component is a group of modules that perform specific functions such as spooling, printing, message handling, file handling, and so forth. The product we used had 65 components, so we had a good-sized sample. Defect data at high-level design inspection (I0), low-level design inspection (I1), code inspection (I2), component test (CT), system test (ST), and operation (customer usage) were available. For the first assumption, we expect significant positive correlations between the in-process defect rates and the field defect rate. Because software data sets are rarely normally distributed, robust statistics need to be used. In our case, because the component defect rates fluctuated widely, we decided to use Spearman's rank-order correlation. We could not use the Pearson correlation because correlation analysis based on interval data (and regression analysis, for that matter) is very sensitive to extreme values, which may lead to misleading results.
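To see why ranks help, the following self-contained sketch (hypothetical component data, not the study's) computes both coefficients. Spearman's rho is simply the Pearson correlation of the ranks, so a single extreme component cannot dominate it:

```python
# Hypothetical component defect rates (per KLOC): in-process vs. field.
# One component (48.0) is the kind of extreme value that distorts
# interval-scale statistics.
in_process = [3.1, 4.7, 2.0, 5.5, 3.9, 6.2, 2.8, 48.0]
field      = [0.2, 0.4, 0.1, 0.5, 0.3, 0.6, 0.2,  0.9]

def ranks(xs):
    # Average ranks, with ties sharing the mean of their rank positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Spearman's rho is the Pearson correlation computed on the ranks.
    return pearson(ranks(x), ranks(y))

r = pearson(in_process, field)      # dragged around by the extreme value
rho = spearman(in_process, field)   # ranks cap the extreme value's pull
```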

Table 7.1 shows the Spearman rank-order correlation coefficients between the defect rates of the development phases and the field defect rate. Significant correlations are observed for I2, CT, ST, and all phases combined (I0, I1, I2, CT, and ST). For I0 and I1 the correlations are not significant. This finding is not surprising because (1) I0 and I1 are the earliest development phases and (2) in terms of the defect removal pattern, the Rayleigh curve peaks after I1.

Overall, the findings shown in Table 7.1 strongly substantiate the first assumption of the Rayleigh model. The significance of these findings should be emphasized because they are based on component-level data. For any type of analysis, the more granular the unit of analysis, the less chance the results will achieve statistical significance. At the product or system level, our experience with the AS/400 strongly supports this assumption. As another case in point, the space shuttle software system developed by IBM Houston has achieved a minimal defect rate (the onboard software is even defect free). The defect rate observed during the IBM Houston development process (about 12 to 18 defects per KLOC), not coincidentally, is much lower than the industry average (about 40 to 60 defects per KLOC).

To test the hypothesis with regard to the second assumption of the Rayleigh model, we have to control for the effects of variations in error injection. Because error injection varies among components, cross-sectional data are not suitable for the task. Longitudinal data are better, but what is needed is a good controlled experiment. Our experience indicates that even developing different functions by the same team in different releases may be prone to different degrees of error. This is especially the case if one release is for a major-function development and the other release is for small enhancements.

Table 7.1. Spearman Rank-Order Correlations

Phase                       Rank-Order Correlation    n     Significance Level
I0                          .11                       65    Not significant
I1                          .01                       65    Not significant
I2                          .28                       65    .02
CT                          .48                       65    .0001
ST                          .49                       65    .0001
All (I0, I1, I2, CT, ST)    .31                       65    .01

In a controlled experiment situation, a pool of developers with similar skills and experiences must be selected and then randomly assigned to two groups, the experiment group and the control group. Separately the two groups develop the same functions at time 1 using the same development process and method. At time 2, the two groups develop another set of functions, again separately and again with the same functions for both groups. At time 2, however, the experiment group intentionally does much more front-end defect removal and the control group uses the same method as at time 1. Moreover, the functions at time 1 and time 2 are similar in terms of complexity and difficulty. If the testing defect rate and field defect rate of the project by the experiment group at time 2 are clearly lower than that at time 1 after taking into account the effect of time (which is reflected by the defect rates of the control groups at the two times), then the second assumption of the Rayleigh model is substantiated.

Without data from a controlled experiment, we can look at the second assumption from a somewhat relaxed standard. In this regard, IBM Houston’s data again lend strong support for this assumption. As discussed in Chapter 6, for software releases by IBM Houston for the space shuttle software system from November 1982 to December 1986, the early detection percentages increased from about 50% to more than 85%. Correspondingly, the product defect rates decreased monotonically by about 70% (see Figures 6.1 and 6.2 in Chapter 6). Although the error injection rates also decreased moderately, the effect of early defect removal is evident.

Implementation

Implementation of the Rayleigh model is not difficult. If the defect data (defect counts or defect rates) are reliable, the model parameters can be derived from the data by computer programs (available in many statistical software packages) that use statistical functions. After the model is defined, estimation of end-product reliability can be achieved by substitution of data values into the model.

Figure 7.5 shows a simple example of implementation of the Rayleigh model in SAS, which uses the nonlinear regression procedure. Of the several methods available for nonlinear regression, we chose the DUD method for its simplicity and efficiency (Ralston and Jennrich, 1978). DUD is a derivative-free algorithm for nonlinear least squares. It competes favorably with even the best derivative-based algorithms when evaluated on a number of standard test problems.

Figure 7.5. An SAS Program for the Rayleigh Model

/*****************************************************************/ 
/*                                                               */ 
/*  SAS program for estimating software latent-error rate based  */ 
/*      on the Rayleigh model using defect removal data during   */ 
/*      development                                              */ 
/*                                                               */ 
/* ------------------------------------------------------------- */ 
/*                                                               */ 
/*  Assumes: A 6-phase development process: High-level design(I0)*/ 
/*           Low-level design (I1), coding(I2), Unit test (UT),  */ 
/*           Component test (CT), and System test (ST).          */ 
/*                                                               */ 
/*  Program does:                                                */ 
/*     1) estimate Rayleigh model parameters                     */ 
/*     2) plot graph of Rayleigh curve versus actual defect rate */ 
/*        on a GDDM79 terminal screen (e.g., 3279G)              */ 
/*     3) perform chi-square goodness-of-fit test, indicate      */ 
/*        whether the model is adequate or not                   */ 
/*     4) derive latent error estimate                           */ 
/*                                                               */ 
/*  User input required:                                         */ 
/*       A: input defect rates and time equivalents of           */ 
/*          the six development phases                           */ 
/*       B: initial values for iteration                         */ 
/*       C: defect rates                                         */ 
/*       D: adjustment factor specific to product/development    */ 
/*          site                                                 */ 
/*                                                               */ 
/*****************************************************************/ 
TITLE1 'RAYLEIGH MODEL - DEFECT REMOVAL PATTERN'; 
OPTIONS label center missing=0 number linesize=95; 

/*****************************************************************/ 
/*                                                               */ 
/*  Set label value for graph                                    */ 
/*                                                               */ 
/*****************************************************************/ 
proc format; 
     value jx 0='I0' 
              1='I1' 
              2='I2' 
              3='UT' 
              4='CT' 
              5='ST' 
              6='GA' 
              7=' ' 
              ; 

/*****************************************************************/ 
/*                                                               */ 
/*  Now we get input data                                        */ 
/*                                                               */ 
/*****************************************************************/ 
data temp; 

/*---------------------------------------------------------------*/ 
/*  INPUT A:                                                     */ 
/*  In the INPUT statement below, Y is the defect removal rate   */ 
/*  per KLOC, T is the time equivalent for the development       */ 
/*  phases: 0.5 for I0, 1.5 for I1, 2.5 for I2, 3.5 for UT,      */ 
/*  4.5 for CT, and 5.5 for ST.                                  */ 
/*  Input data follows the CARDS statement.                      */ 
/*---------------------------------------------------------------*/ 
     INPUT Y T; 
CARDS; 
9.2   0.5 
11.9  1.5 
16.7  2.5 
5.1   3.5 
4.2   4.5 
2.4   5.5 
; 
/*****************************************************************/ 
/*                                                               */ 
/* Now we estimate the parameters of the Rayleigh distribution   */ 
/*                                                               */ 
/*****************************************************************/ 
proc NLIN method=dud outest=out1; 
/*---------------------------------------------------------------*/ 
/* INPUT B:                                                      */ 
/* The non-linear regression procedure requires initial input    */ 
/* for the K and R parameters in the PARMS statement.  K is      */ 
/* the defect rate/KLOC for the entire development process, R is */ 
/* the peak of the Rayleigh curve.  NLIN takes these initial     */ 
/* values and the input data above, goes through an iteration    */ 
/* procedure, and comes up with the final estimates of K and R.  */ 
/* Once K and R are determined, we can specify the entire        */ 
/* Rayleigh curve, and subsequently estimate the latent-error    */ 
/* rate.                                                         */ 
/*---------------------------------------------------------------*/ 
     PARMS K=49.50 to 52 by 0.1 
           R=1.75 to 2.00 by 0.01; 
    *bounds K<=50.50,r>=1.75; 
 model y=(1/R**2)*t*K*exp((-1/(2*r**2))*t**2); 

data out1; set out1; 
     if _TYPE_ = 'FINAL'; 
proc print data=out1; 




/*****************************************************************/ 
/*                                                               */ 
/* Now we prepare to plot the graph                              */ 
/*                                                               */ 
/*****************************************************************/ 
/*---------------------------------------------------------------*/ 
/* Specify the entire Rayleigh curve based on the estimated      */  
/* parameters                                                    */ 
/*---------------------------------------------------------------*/ 
data out2; set out1; 
     B=1/(2*R**2); 
     do I=1 to 140; 
        J=I/20; 
        RAY=exp(-B*(J-0.05)**2) - exp(-B*J**2); 
        DEF=ray*K*20; 
        output ; 
     end; 
label DEF='DEFECT RATE'; 
/*---------------------------------------------------------------*/ 
/* INPUT C:                                                      */ 
/* Prepare for the histograms in the graph, values on the right  */ 
/* hand side of the assignment statements are the actual         */ 
/* defect removal rates--same as those for the INPUT statement   */ 
/*---------------------------------------------------------------*/ 
data out2 ; set out2; 
if 0<=J<1 then DEF1=9.2 ; 
if 1<=J<2 then DEF1=11.9 ; 
if 2<=J<3 then DEF1=16.7 ; 
if 3<=J<4 then DEF1=5.1 ; 
if 4<=J<5 then DEF1=4.2 ; 
if 5<=J<=6 then DEF1=2.4 ; 
label J='DEVELOPMENT PHASES'; 
; 

/*****************************************************************/ 
/*                                                               */ 
/* Now we plot the graph on a GDDM79 terminal screen(e.g., 3279G)*/ 
/* The graph can be saved and plotted out through graphics       */ 
/* interface such as APGS                                        */ 
/*                                                               */ 
/*****************************************************************/ 
     goptions device=GDDM79; 
   * GOPTIONS DEVICE=GDDMfam4 GDDMNICK=p3820 GDDMTOKEN=img240x 
              HSIZE=8 VSIZE=11; 
   * OPTIONS DEVADDR=(.,.,GRAPHPTR); 

proc gplot data=out2; 
 plot DEF*J  DEF1*J/overlay vaxis=0 to 25 by 5 vminor=0 frame 
                          hminor=0; 
     symbol1 i=join v=none c=red; 
     symbol2 i=needle v=none    c=green; 
     format J jx.; 

/*****************************************************************/ 
/*  Now we compute the chi-square goodness-of-fit test           */ 
/*     Note that the CDF should be used instead of               */ 
/*     the PDF. The degree of freedom is                         */ 
/*     n-1-#parameters, in this case, n-1-2                      */ 
/*                                                               */ 
/*****************************************************************/ 
data out1; set out1; 
     DO i=1 to 6; 
     OUTPUT; 
     END; 
     keep K R;  
data temp2; merge out1 temp; 
     T=T + 0.5; 
     T_1 = T-1; 
     b=1/(R*R*2); 
     E_rate = K*(exp(-b*T_1*T_1) - exp(-b*T*T)); 
     CHI_sq = ( y  - E_rate)**2 / E_rate; 
proc sort data=temp2; by T; 
data temp2; set temp2; by T; 
     if T=1 then T_chisq = 0; 
     T_chisq + CHI_sq; 

proc sort data=temp2; by K T; 
data temp3; set temp2; by K T; 
     if LAST.K; 
     df = T-1-2; 
     p= 1- PROBCHI(T_chisq, df); 
     IF p>0.05 then 
        RESULT='Chi-square test indicates that model is adequate.   '; 
     ELSE 
        RESULT='Chi-square test indicates that model is inadequate. ' ; 

     keep T_chisq df p RESULT; 
proc print data=temp3; 

/*****************************************************************/ 
/*  INPUT D - the value of ADJUST                                */ 
/*  Now we estimate the latent-error rate.  The Rayleigh model   */ 
/*  is known to under-estimate.                                  */ 
/*  To have good predictive validity, it                         */ 
/*  is important to use an adjustment factor based on the        */ 
/*  prior experience of your product.                            */ 
/*****************************************************************/ 
data temp4; set temp2; by K T; 
     if LAST.K; 

     ADJUST = 0.15; 

     E_rate = K*exp(-b*T*T); 
     Latent= E_rate + ADJUST; 
     label Latent = 'Latent Error Rate per KCSI'; 
     keep Latent; 
proc print data=temp4 label; 



 RUN; 
CMS FILEDEF * CLEAR ; 
ENDSAS;  

The SAS program estimates model parameters, produces a graph of the fitted model versus actual data points on a GDDM79 graphic terminal screen (as shown in Figure 7.2), performs a chi-square goodness-of-fit test, and derives an estimate of the latent-error rate. The probability (p value) of the chi-square test is also provided. If the test results indicate that the fitted model does not adequately describe the observed data (p ≤ .05), a warning statement is issued in the output. If proper graphic support is available, the colored graph on the terminal screen can be saved as a file and plotted via graphic plotting devices.

In the program of Figure 7.5, r represents tm as discussed earlier. The program implements the model on a six-phase development process. Because the Rayleigh model is a function of time (as are other reliability models), input data have to be in terms of defect data by time. The following time equivalent values for the development phases are used in the program:

  • I0 — 0.5

  • I1 — 1.5

  • I2 — 2.5

  • UT — 3.5

  • CT — 4.5

  • ST — 5.5
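For readers without SAS, the same estimation can be sketched in Python. Here a brute-force grid search (a simple stand-in for PROC NLIN's DUD iterations, with search ranges chosen around the data's cumulative total of 49.5) fits K and tm to the defect rates used in the program, then derives the unadjusted latent-defect estimate:

```python
import math

# Defect removal rates per KLOC and the time equivalents of the six
# phases, as used in the SAS program's input step.
y = [9.2, 11.9, 16.7, 5.1, 4.2, 2.4]
t = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

def model(ti, K, tm):
    # Rayleigh defect-removal rate: f(t) = K*(t/tm^2)*exp(-t^2/(2*tm^2))
    return K * (ti / tm**2) * math.exp(-ti**2 / (2 * tm**2))

# Least-squares fit by grid search over K and tm.
best = (float("inf"), 0.0, 0.0)
for K_tenths in range(300, 801):            # K: 30.0 to 80.0 by 0.1
    for tm_hundredths in range(150, 301):   # tm: 1.50 to 3.00 by 0.01
        K, tm = K_tenths / 10, tm_hundredths / 100
        sse = sum((yi - model(ti, K, tm)) ** 2 for yi, ti in zip(y, t))
        if sse < best[0]:
            best = (sse, K, tm)

sse, K, tm = best
# Unadjusted latent-defect estimate: area of the curve beyond t = 6,
# i.e., K*exp(-6^2/(2*tm^2)); the SAS program then adds an adjustment
# factor based on prior product experience.
latent = K * math.exp(-36 / (2 * tm**2))
```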

Implementations of the Rayleigh model are available in industry. One such example is the Software LIfe-cycle Model tool (SLIM) developed by Quantitative Software Management, Inc., of McLean, Virginia. SLIM is a software product designed to help software managers estimate the time, effort, and cost required to build medium and large software systems. It embodies the software life-cycle model developed by Putnam (Putnam and Myers, 1992), using validated data from many projects in the industry. Although the main purpose of the tool is for life-cycle project management, estimating the number of software defects is one of the important elements. Central to the SLIM tool are two important management indicators. The first is the productivity index (PI), a “big picture” measure of the total development capability of the organization. The second is the manpower buildup index (MBI), a measure of staff buildup rate. It is influenced by scheduling pressure, task concurrency, and resource constraints. The inputs to SLIM include software size (lines of source code, function points, modules, or uncertainty), process productivity (methods, skills, complexity, and tools), and management constraints (maximum people, maximum budget, maximum schedule, and required reliability). The outputs from SLIM include the staffing curve, the cumulative cost curve over time, probability of project success over time, reliability curve and the number of defects in the product, along with other metrics. In SLIM the X-axis for the Rayleigh model is in terms of months from the start of the project.

As a result of Gaffney’s work (1984), in 1985 the IBM Federal Systems Division at Gaithersburg, Maryland, developed a PC program called the Software Error Estimation Reporter (STEER). The STEER program implements a discrete version of the Rayleigh model by matching the input data with a set of 11 stored Rayleigh patterns and a number of user patterns. The stored Rayleigh patterns are expressed in terms of percent distribution of defects for the six development phases mentioned earlier. The matching algorithm involves taking logarithmic transformation of the input data and the stored Rayleigh patterns, calculating the separation index between the input data and each stored pattern, and choosing the stored pattern with the lowest separation index as the best-fit pattern.

Several questions arise about the STEER approach. First, the matching algorithm differs from statistical estimation methodologies, which derive estimates of model parameters directly from the input data points based on proven procedures. Second, it always produces a best-match pattern even when none of the stored patterns is statistically adequate to describe the input data; there is no indication of how small the separation index must be for the fit to be considered good. Third, the stored Rayleigh patterns are far apart: they range from 1.00 to 3.00 in terms of tm, with a large increment of 0.25. Therefore, they are not sensitive enough for estimating the latent-error rate, which is usually a very small number.

There are, however, circumventions to the last two problems. First, use the separation index conservatively; be skeptical of the results if the index exceeds 1.00. Second, use the program iteratively: After selecting the best-match pattern (for instance, the one with tm = 1.75), calculate a series of slightly different Rayleigh patterns that center at the best-match pattern (for instance, patterns ranging from tm = 1.50 to tm = 2.00, with an increment of 0.05 or 0.01), and use them as user patterns to match with the input data again. The outcome will surely be a better “best match.”
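The two-pass idea can be sketched as follows. STEER's exact separation index is not specified here, so this Python sketch substitutes a Euclidean distance between log-transformed phase-percent patterns, and the observed distribution is hypothetical:

```python
import math

# Hypothetical observed percent distribution of defects across the six
# development phases (I0, I1, I2, UT, CT, ST).
observed = [10.0, 17.0, 30.0, 20.0, 14.0, 9.0]

def rayleigh_pattern(tm, phases=6):
    # Percent of total defects falling in each unit-length phase interval
    # of a Rayleigh curve that peaks at tm.
    areas = [math.exp(-i**2 / (2 * tm**2)) - math.exp(-(i + 1)**2 / (2 * tm**2))
             for i in range(phases)]
    total = sum(areas)
    return [100 * a / total for a in areas]

def separation(p, q):
    # Stand-in separation index: Euclidean distance between the
    # log-transformed patterns (an assumption, not STEER's actual index).
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2
                         for a, b in zip(p, q)))

# Pass 1: match against the coarse stored grid, tm = 1.00 to 3.00 by 0.25.
coarse = min((separation(observed, rayleigh_pattern(1.0 + 0.25 * k)),
              1.0 + 0.25 * k) for k in range(9))

# Pass 2: refine around the winner with user patterns at a 0.05 step.
fine = min((separation(observed, rayleigh_pattern(coarse[1] - 0.25 + 0.05 * k)),
            coarse[1] - 0.25 + 0.05 * k) for k in range(11))
```

Because the second pass includes the first-pass winner among its candidates, the refined match is never worse.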

When used properly, the first two potential weak points of STEER can become its strong points. In other words, STEER plays down the role of formal parameter estimation and relies heavily on matching with existing patterns. If the feature of self-entered user patterns is used well (e.g., use defect patterns of projects from the same development organizations that have characteristics similar to those of the project for which estimation of defects is sought), then empirical validity is established. From our experience in software reliability projection, the most important factor in achieving predictive validity, regardless of the model being used, is to establish empirical validity with historical data.

Table 7.2 shows the defect removal patterns of a number of projects, the defect rates observed during the first year in the field, the life-of-product (four years) projection based on the first-year data, and the projected total latent defect rate (life-of-product) from STEER. The data show that the STEER projections are very close to the LOP projections based on one year of actual data. One can also observe that the defect removal patterns and the resulting field defects lend support to the basic assumptions of the Rayleigh model as discussed earlier. Specifically, more front-loaded defect patterns lead to lower field defect rates and vice versa.

Table 7.2. Defect Removal Patterns and STEER Projections

(Defect removal and field defect rates are in defects per KLOC; "—" marks cells not reported.)

| Project | LOC | Language | High-Level Design | Low-Level Design | Code | Unit Test | Integration Test | System Test | First-Year Field Defect | LOP Field Defect | STEER Estimate |
|---------|-------|----------|-------------------|------------------|------|-----------|------------------|-------------|-------------------------|------------------|----------------|
| A | 680K | Jovial | — | 4 | 13 | 5 | 4 | 2 | 0.3 | 0.6 | 0.6 |
| B | 30K | PL/1 | — | 2 | 7 | 14 | 9 | 7 | 3.0 | 6.0 | 6.0 |
| C | 70K | BAL | 6 | 25 | 6 | 3 | 2 | 0.5 | 0.2 | 0.4 | 0.3 |
| D | 1700K | Jovial | 4 | 10 | 15 | 4 | 3 | 3 | 0.4 | 0.8 | 0.9 |
| E | 290K | ADA | 4 | 8 | 13 | 8 | — | 0.1 | 0.3 | 0.6 | 0.7 |
| F | 70K | — | 1 | 2 | 4 | 6 | 5 | 0.9 | 1.1 | 2.2 | 2.1 |
| G | 540K | ADA | 2 | 5 | 12 | 12 | 4 | 1.8 | 0.6 | 1.2 | 1.1 |
| H | 700K | ADA | 6 | 7 | 14 | 3 | 1 | 0.4 | 0.2 | 0.4 | 0.4 |

Reliability and Predictive Validity

In Chapter 3 we examined issues associated with reliability and validity. In the context of modeling, reliability refers to the degree of change in model output caused by chance fluctuations in the input data. In statistical terms, reliability relates closely to the confidence interval of the estimate: the narrower the confidence interval, the more reliable the estimate. The confidence interval, in turn, depends on the sample size: larger samples yield narrower confidence intervals. For the Rayleigh model as implemented on a six-phase development process, which is fitted to only six data points (one per phase), the chance of obtaining a satisfactory confidence interval is therefore very slim.

My recommendation is to use as many models as appropriate and to rely on intermodel reliability to establish the reliability of the final estimates. For example, in addition to the Rayleigh model, one can attempt the exponential model or other reliability growth models (see Chapter 8). Although the confidence interval for each model's estimate may not be satisfactory, if the estimates from different models are close to one another, confidence in the estimates is strengthened. In contrast, if the estimates from different models are inconsistent, we cannot have much confidence in them even if the confidence interval for each single estimate is small. In such cases, more investigation is needed to understand and reconcile the differences across models before a final estimate is decided.
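The intermodel check can be sketched as follows: fit two candidate models to the same cumulative defect counts by least squares and compare the estimated totals. The grid-search fitter, the function names, and the defect counts are illustrative assumptions, not any product's actual implementation.

```python
import math

def rayleigh_cdf(t, tm):
    """Cumulative Rayleigh distribution; tm is the location of the peak."""
    return 1.0 - math.exp(-t * t / (2.0 * tm * tm))

def exponential_cdf(t, lam):
    """Cumulative exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * t)

def fit_total(cum, cdf, scales):
    """Least-squares fit of K * cdf(t, scale) to cumulative defect counts
    observed at t = 1, 2, ...; returns the estimated total defect count K.
    A crude grid search over the scale parameter, for illustration only."""
    best_sse, best_k = float("inf"), None
    for s in scales:
        f = [cdf(t + 1, s) for t in range(len(cum))]
        # For a fixed scale, the least-squares K is a regression through
        # the origin of the counts on the CDF values.
        k = sum(fi * ci for fi, ci in zip(f, cum)) / sum(fi * fi for fi in f)
        sse = sum((ci - k * fi) ** 2 for fi, ci in zip(f, cum))
        if sse < best_sse:
            best_sse, best_k = sse, k
    return best_k

# Illustrative cumulative defects/KLOC after each of six phases.
cum = [5, 18, 30, 39, 43, 44]
k_ray = fit_total(cum, rayleigh_cdf, [0.5 + 0.05 * i for i in range(91)])
k_exp = fit_total(cum, exponential_cdf, [0.05 + 0.05 * i for i in range(60)])
# Close agreement between k_ray and k_exp strengthens confidence in the
# projected total; a large gap calls for further investigation.
```

With S-shaped phase data like this, the two models can disagree noticeably, which is exactly the signal to investigate before settling on a final estimate.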

Predictive validity refers simply to the accuracy of model estimates. The foremost requirement for achieving predictive validity is to make sure that the input data are accurate and reliable. As discussed in an earlier chapter, there is much room for improvement in data quality in the software industry in general, including defect tracking in software development. Within the development process, the tracking system and the data quality are usually better at the back end (testing) than at the front end (requirements analysis, design reviews, and code inspections). Without accurate data, it is impossible to obtain accurate estimates.

Second, and no less important, to establish predictive validity, model estimates and actual outcomes must be compared and empirical validity must be established. Such empirical validity is of utmost importance because, given the state of the art, the validity of software reliability models is context specific. A model may work well in a certain development organization for a group of products using certain development processes, but not in dissimilar environments; no universally good software reliability model exists. By establishing empirical validity, we ensure that the model works in the intended context. For instance, when applying the Rayleigh model to the AS/400 data, we verified the model against many releases of System/38 and System/36 data and found that it consistently underestimated the software field defect rate. To improve its predictive validity, we calibrated the model output with an adjustment factor: the mean difference between the Rayleigh estimates and the actual reported defect rates. The calibration is logical, given the similar structural parameters of the development process among the three computer systems, including organization, management, and work force.
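The calibration just described amounts to a one-line adjustment. A minimal sketch with hypothetical historical release data (the actual AS/400 figures are not shown in the text):

```python
# Calibrate a model's output with the mean difference between past
# estimates and actuals, as described above. All figures are hypothetical.
history = [
    (0.5, 0.7),   # (Rayleigh estimate, actual field defect rate) per release
    (0.8, 1.1),
    (0.4, 0.6),
]
adjustment = sum(actual - est for est, actual in history) / len(history)

def calibrated(raw_estimate):
    """Raw model estimate plus the mean historical underestimation."""
    return raw_estimate + adjustment
```

Because the model historically underestimated, the adjustment is positive and the calibrated projection is raised accordingly.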

Interestingly, Wiener-Ehrlich and associates also found that the Rayleigh model underestimated the manloading scores of a software project at the tail (Wiener-Ehrlich et al., 1984). It may be that the model is really too optimistic for software applications. A Weibull distribution with an m of less than 2 (for example, 1.8) might work better for software. This is a worthwhile research topic if reliable and complete data (including process as well as field defect data) for a large number of projects are available. It should be cautioned that when one models the data with the Weibull distribution to determine the value of the m parameter, one should be sure to use the complete data set. If incomplete data are used (e.g., in-process data for the current project), the m value thus obtained will be artificially high, which will lead to underestimates of the software defects. This is because m is the shape parameter of the Weibull distribution; it will fit the shape of the data points available during the estimation process. Therefore, for in-process data, a fixed m value should be used when modeling with the Weibull distribution. We have seen examples of misuse of the Weibull distribution with in-process data, resulting in invalid estimates of software defects.
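The fixed-m rule above can be made concrete: hold the shape at a chosen value (m = 1.8 here, per the discussion) and estimate only the scale and the total from the in-process data. The grid-search fitter, function names, and counts are illustrative assumptions.

```python
import math

def weibull_cdf(t, m, c):
    """Weibull CDF with shape m and characteristic life c;
    m = 2 gives the Rayleigh model."""
    return 1.0 - math.exp(-((t / c) ** m))

def fit_fixed_shape(cum, m, scales):
    """Fit K * weibull_cdf(t, m, c) to in-process cumulative counts by
    least squares, with the shape m held fixed; returns (K, c)."""
    best = (float("inf"), None, None)
    for c in scales:
        f = [weibull_cdf(t + 1, m, c) for t in range(len(cum))]
        k = sum(fi * yi for fi, yi in zip(f, cum)) / sum(fi * fi for fi in f)
        sse = sum((yi - k * fi) ** 2 for fi, yi in zip(f, cum))
        if sse < best[0]:
            best = (sse, k, c)
    return best[1], best[2]

# In-process data: only four of six phases are complete so far (illustrative).
cum_partial = [5, 15, 25, 33]
k, c = fit_fixed_shape(cum_partial, m=1.8,
                       scales=[1.0 + 0.05 * i for i in range(81)])
latent = k - cum_partial[-1]   # projected defects still to be found
```

Letting the fitter vary m on such truncated, still-rising data would drive the shape estimate up and the latent-defect projection down, which is the misuse the text warns against.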

To further test our observation that the Rayleigh model underestimates the tail end of the distribution of software data, we looked for a meaningful data set from another organization. We obtained the field defect arrival data for a systems software product developed at IBM in Austin, Texas. The data set contains more than sixty monthly data points, representing the entire life cycle of field defects for the software. We fitted a number of software reliability models, including the Rayleigh and the Weibull with several m values. As shown in Figure 7.6, the Weibull model with m = 1.8 gave a better fit of the distribution than the Rayleigh model, although both passed the goodness-of-fit test.

Figure 7.6. Rayleigh Model Versus Weibull Distribution with m = 1.8

The three cases of Rayleigh underestimation discussed here come from different software development organizations, and their time frame spans sixteen years, from 1984 to 2000. Although more research is needed, for the reasons discussed here we recommend using the Weibull distribution with m = 1.8 in place of the Rayleigh (m = 2) when estimation accuracy at the tail end is important.

Summary

The Rayleigh model is a special case of the Weibull distribution family, which has been widely used for reliability studies in various fields. Supported by a large body of empirical data, software projects were found to follow a life-cycle pattern described by the Rayleigh curve, for both resource and staffing demand and defect discovery/removal patterns. The Rayleigh model is implemented in several software products for quality assessment. It can also be implemented easily via statistical software packages, such as the example provided in this chapter.

Compared to the phase-based defect removal model, the Rayleigh model is a formal parametric model that can be used for projecting the latent software defects when the development work is complete and the product is ready to ship to customers. The rationale behind the model fits well with the rationale for effective software development. Specifically, while the defect removal effectiveness approach focuses on defect removal, the Rayleigh encompasses both defect prevention (reduction in defect rates) and early defect removal.

In addition to quality projection, another strength of the Rayleigh model is that it provides an excellent framework for quality management. After we discuss the reliability growth models in the next chapter, in Chapter 9 we will revisit the Rayleigh model in its capacity as a quality management model.
