Pattern Extraction Using Histogram ◾ 155
Figure 10.7 is a histogram of defects found per test case, a key metric in software testing. The lower specification limit (LSL) has been set at 0.08, and the upper specification limit (USL) at 0.5.
Specifying a USL may seem unnecessary, but in this case it is meant to raise an alert about product quality.
The lower limit is less controversial; the team expects a minimum return from test cases. The target of 0.3 defects found per test case is also marked for reference.
Visually, the histogram reveals capability-related issues: the process peak is not on target, and a good share of the results (roughly 28%) fall below the lower specification limit. The process has much room for improvement. The capability index is 0.26, far below the desirable value of at least 0.8.
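The capability arithmetic behind such a judgment can be sketched in a few lines of Python. The text does not say which capability index it reports, so this minimal sketch assumes the common Cpk form, Cpk = min((USL - mean) / 3s, (mean - LSL) / 3s), with s the sample standard deviation. The function name `capability` and the sample values are hypothetical; they are not the data behind Figure 10.7.

```python
import statistics

def capability(data, lsl, usl):
    """Return (Cpk, fraction below LSL, fraction above USL) for a sample.

    Assumes the common Cpk definition; other capability indices exist.
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample standard deviation
    cpu = (usl - mean) / (3 * sd)        # distance to the upper limit
    cpl = (mean - lsl) / (3 * sd)        # distance to the lower limit
    cpk = min(cpu, cpl)                  # the nearer limit governs
    below = sum(x < lsl for x in data) / len(data)
    above = sum(x > usl for x in data) / len(data)
    return cpk, below, above

# Hypothetical defects-per-test-case values (illustrative only).
sample = [0.05, 0.07, 0.10, 0.12, 0.15, 0.18, 0.20, 0.06, 0.22, 0.25, 0.11, 0.30]
cpk, below, above = capability(sample, lsl=0.08, usl=0.5)
print(f"Cpk = {cpk:.2f}, below LSL = {below:.0%}, above USL = {above:.0%}")
```

Because the mean sits much closer to the LSL than to the USL, the lower side determines Cpk here, which mirrors the situation described for Figure 10.7.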
The test case metric “defects found per test case” is a double-edged sword. If more defects are found, it could be due either to poor quality of the product under test or to highly effective test cases. If few defects are found, it could mean either that the product is good or that the test cases are not effective. One has to weigh both possibilities and use external evidence to interpret the histogram.
Histogram as a Judge
The histogram also has an important use as a judge. First, it is a visual judge of the normality of data. Whether metric data are normal is often debated, and people run formal normality tests to settle the question. A visual inspection of the histogram can serve as a first-order judgment of normality. If the data are clearly not normal, elaborate statistical tests that assume normality are not applied to them.
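A first-order visual check needs nothing more than a rough histogram. The minimal sketch below prints a text histogram with equal-width bins and, as an assumed numerical companion to the visual judgment (not a formal normality test), computes the moment-based sample skewness. The function names and the repair-time values are hypothetical.

```python
import statistics

def text_histogram(data, bins=8, width=40):
    """Print a rough text histogram using equal-width bins; return the bin counts."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins or 1.0  # avoid a zero bin width when all values are equal
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / step), bins - 1)  # the maximum falls into the last bin
        counts[i] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | {'#' * round(c / peak * width)}")
    return counts

def skewness(data):
    """Moment-based sample skewness: about 0 for symmetric data, positive for a long right tail."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical time-to-repair values in hours (illustrative only).
repair_hours = [1, 1.5, 2, 2, 2.5, 3, 3, 4, 5, 6, 8, 12, 20, 35]
counts = text_histogram(repair_hours)
print(f"skewness = {skewness(repair_hours):.2f}")
```

A tall first bar followed by a long, thin tail to the right is the typical picture of nonnormal process data; a clearly positive skewness says the same thing in one number.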
Process data in software development is often nonnormal.
In these cases, the histogram is used to visualize the data and to decide which statistical tests to apply. For example, time-to-repair data are markedly nonnormal; histograms of such data consistently show this. The question then becomes which way to go: use nonparametric tests, or transform the data appropriately and then apply the usual statistical tests. The author prefers the first option, which keeps the data in its purest form.
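The text does not name a particular nonparametric test; the rank-based Mann-Whitney U test is one common choice for comparing two groups, and using it here is an assumption. This minimal sketch uses average ranks for ties and the normal approximation for the two-sided p-value, without a tie correction to the variance, so the p-value is approximate for small or heavily tied samples. The release names and repair times are hypothetical. In practice, SciPy's `scipy.stats.mannwhitneyu` provides a fuller implementation.

```python
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation); returns (U for x, p-value)."""
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in ((0, x), (1, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # tied values share the average of their 1-based ranks
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # no tie correction
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p

# Hypothetical repair times (hours) for two releases; note the outlier 30.0.
release_a = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 30.0]
release_b = [3.5, 5.0, 7.0, 8.0, 9.0, 12.0, 15.0, 40.0]
u, p = mann_whitney_u(release_a, release_b)
print(f"U = {u}, two-sided p = {p:.3f}")
```

Because the test works on ranks, the outlier 30.0 counts only as "one of the largest values" rather than pulling a mean upward, which is why no transformation of the raw times is needed.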
There is one area where data are expected to be normal: prediction errors. In a good prediction model, the errors are symmetric around the mean, and the mean error is zero. A histogram of the errors is usually plotted to check whether it is symmetric around zero, as a way to validate the prediction model.
Good regression models leave behind errors, or residuals, that are normally distributed; this can be checked with histograms.
If the histogram is skewed, the model is not accepted and needs to be improved.
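The residual check can be sketched as follows, assuming a simple straight-line least-squares model (the text does not specify a model form). The sketch fits the line, prints a text histogram of the residuals on an axis centred on zero, and returns the mean residual and the residual skewness. The function names and the data are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def residual_check(xs, ys, bins=5):
    """Fit a line, print a text histogram of residuals centred on zero,
    and return (residuals, mean residual, residual skewness)."""
    a, b = fit_line(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    n = len(res)
    mean = statistics.fmean(res)
    m2 = sum((r - mean) ** 2 for r in res) / n
    m3 = sum((r - mean) ** 3 for r in res) / n
    skew = m3 / m2 ** 1.5
    lim = max(abs(r) for r in res)  # symmetric range, so zero sits mid-axis
    step = 2 * lim / bins
    counts = [0] * bins
    for r in res:
        counts[min(int((r + lim) / step), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"{-lim + i * step:+8.2f} | {'#' * c}")
    return res, mean, skew

# Hypothetical data: a straight-line trend plus one large positive error at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2 * x + 1 + (10 if x == 5 else 0) for x in xs]
res, mean, skew = residual_check(xs, ys)
print(f"mean residual = {mean:.2e}, skewness = {skew:.2f}")
```

For a least-squares line with an intercept, the mean residual is zero by construction, so the informative part is the shape: here eight small negative residuals and one large positive one give a lopsided histogram and a clearly positive skewness, the kind of model the text says should be rejected and improved.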