272 Simple Statistical Methods for Software Engineering
Figure 17.5 Perturbations of log-normal distribution of design complexity.
[Plot: x axis, design complexity (0 to 20); y axis, 0 to 0.45; curves for
Shape 0.7 Scale 0.5; Shape 0.896 Scale 0.771 (MOM); Shape 1.1 Scale 1;
Shape 1.3 Scale 1.4.]
Figure 17.4 Design complexity histogram.
[Plot: x axis, design complexity in unit bins from <=0 and (0, 1] through
(14, 15]; y axis, 0 to 250.]
Software Size Growth 273
It may be seen in Figure 17.5 that the curve with MOM parameters matches
the histogram, with a tail finishing at 15. The other curves either stop short
or overshoot.
It may also be seen that we have made no attempt to make the log-normal model
represent the outliers seen in the box plot in Figure 17.1. Our emphasis has
remained on the body of the log-normal and not on its tail. That emphasis depends
on our strategy: whether we are building the body or the tail. Had we wished to
emphasize the tail, we would have opted for attaching a Pareto tail. To pursue
this idea, we need to improve the precision of our judgment. We do so by
evaluating errors in prediction. For nine percentiles, ranging from 0.1 to 0.9
in steps of 0.1, data values are first computed using the PERCENTILE function in
Excel. The same percentiles are interpreted as probabilities in the cumulative
log-normal distribution, and we predict the design complexities corresponding to
the nine percentiles by doing an inverse calculation using LOGNORM.INV in Excel.
The difference between the predicted value and the data value is taken as the
error in prediction. To find a meaningful average error, we take the absolute
value of each error and then average the results. Prediction errors are
calculated for all the candidate log-normal models. The calculations are shown
in Table 17.1.
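The error calculation described above can be sketched in Python. This is a minimal sketch, not the book's spreadsheet: it assumes the nine data values are the rounded entries printed in the Data column of Table 17.1, it evaluates only the Model A parameters (scale 0.771, shape 0.896), and the helper names `lognormal_inverse` and `mean_absolute_error` are hypothetical. The inverse calculation uses the standard normal quantile from the Python standard library in place of Excel.

```python
from math import exp
from statistics import NormalDist

# Percentiles 0.1 ... 0.9 and the data values as printed (rounded)
# in the "Data" column of Table 17.1. Assumption: the rounded values
# are close enough to the unrounded Excel percentiles.
percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
data_values = [1, 1, 1, 1, 2, 2, 3, 5, 8]


def lognormal_inverse(p, scale, shape):
    """Value x whose cumulative log-normal probability is p.

    scale is the mean of ln(x); shape is the standard deviation of ln(x).
    """
    return exp(scale + shape * NormalDist().inv_cdf(p))


def mean_absolute_error(scale, shape):
    """Average of |predicted value - data value| over the nine percentiles."""
    errors = [
        abs(lognormal_inverse(p, scale, shape) - d)
        for p, d in zip(percentiles, data_values)
    ]
    return sum(errors) / len(errors)


# Model A: parameters obtained directly by MOM.
mae_model_a = mean_absolute_error(scale=0.771, shape=0.896)
print(f"Model A MAE: {mae_model_a:.3f}")
```

Under these assumptions the result should be close to the MAE listed for Model A in Table 17.1.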
In Model A, the parameters obtained directly by MOM are used. In Model B,
the scale is estimated with reference to the median. In Model C, the scale is the
same as in Model B, but the shape has been perturbed until the mean absolute
error (MAE) converged to a minimum.
For further discussion, let us choose the optimized Model C having minimum
error. The optimized model has a scale of 0.693 and a shape of 0.930. We can
plot the log-normal cumulative distribution function (CDF) of Model C using the
Excel function, as follows:
CDF = LOGNORM.DIST(x, scale, shape, 1).
After substitution, the expression becomes
CDF = LOGNORM.DIST(x, 0.693, 0.930, 1).
For our reference, we can plot the probability density function (PDF) using the
Excel function, as follows:
PDF = LOGNORM.DIST(x, scale, shape, 0).
After substitution the expression becomes
PDF = LOGNORM.DIST(x, 0.693, 0.930, 0).
In both cases, x is the design complexity. The plots are shown in Figure 17.6.
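For readers working outside Excel, the two curves can be sketched in Python. This is a sketch under stated assumptions: the function names `lognormal_cdf` and `lognormal_pdf` are hypothetical, and the formulas are the standard log-normal CDF and PDF with scale as the mean of ln(x) and shape as the standard deviation of ln(x), which is how the arguments of LOGNORM.DIST are used above.

```python
from math import erf, exp, log, pi, sqrt


def lognormal_cdf(x, scale, shape):
    """Cumulative log-normal probability P(X <= x)."""
    if x <= 0:
        return 0.0
    z = (log(x) - scale) / shape
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lognormal_pdf(x, scale, shape):
    """Log-normal probability density at x."""
    if x <= 0:
        return 0.0
    z = (log(x) - scale) / shape
    return exp(-0.5 * z * z) / (x * shape * sqrt(2.0 * pi))


# Model C parameters from the text.
SCALE, SHAPE = 0.693, 0.930

# Tabulate both curves over design complexity 1 to 15.
for x in range(1, 16):
    pdf = lognormal_pdf(x, SCALE, SHAPE)
    cdf = lognormal_cdf(x, SCALE, SHAPE)
    print(f"x={x:2d}  PDF={pdf:.4f}  CDF={cdf:.4f}")
```

The printed columns can be passed to any plotting tool to draw curves of the kind shown in Figure 17.6.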
Table 17.1 Prediction Errors

Model             A              B              X1             C              X2             X3             X4
Shape             0.896          0.896          0.92           0.93           0.94           0.93           0.93
Scale             0.771          0.693          0.693          0.693          0.693          0.68           0.7

Percentile  Data  Pred. Abs.err. Pred. Abs.err. Pred. Abs.err. Pred. Abs.err. Pred. Abs.err. Pred. Abs.err. Pred. Abs.err.
0.1         1     0.7   0.314    0.6   0.051    0.6   0.019    0.6   0.008    0.6   0.008    0.6   0.000    0.6   0.012
0.2         1     1.0   0.017    0.9   0.076    0.9   0.019    0.9   0.008    0.9   0.008    0.9   0.004    0.9   0.018
0.3         1     1.4   0.351    1.2   0.101    1.2   0.016    1.2   0.006    1.2   0.006    1.2   0.009    1.2   0.024
0.4         1     1.7   0.723    1.6   0.129    1.6   0.010    1.6   0.004    1.6   0.004    1.6   0.016    1.6   0.032
0.5         2     2.2   0.162    2.0   0.162    2.0   0.000    2.0   0.000    2.0   0.000    2.0   0.026    2.0   0.040
0.6         2     2.7   0.713    2.5   0.204    2.5   0.015    2.5   0.006    2.5   0.006    2.5   0.039    2.5   0.050
0.7         3     3.5   0.459    3.2   0.260    3.2   0.041    3.3   0.017    3.3   0.017    3.2   0.059    3.3   0.065
0.8         5     4.6   0.404    4.3   0.345    4.3   0.087    4.4   0.037    4.4   0.037    4.3   0.093    4.4   0.087
0.9         8     6.8   1.184    6.3   0.511    6.5   0.197    6.6   0.084    6.7   0.085    6.5   0.170    6.6   0.131
MAE                     0.48082        0.20445        0.04475        0.01889        0.01903        0.04641        0.05113

Note: Pred., prediction; Abs.err., absolute error; MAE, mean absolute error. The three important log-normal models are as follows:
(A) Scale = Ln(mean), 0.771; Shape = SD of Ln(x), 0.896; MAE = 0.481.
(B) Scale = Ln(median), 0.693; Shape = SD of Ln(x), 0.896; MAE = 0.204.
(C) Scale = Ln(median), 0.693; Shape = 0.930; MAE = 0.019.
Box 17.2 Logarithmic Scale
The human ear responds to sound on a logarithmic scale; the response has an
amazing dynamic range, from extremely small to extremely large sounds. Such
a range is possible because the scale is logarithmic. If the sound intensity
increases tenfold, the response increases one notch. Sound level is measured in
decibels, which are based on logarithms to the base 10 of sound intensities.
Figure 17.6 Optimal log-normal (a) PDF and (b) CDF of design complexity.
[Plots: x axis in both panels, design complexity (0 to 16); curve in both
panels, Scale 0.693 Shape 0.93.]
Application of the Log-Normal Model 1
If an engineering limit on design complexity is set at 6, then we can find the
probability of meeting this limit (certainty) from the CDF. In the CDF shown
in Figure 17.6, a vertical line drawn from x = 6 meets the curve; from the point
of intersection, a horizontal line meets the y axis at approximately 0.9.
The exact value is obtained from the CDF as follows:
Probability that design complexity is <6:
= CDF (x = 6)
= LOGNORM.DIST (6,0.693,0.930,1)
= 0.8813
The above number represents the certainty of meeting the design complexity
goal of 6.
We can extend the analysis to calculate the risk of design complexity exceeding
the limit of 6, as follows:
= 1 − CDF (x = 6)
= 1 − LOGNORM.DIST (6,0.693,0.930,1)
= 1 − 0.8813
= 0.1187
Calculating certainty and risk is a most useful application of the log-normal
distribution.
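The certainty and risk calculation can be reproduced outside Excel. The following is a minimal sketch, assuming that LOGNORM.DIST with the cumulative flag set to 1 is equivalent to a normal CDF applied to ln(x), with the scale as the mean and the shape as the standard deviation; the variable names are illustrative only.

```python
from math import log
from statistics import NormalDist

scale = 0.693   # mean of ln(x), Model C
shape = 0.930   # standard deviation of ln(x), Model C
limit = 6       # engineering limit on design complexity

# Certainty: probability that design complexity does not exceed the limit.
certainty = NormalDist(mu=scale, sigma=shape).cdf(log(limit))

# Risk: probability that design complexity exceeds the limit.
risk = 1.0 - certainty

print(f"certainty = {certainty:.4f}")
print(f"risk      = {risk:.4f}")
```

The two printed values should agree closely with the 0.8813 and 0.1187 obtained above. Changing `limit` shows how the certainty and risk shift as the engineering limit is tightened or relaxed.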
Earthquakes are measured on a logarithmic scale. On the Richter scale
(invented by Charles F. Richter in 1934) for measuring earthquake strength,
if the measured ground-motion amplitude is 10 times larger, the reading jumps
one point. A million-fold larger amplitude appears as a mere six-point
movement on the Richter scale. At the same time, the Richter scale is sensitive
enough to register very small seismic activities, too small to be detected by
humans. The range of the Richter scale is enormous.
Orders of magnitude are seen through logarithms. The conversion from
logarithms to real scale is achieved by taking antilogarithms. A trained
human mind quickly does the conversion by applying rules and examples. In
this context, the log-normal distribution is a return to a natural way of
dealing with huge magnitudes. One just has to get used to it.