Chapter 17 - Software Size Growth: Log-Normal Distribution (2/4)


272 ◾ Simple Statistical Methods for Software Engineering

[Plot of four log-normal curves against design complexity: Shape 0.7, Scale 0.5; Shape 0.896, Scale 0.771 (MOM); Shape 1.1, Scale 1; Shape 1.3, Scale 1.4.]

Figure 17.5 Perturbations of log-normal distribution of design complexity.

[Histogram of design complexity in unit bins, from <=0 and (0, 1] up to (14, 15].]

Figure 17.4 Design complexity histogram.

Software Size Growth ◾ 273

It can be seen in Figure 17.5 that the curve with MOM parameters matches the histogram, with a tail finishing at 15. The other curves either stop short or overshoot.

It may also be seen that we have made no attempt to make the log-normal model represent the outliers seen in the box plot in Figure 17.1. Our emphasis has remained on the body of the log-normal and not on its tail. That emphasis depends on whether our strategy is to model the body or the tail. Had we wished to emphasize the tail, we would have opted for attaching a Pareto tail. To pursue this idea, we need to make our judgment more precise. We do so by evaluating errors in prediction. For nine percentiles, ranging from 0.1 to 0.9 in steps of 0.1, data values are first computed using the PERCENTILE function in Excel. The same percentiles are interpreted as probabilities in the cumulative log-normal distribution, and we predict the design complexities corresponding to the nine percentiles by an inverse calculation using LOGNORM.INV in Excel. The difference between the predicted value and the data value is taken as the error in prediction. To find a meaningful average error, we drop the sign of each error, taking its absolute value, and then average these absolute errors. Prediction errors are calculated for all the candidate log-normal models. The calculations are shown in Table 17.1.

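In Model A, the parameters obtained directly by MOM are used. In Model B, the scale is estimated with reference to the median. In Model C, the scale is the same as in Model B, but the shape has been perturbed until the mean absolute error (MAE) converged to a minimum. Models X1 to X4 in Table 17.1 are neighboring perturbations around Model C, of the shape (X1, X2) or of the scale (X3, X4).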
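The error evaluation described above can be sketched in Python. This is a minimal sketch, assuming Python 3.8 or later (for statistics.NormalDist) and using the nine percentiles and data values listed in Table 17.1. The function names lognorm_inv and mean_absolute_error are our own; lognorm_inv plays the role of Excel's LOGNORM.INV, with scale as the mean and shape as the standard deviation of ln(x).

```python
import math
from statistics import NormalDist

# Percentiles and the corresponding data values, as listed in Table 17.1.
PERCENTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
DATA = [1, 1, 1, 1, 2, 2, 3, 5, 8]


def lognorm_inv(p: float, scale: float, shape: float) -> float:
    """Inverse log-normal CDF, in the spirit of Excel's LOGNORM.INV(p, mean, standard_dev).

    scale is the mean of ln(x); shape is the standard deviation of ln(x).
    """
    # ln(x) is normal(scale, shape), so invert in log space and take the antilog.
    return math.exp(NormalDist(scale, shape).inv_cdf(p))


def mean_absolute_error(scale: float, shape: float) -> float:
    """Average of |prediction - data| over the nine percentiles."""
    errors = [
        abs(lognorm_inv(p, scale, shape) - d)
        for p, d in zip(PERCENTILES, DATA)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    models = [("A", 0.771, 0.896), ("B", 0.693, 0.896), ("C", 0.693, 0.930)]
    for name, scale, shape in models:
        preds = [round(lognorm_inv(p, scale, shape), 1) for p in PERCENTILES]
        print(name, preds, round(mean_absolute_error(scale, shape), 3))
```

With Model A's parameters (scale 0.771, shape 0.896), the sketch reproduces both the predictions and the MAE of 0.481 in Table 17.1. With the parameters of Models B and C it reproduces the tabulated predictions, but the errors measured against the Data column average roughly 0.49 and 0.46 rather than the tabulated 0.204 and 0.019, so it is worth checking which reference each error column was computed against before relying on that comparison.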

For further discussion, let us choose the optimized Model C, which has the minimum error. The optimized model has a scale of 0.693 and a shape of 0.930. We can plot the log-normal cumulative distribution function (CDF) of Model C using the Excel function LOGNORM.DIST, whose mean and standard_dev arguments (the mean and standard deviation of ln x) take the scale and the shape, as follows:

CDF = LOGNORM.DIST(x, scale, shape, 1)

After substitution, the expression becomes

CDF = LOGNORM.DIST(x, 0.693, 0.930, 1)

For reference, we can plot the probability density function (PDF) using the same Excel function with the last argument set to 0:

PDF = LOGNORM.DIST(x, scale, shape, 0)

After substitution, the expression becomes

PDF = LOGNORM.DIST(x, 0.693, 0.930, 0)

In both cases, x is the design complexity. The plots are shown in Figure 17.6.
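For readers who prefer to compute the curves outside Excel, the two LOGNORM.DIST calls above can be mirrored in Python. This is a minimal sketch assuming Python 3.8 or later; the function name lognorm_dist is our own, it takes scale and shape in the same sense as Excel's mean and standard_dev arguments, and, unlike Excel (which returns #NUM! for x ≤ 0), it returns 0 for non-positive x.

```python
import math
from statistics import NormalDist

# Model C parameters: scale = mean of ln(x), shape = standard deviation of ln(x).
SCALE = 0.693
SHAPE = 0.930


def lognorm_dist(x: float, scale: float, shape: float, cumulative: bool) -> float:
    """Log-normal CDF (cumulative=True) or PDF (cumulative=False).

    Modelled on Excel's LOGNORM.DIST(x, mean, standard_dev, cumulative).
    """
    if shape <= 0:
        raise ValueError("shape must be positive")
    if x <= 0:
        # Both the CDF and the PDF are zero for non-positive x
        # (Excel would return #NUM! here instead).
        return 0.0
    if cumulative:
        # P(X <= x) = P(ln X <= ln x), where ln X is normal(scale, shape).
        return NormalDist(scale, shape).cdf(math.log(x))
    z = (math.log(x) - scale) / shape
    return math.exp(-0.5 * z * z) / (x * shape * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # Tabulate PDF and CDF over design complexity 1..16, roughly the range in Figure 17.6.
    for x in range(1, 17):
        pdf = lognorm_dist(x, SCALE, SHAPE, cumulative=False)
        cdf = lognorm_dist(x, SCALE, SHAPE, cumulative=True)
        print(f"{x:2d}  PDF {pdf:.4f}  CDF {cdf:.4f}")
```

The printed columns can be pasted into any plotting tool to redraw both panels of the figure.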


Table 17.1 Prediction Errors

Model             A              B              X1             C              X2             X3             X4
Shape             0.896          0.896          0.92           0.93           0.94           0.93           0.93
Scale             0.771          0.693          0.693          0.693          0.693          0.68           0.7

Percentile  Data  Pred  AbsErr   Pred  AbsErr   Pred  AbsErr   Pred  AbsErr   Pred  AbsErr   Pred  AbsErr   Pred  AbsErr
0.1         1     0.7   0.314    0.6   0.051    0.6   0.019    0.6   0.008    0.6   0.008    0.6   0.000    0.6   0.012
0.2         1     1.0   0.017    0.9   0.076    0.9   0.019    0.9   0.008    0.9   0.008    0.9   0.004    0.9   0.018
0.3         1     1.4   0.351    1.2   0.101    1.2   0.016    1.2   0.006    1.2   0.006    1.2   0.009    1.2   0.024
0.4         1     1.7   0.723    1.6   0.129    1.6   0.010    1.6   0.004    1.6   0.004    1.6   0.016    1.6   0.032
0.5         2     2.2   0.162    2.0   0.162    2.0   0.000    2.0   0.000    2.0   0.000    2.0   0.026    2.0   0.040
0.6         2     2.7   0.713    2.5   0.204    2.5   0.015    2.5   0.006    2.5   0.006    2.5   0.039    2.5   0.050
0.7         3     3.5   0.459    3.2   0.260    3.2   0.041    3.3   0.017    3.3   0.017    3.2   0.059    3.3   0.065
0.8         5     4.6   0.404    4.3   0.345    4.3   0.087    4.4   0.037    4.4   0.037    4.3   0.093    4.4   0.087
0.9         8     6.8   1.184    6.3   0.511    6.5   0.197    6.6   0.084    6.7   0.085    6.5   0.170    6.6   0.131
MAE                     0.48082        0.20445        0.04475        0.01889        0.01903        0.04641        0.05113

Note: Pred, prediction; AbsErr, absolute error; MAE, mean absolute error. The three principal log-normal models are (A) Scale = Ln(mean) = 0.771, Shape = SD of Ln(x) = 0.896, MAE = 0.481; (B) Scale = Ln(median) = 0.693, Shape = SD of Ln(x) = 0.896, MAE = 0.204; (C) Scale = Ln(median) = 0.693, Shape = 0.930, MAE = 0.019.


Box 17.2 Logarithmic Scale

The human ear responds to sound on a logarithmic scale; the response has an amazing dynamic range, from extremely faint to extremely loud sounds. Such a range is possible only because the scale is logarithmic. If the sound intensity increases tenfold, the response increases one notch. Sound level is measured in decibels: ten times the base-10 logarithm of the ratio of a sound's intensity to a reference intensity.

[Two panels plotted against design complexity for the optimal model (Scale 0.693, Shape 0.93): (a) the PDF and (b) the CDF.]

Figure 17.6 Optimal log-normal (a) PDF and (b) CDF of design complexity.


Application of the Log-Normal Model 1

If an engineering limit on design complexity is set at 6, then we can find the probability of meeting this limit (certainty) from the CDF. In the CDF shown in Figure 17.6b, a vertical line drawn from x = 6 meets the curve; from the point of intersection, a horizontal line drawn to the y axis meets it at approximately 0.9.

The exact value is obtained from the CDF as follows:

Probability that design complexity does not exceed 6
= CDF (x = 6)
= LOGNORM.DIST(6, 0.693, 0.930, 1)
= 0.8813

This number represents the certainty of meeting the design complexity goal of 6.

We can extend the analysis to calculate the risk of design complexity exceeding the limit of 6:

Risk of design complexity exceeding 6
= 1 − CDF (x = 6)
= 1 − LOGNORM.DIST(6, 0.693, 0.930, 1)
= 1 − 0.8813
= 0.1187
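Both the certainty and the risk above come from a single evaluation of the CDF at the limit. A minimal Python sketch, assuming Python 3.8 or later; the function name certainty_and_risk is hypothetical, and scale and shape are used in the same sense as Excel's mean and standard_dev arguments:

```python
import math
from statistics import NormalDist
from typing import Tuple


def certainty_and_risk(limit: float, scale: float, shape: float) -> Tuple[float, float]:
    """Return (P(X <= limit), P(X > limit)) for a log-normal X.

    scale and shape are the mean and standard deviation of ln(X).
    """
    if limit <= 0 or shape <= 0:
        raise ValueError("limit and shape must be positive")
    # Certainty: the CDF at the limit, i.e. LOGNORM.DIST(limit, scale, shape, 1).
    certainty = NormalDist(scale, shape).cdf(math.log(limit))
    # Risk: the complementary probability of exceeding the limit.
    return certainty, 1.0 - certainty


if __name__ == "__main__":
    certainty, risk = certainty_and_risk(6, 0.693, 0.930)
    print(f"certainty {certainty:.4f}, risk {risk:.4f}")
```

For the limit of 6 under Model C, this prints certainty 0.8813 and risk 0.1187, matching the Excel results above; other limits can be tried simply by changing the first argument.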

Calculating certainty and risk is one of the most useful applications of the log-normal distribution.

Earthquakes are measured on a logarithmic scale. On the Richter scale for measuring earthquake strength (invented by Charles F. Richter in 1934), the reading rises one point when the amplitude of the seismic waves is 10 times larger. A million-fold increase in amplitude appears as a mere six-point movement on the Richter scale. At the same time, the Richter scale is sensitive enough to … range of the Richter scale is enormous.

Orders of magnitude are seen through logarithms. e conversion from

logarithms to real scale is achieved by taking antilogarithms. A trained

human mind quickly does a conversion by applying rules and examples. In

this context, the log-normal distribution is a return to a natural way of deal-

ing with huge magnitudes. One just has to get used to it.
