Software Size Growth 277
Application of the Log-Normal Model 2
The next application refers to developing a control chart for design complexity. We
cannot apply the Shewhart control chart with the mean μ as the central line and
μ ± 3σ as the control limits. The Shewhart chart assumes a normal distribution, whereas
design complexity is log-normal. Shewhart limits are symmetrical; design complexity
limits cannot be symmetrical, because we have established design complexity as a
skewed distribution. Besides, we do not need a lower control limit for complexity.
All we need is an upper control limit.
Shewhart limits include 99.73% of the process inside the limits and keep only
0.27% as outliers. Shewhart limits apply better to manufacturing processes.
For creative processes such as software design, the authors suggest a different
rule for control limits. The proposed limits include 95% (0.95) of processes
inside the limits and mark 5% of processes as outliers. This upper control limit
is obtained from the CDF shown in Figure 17.6 as the x value corresponding to
a y value of 0.95.
Upper control limit for y = 0.95
= LOGNORMINV(y, scale, shape)
= LOGNORMINV(0.95, 0.693, 0.93)
= 9.2324

This sets the statistical upper control limit for design complexity.
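The same upper control limit can be sketched with Python's standard library, using the inverse CDF of the log-normal directly (exp of the normal quantile); the parameter values are those given in the text.

```python
import math
from statistics import NormalDist

# Parameters from the text: scale = Ln(median) = 0.693, shape (sigma) = 0.93
scale, shape = 0.693, 0.93

# Inverse CDF of the log-normal: exp(scale + shape * z),
# where z is the standard normal quantile for the chosen probability
z = NormalDist().inv_cdf(0.95)
ucl = math.exp(scale + shape * z)
print(round(ucl, 4))  # close to the 9.2324 reported in the text
```

Small differences from the tabulated 9.2324 come from rounding the scale parameter (0.693 ≈ Ln 2).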
Feature Addition in Software Enhancement
In Chapter 13, we treated requirement volatility as a Gaussian with a standard
deviation of approximately 3.3% and a mean value of 4%, for full life cycle devel-
opment projects with stringent business control on requirements. In large enhance-
ment projects, the Gaussian model does not hold; here changes are far more
common. Features added after requirements are "finalized" can reach values
as high as 50%. The growth of features is log-normal. The pattern of growth varies.
Three examples, A, B, and C, are presented in Figure 17.7.
Model C has the largest scale of 20 and the fattest tail. Model B has a scale of 10
and a medium-sized tail. Model A has a scale of 4 and has an early finishing point.
All three models represent the customer's processes, over which the maintenance
team has no direct control. In such cases, statistical management reduces to building
an empirical understanding of the process from data and creating the appropriate PDFs.
Recognizing whether variation is Gaussian or log-normal is the first step; histograms
enable this. Fitting the appropriate PDF by parameter extraction is the next step.
Applying the model to solve problems and make decisions is the goal.
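The parameter-extraction step can be sketched with the estimators this chapter uses elsewhere (scale = Ln of the data median, shape = standard deviation of Ln(x)); the sample data here are hypothetical.

```python
import math
import statistics

# Hypothetical sample of enhancement observations (percent features added)
data = [2.1, 3.8, 4.0, 5.5, 7.2, 9.9, 14.3, 21.0]

logs = [math.log(x) for x in data]
scale = math.log(statistics.median(data))  # scale = Ln(median of data)
shape = statistics.stdev(logs)             # shape = standard deviation of Ln(x)
```

With the parameters in hand, the fitted PDF can be overlaid on the histogram to judge the fit visually.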
278 Simple Statistical Methods for Software Engineering
A Log-Normal PDF for Change Requests
The process of change requests in a support project is a mixture; it is composed
of assorted tasks, including bug fixes, feature additions, and patchwork. A PDF of
change requests is modeled with the following parameters:

Shape σ = 1
Scale β = 7

The scale factor is set at the median of the data. The shape factor is chosen by an
iterative search for best fit. The log-normal PDF for change requests is plotted in
Figure 17.8.
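The plotted PDF can be sketched numerically as follows, taking the scale parameter as the median (so μ = Ln 7), consistent with how the scale factor is set above; the evaluation points are arbitrary.

```python
import math

def lognormal_pdf(x, median, sigma):
    """Log-normal density, with the scale parameter given as the data median."""
    mu = math.log(median)
    coeff = 1.0 / (x * sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2))

# Parameters from the text: shape sigma = 1, scale (median) = 7
densities = {x: lognormal_pdf(x, 7, 1.0) for x in (1, 3, 7, 10, 20)}
```

The density peaks well below the median (at the mode, 7/e ≈ 2.6) and decays slowly to the right, which is the long tail visible in Figure 17.8.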
However, this is merely curve fitting. This model does not benefit from
an ideological context such as that present in the models for feature addition
and design complexity. Despite this limitation, the model can still be used for
forecasting.
A better approach, beyond the scope of this book, would be to create a mixture
model, combining inherent probabilistic characteristics of the components.
Bug fixes may be denoted by a Weibull distribution, patchwork by a beta dis-
tribution, and feature addition by a log-normal distribution.
Mathematically combining the three would require a series of approxi-
mations and special analytical treatments demanding a specialist's
knowledge. However, such a combination can also be achieved digitally by
simulation.
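The simulation route can be sketched with the standard library samplers; the component distributions follow the text, but the mixture weights and every parameter value are illustrative assumptions.

```python
import math
import random

random.seed(17)

def change_request_sample():
    """One draw from a hypothetical three-component mixture of change requests."""
    u = random.random()
    if u < 0.5:    # bug fixes ~ Weibull (assumed weight 0.5)
        return random.weibullvariate(5.0, 1.5)
    elif u < 0.8:  # patchwork ~ scaled beta (assumed weight 0.3)
        return 10.0 * random.betavariate(2.0, 5.0)
    else:          # feature additions ~ log-normal (assumed weight 0.2)
        return random.lognormvariate(math.log(7), 1.0)

samples = [change_request_sample() for _ in range(10_000)]
```

A histogram of the simulated samples approximates the mixture PDF without any analytical treatment.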
Figure 17.7 Log-normal PDF of software enhancements. (Three curves: A, shape 1, scale 4; B, shape 1, scale 10; C, shape 1, scale 20. x-axis: features added %; y-axis: probability.)
From Pareto to Log-Normal
The Pareto distribution is a power law and is known for its fat tail (see Chapter 16 for
more details). The log-normal is a growth model and has a limited-sized tail. One can
switch to Pareto if bigger tails are needed. However, the similarity does not end with tails.
When examining income distribution data, Aitchison and
Brown (1954) observe that for lower incomes a lognormal dis-
tribution appears a better fit, while for higher incomes a power
law distribution appears better [4].
Power law distributions and log-normal distributions are quite natural models
and can be generated intuitively. They are also intrinsically connected.
Both the power law and the log-normal have been applied to file size distributions on
the Internet; they fare equally well. From a pragmatic point of view, it might be
reasonable to use whichever distribution makes it easier to obtain results.
Some Properties of Log-Normal Distribution
Some properties of the log-normal distribution can come in handy while analyzing
data. The following median-related formulas are given in NIST:

Mean = Median × e^(σ²/2)  (17.5)
Figure 17.8 Log-normal PDF of change requests. (Threshold 0, shape 1, scale 7. x-axis: change requests; y-axis: probability.)
Variance = Median² × e^(σ²) × (e^(σ²) − 1)  (17.6)
The central tendencies are defined in terms of the parameters as follows:

Mean = e^(β + σ²/2)  (17.7)

Median = e^β  (17.8)

Mode = e^(β − σ²)  (17.9)
We can see from the previous equations that the mean is always larger than the
median. Similarly, the mode is the smallest.
When β = 0 and σ = 1, so that Ln(x) follows the standard normal, the log-normal
distribution is called the standard log-normal distribution.
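Equations 17.5 through 17.9 can be checked numerically; the values of β and σ below are arbitrary illustrative choices.

```python
import math

beta, sigma = 2.0, 0.9  # illustrative parameters; beta = Ln(median)

median = math.exp(beta)                   # Equation 17.8
mean = math.exp(beta + sigma**2 / 2)      # Equation 17.7
mode = math.exp(beta - sigma**2)          # Equation 17.9
variance = median**2 * math.exp(sigma**2) * (math.exp(sigma**2) - 1)  # Eq. 17.6

# Equation 17.5 and the ordering mode < median < mean both hold
assert math.isclose(mean, median * math.exp(sigma**2 / 2))
assert mode < median < mean
```

The ordering mode < median < mean is a quick diagnostic: sample statistics in that order are consistent with a right-skewed, log-normal-like process.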
Case Study—Analysis of Failure Interval
The log-normal distribution is widely used in reliability studies. NIST presents several
models for reliability analysis, and the log-normal is one of them. The choice depends
on interpretation of the famous bathtub curve. Initially, mechanical systems show
infant mortality, with a high failure rate that decreases as the system stabilizes,
eventually reaching a flat low level. When the failure rate is constant,
the exponential distribution is enough.
When the failure rate is changing, the log-normal, the Weibull, or other models capa-
ble of handling change are required.
Reliability Analysis Centre [5] illustrates an example of log-normal distribution
with a scale of 10.3 and a shape of 1.0196 to represent infant mortality and speedy
recovery, although Weibull is their favorite model for reliability analysis.
In mechanical systems, reliability decreases with time, whereas in software
products reliability increases with usage, bug discovery, and bug fixing.
Failure mechanisms propagate and grow in physical systems; in software, they
are located, confined, and eliminated. We need to bear this in mind while
developing a probabilistic model for software reliability.
Failure models also draw on the theory of the product to ensure relevance. For exam-
ple, Varde [6] developed a log-normal model based on the physics of failure involving
electromigration. Varde, ardently supporting physics-based reasoning and appar-
ently reluctant to use mindless statistical models, observed,
Nevertheless, statistics still forms the part of physics-of-failure
approach. This is because prediction of time to failure is still
modeled employing probability distribution. Traditionally log-
normal failure distribution has been used to estimate failure
time due to electromigration related failure.
Varde used median time to fail as the scale parameter and standard deviation as
the shape parameter, exactly as in NIST guidelines.
We have studied failure times of software after release, using data made avail-
able by the Cyber Security and Information Systems Information Analysis Center
(CSIAC) [7]. CSIAC is a Department of Defense (DoD) Information Analysis
Center (IAC) sponsored by the Defense Technical Information Center (DTIC). The
CSIAC is a consolidation of three predecessor IACs: the Data and Analysis Center
for Software (DACS), the Information Assurance Technology IAC (IATAC), and
the Modeling and Simulation IAC (MSIAC), with the addition of the Knowledge
Management and Information Sharing technical area.
The software reliability data set has 111 records of failure intervals. With
time, the failure intervals grow, increasing software reliability. We consider
time between failures (TBF) as the key indicator of a complex process involving
usage and maintenance. Growth of TBF is expected to follow a smooth log-normal
with a clear peak and a distinct tail (see Box 17.3 for an analogy for software
TBF).
However, the histogram of TBF, shown in Figure 17.9, reveals two peaks
belonging to two separate clusters, suggesting two growth processes. The second
cluster could arise from a second release; it could also arise from a newly
introduced pattern of usage.
We have fitted two log-normal curves to the clusters. The first has a scale of
15.5, Ln(median), and a shape of 0.8 (the standard deviation of Ln(x)). The second has
a scale of 16.4 and a shape of 0.1. The graphs are shown in Figure 17.10. This is a
composite model.
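The composite model can be sketched as a two-component log-normal mixture; the fitted scales and shapes come from the text, but the equal 50/50 weighting is an assumption, since the cluster proportions are not stated.

```python
import math

def lognormal_pdf(x, scale, shape):
    """Log-normal density; scale = Ln(median), shape = std deviation of Ln(x)."""
    coeff = 1.0 / (x * shape * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((math.log(x) - scale) ** 2) / (2 * shape ** 2))

def composite_pdf(x, w=0.5):
    """Equal-weight mixture of the two fitted clusters (an assumed weighting)."""
    return w * lognormal_pdf(x, 15.5, 0.8) + (1 - w) * lognormal_pdf(x, 16.4, 0.1)

# The second, narrow cluster dominates the density near its median exp(16.4)
peak2 = composite_pdf(math.exp(16.4))
```

In practice the weight w would be estimated from the relative sizes of the two clusters in the 111-record data set.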
The second log-normal curve in Figure 17.10 resembles a Gaussian, but we still
prefer the log-normal equation because it is median based.