Chapter 12 - Law of Rare Events (2/5)


186 ◾ Simple Statistical Methods for Software Engineering

until a given date by dividing the defects found until now by the total number of estimated defects, as shown in Figure 12.4. The cumulative defects found represent software reliability.

There is a caveat.

The rule says that the failure interval follows an exponential distribution while defect events follow a Poisson distribution. The essential truth is that both follow the exponential law: one in continuous form, the other in discrete form.
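The link between the two forms can be illustrated with a small simulation. This is a minimal sketch, assuming a constant defect rate; the function name, the `rate` value, and the number of weeks are illustrative choices, not taken from the text. Gaps between defects are drawn from an exponential distribution, and the counts per week that result should have a mean and a variance both close to the rate, which is the signature of Poisson counts.

```python
import random
from statistics import mean, pvariance

def simulate_weekly_counts(rate, weeks, seed=0):
    """Draw exponential gaps between defects and count the defects per week."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    counts = [0] * weeks
    t = rng.expovariate(rate)        # time of the first defect, in weeks
    while t < weeks:
        counts[int(t)] += 1          # bin the event into the week it falls in
        t += rng.expovariate(rate)   # exponential gap to the next defect
    return counts

# Hypothetical rate of 2 defects per week, observed over 5000 weeks.
counts = simulate_weekly_counts(rate=2.0, weeks=5000)
print("mean per week:    ", round(mean(counts), 3))
print("variance per week:", round(pvariance(counts), 3))
```

For Poisson counts the two printed numbers should be roughly equal, which is the property used later when the variance is set equal to the mean.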

In an ideal situation, we should use the failure interval or time to fail in Equation 12.3 and plot a graph (which would resemble the pattern in Figure 12.4). In real-life projects, the exact time of defect discovery is not always available. People accumulate information and submit reports on a weekly basis, occasionally on a daily basis, and never on an hourly basis, unless the bug tracking tool has a provision to capture defect events precisely in real time. Hence, we move away philosophically from reporting defect counts to reporting a metric called defects per week. Some people use defect density (defects per KLOC or defects per FP) instead of defect count. Either way, we have a density metric, which would still fit into the model represented in Figure 12.4.
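The arithmetic behind the reliability curve can be sketched in a few lines. The weekly counts and the estimated total below are hypothetical values chosen only for illustration; the function simply divides the running total of defects found by the estimated total number of defects.

```python
from itertools import accumulate

def cumulative_fraction(weekly_defects, estimated_total):
    """Cumulative defects found, as a fraction of the estimated total."""
    return [found / estimated_total for found in accumulate(weekly_defects)]

weekly_defects = [30, 22, 16, 12, 8, 6]   # hypothetical defects per week
estimated_total = 100                     # hypothetical estimate of total defects

fractions = cumulative_fraction(weekly_defects, estimated_total)
for week, frac in enumerate(fractions, start=1):
    print(f"week {week}: {frac:.2f} of estimated defects found")
```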

Figure 12.4 Exponential distribution cumulative distribution function (CDF) of cumulative defects found. [Plot: cumulative defects discovered (reliability), 0.000 to 1.200, versus system test day, 5 to 20.]


Poisson Distribution

The exponential law for discrete events can be expressed as follows:

$$P(x, \lambda) = \frac{e^{-\lambda}\,\lambda^{x}}{x!} \qquad (12.5)$$

where x takes discrete integer values 0, 1, 2 …, and λ is the mean value of x.

The Poisson distribution can be evaluated in Excel using the statistical function POISSON.DIST. For given values of x and λ, the function returns the Poisson probability. Setting the cumulative argument to 0 returns the probability of exactly x events (the probability mass function); setting it to 1 returns the cumulative probability of x or fewer events.
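For readers working outside Excel, the same calculation can be sketched directly from Equation 12.5. The function names below are illustrative; `poisson_dist` is only meant to mirror the argument order of POISSON.DIST(x, mean, cumulative) and is not an Excel or library API.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Equation 12.5: probability of exactly x events when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_dist(x, lam, cumulative):
    """Mirror of POISSON.DIST: point probability, or P(X <= x) if cumulative."""
    if cumulative:
        return sum(poisson_pmf(k, lam) for k in range(x + 1))
    return poisson_pmf(x, lam)

# Example values chosen for illustration: x = 2 events, mean of 3.
print(poisson_dist(2, 3.0, cumulative=False))  # probability of exactly 2
print(poisson_dist(2, 3.0, cumulative=True))   # probability of 0, 1, or 2
```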

Box 12.3 Siméon Denis Poisson (1781–1840)

Siméon Denis Poisson was a French mathematician. His teachers Laplace and Lagrange quickly saw his mathematical talents. They became friends for life with their extremely able young student, and they gave him strong support in a variety of ways.

His paper on the theory of equations written in his third year was of such quality that Poisson could graduate without taking the final examination. He was employed as a tutor and appointed deputy professor 2 years later in 1802. In 1806, he became a full professor.

One of Poisson’s contributions was the development of equations to analyze random events, later dubbed the Poisson distribution. It describes the probability that a random event will occur in a time or space interval under the conditions that the probability of the event occurring is very small but the number of trials is very large; hence, the event actually occurs a few times.

The fame of this distribution is often attributed to the following story. Many soldiers in the Prussian Army died due to kicks from horses. To determine whether this was due to a random occurrence or the wrath of god, the Czar commissioned a Russian mathematician to determine the statistical significance of the events. It was found that the data fitted remarkably well to a Poisson distribution. There was an order in the data, and deaths were now statistically predictable.

Poisson never tried experimental designs. He said,

“Life is good for only two things, discovering mathematics and teaching mathematics.”


Poisson probabilities from Equation 12.5 for λ = 1, 2, 3, and 4 are plotted in Figure 12.5.

As λ increases, the distribution shifts to the right and tends to become symmetrical.

The corresponding cumulative probabilities are plotted in Figure 12.6. As λ increases, the curve attains an S shape.
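The shift toward symmetry can be checked numerically. The sketch below tabulates Equation 12.5 for the four values of λ over x = 0 to 12 and reports the peak probability together with the skewness of a Poisson distribution, which equals 1/√λ and therefore shrinks as λ grows. The function name and the cutoff at x = 12 are illustrative choices.

```python
from math import exp, factorial, sqrt

def pmf_table(lam, max_x=12):
    """Poisson probabilities (Equation 12.5) for x = 0 .. max_x."""
    return [exp(-lam) * lam**x / factorial(x) for x in range(max_x + 1)]

for lam in (1, 2, 3, 4):
    probs = pmf_table(lam)
    peak_x = probs.index(max(probs))   # first x at which the probability peaks
    skewness = 1 / sqrt(lam)           # Poisson skewness; 0 would be symmetric
    print(f"lambda={lam}: peak at x={peak_x}, "
          f"peak probability={max(probs):.3f}, skewness={skewness:.3f}")
```

The peak moves to the right and becomes lower as λ increases, while the skewness falls from 1.0 at λ = 1 to 0.5 at λ = 4.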

Figure 12.5 Poisson probability density function (PDF). [Plot: Poisson probability, 0.05 to 0.45, versus x, 0 to 12, with curves for λ = 1, λ = 2, λ = 3, and λ = 4.]

Figure 12.6 Poisson cumulative distribution function (CDF). [Plot: cumulative probability, 0.2 to 1.2, versus x, 0 to 12, with curves for λ = 1, λ = 2, λ = 3, and λ = 4.]

The Poisson distribution was introduced by Siméon-Denis Poisson. In 1837, Poisson’s Sorbonne lectures on probability and decision theory were published. They contained the Poisson distribution, which predicts the pattern in which random events of very low probability occur in the course of a very large number of trials. The Poisson distribution is called the law of rare events.

A biographical note on Poisson, the inventor of this distribution, may be seen in Box 12.3. Poisson seems to have touched upon a universal law. The Poisson distribution and its extensions are actively pursued by researchers in many domains, including software engineering.

A Historic Poisson Analysis: Deaths of Prussian Cavalrymen

In the historic data analysis done by von Bortkiewicz in 1898, deaths of Prussian cavalrymen due to horse kicks were fitted to a Poisson distribution. We can look at the data made available in Statistics: The Poisson Distribution [2], where the mean value of deaths per corps is given as p = 0.5434. Substituting this value in Equation 12.5 and treating x as the number of deaths, we can construct a Poisson distribution as follows:

$$P(x, 0.5434) = \frac{e^{-0.5434}\,(0.5434)^{x}}{x!} \qquad (12.6)$$

where x is the number of deaths in a single corps.

Figure 12.7 shows a plot of this Poisson distribution.

Figure 12.7 Poisson distribution of Prussian cavalrymen deaths. [Plot: Poisson probability, 0.1 to 0.7, versus deaths, 0 to 5.]


Using the above Poisson distribution, the Russian mathematician von Bortkiewicz predicted that “over the 200 years observed 109 years would be with zero deaths.” It turned out that 109 is exactly the number of years in which the Prussian data recorded no deaths from horse kicks. The match between expected and actual values is not merely good; it is perfect.
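The expected number of observations with x deaths is the number of observations multiplied by the Poisson probability of x. The sketch below applies this to 200 observations; the function name is illustrative. One point deserves care: a mean of 0.5434 gives an expected zero-death count of about 116, whereas a mean of 0.61 deaths per observation, the value commonly quoted for this dataset and an assumption not stated in this text, gives about 109. Since e^(−0.61) ≈ 0.5434, the quoted figure may be the probability of zero deaths rather than the mean, so both cases are computed here.

```python
from math import exp, factorial

def expected_counts(lam, n_obs, max_x=4):
    """Expected number of observations with x = 0 .. max_x events."""
    return [n_obs * exp(-lam) * lam**x / factorial(x) for x in range(max_x + 1)]

n_obs = 200  # number of observations quoted in the text

# Mean as quoted in the text.
print([round(c, 1) for c in expected_counts(0.5434, n_obs)])

# Assumed alternative: mean of 0.61, commonly quoted for this dataset.
print([round(c, 1) for c in expected_counts(0.61, n_obs)])
```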

Analysis of Module Defects Based on Poisson Distribution

Before release, software defects are triggered by tests according to the Poisson distribution. Defect counts in modules in User Acceptance Tests are an example of rare events. If the average number of defects per module is 0.3 and there are 100 modules in a release, the defects are distributed across the modules according to the Poisson distribution. The modules are unlikely to have equal numbers of defects. A few may have more, and the count tapers off among the remaining modules. The distribution follows Equation 12.5. The plot of the Poisson distribution is shown in Figure 12.8.

The mean of the distribution is now known as the rate parameter, and it is the only parameter of the equation. The variance of the distribution is equal to the mean. Hence, the statistical limits are given by simple formulas:

$$\mathrm{UCL} = \lambda + 3\sqrt{\lambda} \qquad (12.7)$$
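A short worked sketch for the module example, using the values from the text (an average of 0.3 defects per module, 100 modules). The variable names are illustrative. It lists how many modules are expected to contain 0, 1, 2, or 3 defects under Equation 12.5 and evaluates the upper control limit of Equation 12.7, using the fact that the standard deviation is the square root of the mean.

```python
from math import exp, factorial, sqrt

lam = 0.3        # average defects per module (from the text)
modules = 100    # modules in the release (from the text)

# Expected number of modules containing exactly x defects, x = 0 .. 3.
expected_modules = [modules * exp(-lam) * lam**x / factorial(x) for x in range(4)]
for x, count in enumerate(expected_modules):
    print(f"modules with {x} defects: {count:.1f}")

# Equation 12.7: mean plus three standard deviations, with variance = mean.
ucl = lam + 3 * sqrt(lam)
print(f"UCL = {ucl:.3f} defects per module")
```

Under these assumptions roughly three quarters of the modules are expected to be defect free, and a module with two or more defects sits above the upper control limit of about 1.94.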

Box 12.4 Analogy: Bad Apples

A truck delivering apples unloads at a warehouse. Most cartons have apples in good condition, but some apples are damaged. Typically, “damaged apples” is a rare event; only cartons in some part of the truck might be damaged. The occurrence of damaged apples is a Poisson process in which the distribution of defects happens in the spatial domain. The number of bad apples in unit volume is a Poisson parameter.

Likewise, a software product is shipped to the customer. When usage begins, some part of the product is found to have defects. Such defects are rare events. Across the code structure, defects are spatially distributed. However, defect discovery during software usage is a rare event in the temporal domain. Hence, people use the term defect arrival rate. The number of defects arriving in unit time (e.g., a week) can be measured from defect counts over time. The defect arrival rate follows the Poisson distribution.

Tests prior to release also discover defects in a similar manner. Defects “arrive” according to the Poisson distribution, in a broad sense. Change requests follow suit. Each development project has its own style of managing defect discovery; accordingly, the Poisson distribution varies in structure and departs from the simple classic Poisson equation. There are several variants of the Poisson distribution to accommodate the different styles in defect management.
