186 Simple Statistical Methods for Software Engineering
until a given date by dividing the defects found until now by the total number of
estimated defects, as shown in Figure 12.4. The cumulative defects found represent
software reliability.
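As a rough illustration of the ratio described above, the sketch below divides the running total of defects found by an estimate of total defects. The daily counts and the estimate of 45 defects are hypothetical values invented for the example, and the function name is our own.

```python
from itertools import accumulate

def cumulative_fraction_found(daily_defects, estimated_total):
    """For each day, return defects found so far divided by the estimated total."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return [found / estimated_total for found in accumulate(daily_defects)]

# Hypothetical data: defects found on each system test day, and an assumed total estimate.
daily = [12, 9, 7, 5, 4, 2, 1]
fractions = cumulative_fraction_found(daily, estimated_total=45)
for day, fraction in enumerate(fractions, start=1):
    print(f"day {day}: {fraction:.3f}")
```

The running fraction rises quickly and then flattens, which is the shape sketched in Figure 12.4.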
There is a caveat.
The rule says that failure intervals follow the exponential distribution, while defect
events follow the Poisson distribution. The essential truth is that both follow the
exponential law: one in continuous form, the other in discrete form.
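The caveat can be checked numerically. The sketch below assumes a hypothetical constant failure rate of 2 failures per unit time (a value chosen for illustration, not taken from the text). It draws exponential gaps between failures and then counts the failures falling in each unit interval. If the caveat holds, the counts should behave like a Poisson variable, with mean and variance both close to the rate and a fraction of empty intervals close to e^(-2).

```python
import random

def simulate_counts(rate, n_intervals, seed=0):
    """Draw exponential inter-failure times with the given rate and count
    how many failures fall in each unit-length interval [k, k + 1)."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)          # time of the first failure
    while t < n_intervals:
        counts[int(t)] += 1            # unit interval containing this failure
        t += rng.expovariate(rate)     # add the next exponential gap
    return counts

counts = simulate_counts(rate=2.0, n_intervals=10_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
zero_frac = counts.count(0) / len(counts)
# Continuous exponential gaps yield discrete counts whose mean and variance are both near the rate.
print(f"mean={mean:.3f} variance={var:.3f} empty intervals={zero_frac:.3f}")
```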
Ideally, we should use the failure interval (time to fail) in Equation 12.3
and plot a graph, which would resemble the pattern in Figure 12.4. In real-life
projects, however, the exact time of defect discovery is not always available. People
accumulate information and submit reports weekly, occasionally daily, but never
hourly, unless the bug tracking tool can capture defect events precisely in real time.
Hence, we move away philosophically from reporting defect counts to reporting a
metric called defects per week. Some people use defect density (defects per KLOC or
defects per FP) instead of defect count. Either way, we have a density metric, which
would still fit the model represented in Figure 12.4.
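Moving from event times to a defects-per-week metric is a simple bucketing step. The sketch below assumes the bug tracker can export the day on which each defect was found (day 0 being the first test day); the discovery days and the 25 KLOC product size are hypothetical values for illustration.

```python
from collections import Counter

def defects_per_week(discovery_days, n_weeks):
    """Bucket defect discovery days (day 0 = first test day) into weekly counts."""
    per_week = Counter(day // 7 for day in discovery_days)
    return [per_week.get(week, 0) for week in range(n_weeks)]

# Hypothetical discovery days exported from a bug tracker.
days = [0, 1, 1, 3, 5, 8, 9, 12, 15, 20, 27]
weekly = defects_per_week(days, n_weeks=4)
print(weekly)  # [5, 3, 2, 1]

# Optional density form: defects per KLOC per week, for an assumed 25 KLOC product.
size_kloc = 25
print([round(c / size_kloc, 2) for c in weekly])
```

Either list is a density metric in the sense used above: a count per unit time, optionally also per unit size.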
[Figure: Cumulative defects discovered (reliability), from 0 to 1.2, plotted against system test day, from 0 to 20]
Figure 12.4 Exponential distribution cumulative distribution function (CDF) of
cumulative defects found.
Law of Rare Events 187
Poisson Distribution
The exponential law for discrete events can be expressed as follows:
$$P(x, \lambda) = \frac{e^{-\lambda}\,\lambda^{x}}{x!} \qquad (12.5)$$
where x takes discrete integer values 0, 1, 2, …, and λ is the mean value of x.
The Poisson distribution can be solved in Excel using the statistical function
POISSON.DIST. For given values of x and λ, the function returns the Poisson
probability. Setting cumulative = 0 returns the probability distribution function,
that is, the probability of exactly x events; setting cumulative = 1 returns the
cumulative probability.
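For readers working outside Excel, here is a minimal Python analogue written directly from Equation 12.5. The function name poisson_dist is our own choice, not an Excel or library API; it simply mirrors the three POISSON.DIST arguments.

```python
import math

def poisson_dist(x, mean, cumulative):
    """Python analogue of Excel's POISSON.DIST(x, mean, cumulative).

    cumulative=False -> probability of exactly x events (Equation 12.5)
    cumulative=True  -> probability of x or fewer events
    """
    if x < 0 or mean < 0:
        raise ValueError("x and mean must be non-negative")
    x = int(x)  # Excel truncates a non-integer x

    def pmf(k):
        return math.exp(-mean) * mean ** k / math.factorial(k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

print(poisson_dist(2, 1.0, False))  # probability of exactly 2 events when the mean is 1
print(poisson_dist(2, 1.0, True))   # probability of at most 2 events when the mean is 1
```

For example, poisson_dist(2, 1.0, False) is about 0.184, and the cumulative version is about 0.920.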
Box 12.3 Siméon Denis Poisson (1781–1840)
Siméon Denis Poisson was a French mathematician. His teachers Laplace and
Lagrange quickly saw his mathematical talents. They became friends for life
with their extremely able young student, and they gave him strong support
in a variety of ways.
His paper on the theory of equations written in his third year was of such
quality that Poisson could graduate without taking the final examination. He
was employed as a tutor and appointed deputy professor 2 years later in 1802.
In 1806, he became a full professor.
One of Poisson's contributions was the development of equations to analyze
random events, later dubbed the Poisson distribution. It describes the
probability that a random event will occur in a time or space interval when
the probability of the event occurring is very small but the number of trials
is very large, so that the event actually occurs only a few times.
The fame of this distribution is often attributed to the following story.
Many soldiers in the Prussian Army died due to kicks from horses. To find out
whether this was due to random occurrence or the wrath of God, the Czar
commissioned a Russian mathematician to assess the statistical significance
of the events. The data were found to fit a Poisson distribution remarkably
well. There was an order in the data, and deaths were now statistically
predictable.
Poisson never tried experimental designs. He said,
“Life is good for only two things, discovering mathematics
and teaching mathematics.”
Poisson probabilities from Equation 12.5 for λ = 1, 2, 3, and 4 are plotted
in Figure 12.5.
As λ increases, the distribution shifts to the right and becomes more symmetrical.
The corresponding cumulative probabilities are plotted in Figure 12.6. As λ
increases, the curve takes on an S shape.
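The trends in Figures 12.5 and 12.6 can be reproduced numerically. This sketch tabulates Equation 12.5 and its running sum for x = 0 to 12 and λ = 1 to 4; the helper names are ours. As λ grows, the peak probability falls and P(X ≤ 2) shrinks, which is the rightward shift described above.

```python
import math
from itertools import accumulate

def poisson_pmf(x, lam):
    """Equation 12.5: probability of exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def pmf_and_cdf(lam, x_max=12):
    """Probabilities and cumulative probabilities for x = 0..x_max."""
    pmf = [poisson_pmf(x, lam) for x in range(x_max + 1)]
    return pmf, list(accumulate(pmf))

tables = {lam: pmf_and_cdf(lam) for lam in (1, 2, 3, 4)}
for lam, (pmf, cdf) in tables.items():
    print(f"lambda={lam}: peak P={max(pmf):.3f}, "
          f"P(X<=2)={cdf[2]:.3f}, P(X<=12)={cdf[-1]:.4f}")
```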
The Poisson distribution was created by Siméon-Denis Poisson. In 1837, Poisson's
Sorbonne lectures on probability and decision theory were published. They
contained the Poisson distribution, which predicts the pattern in which random
events of very low probability occur in the course of a very large number of trials.
This is why the Poisson distribution is called the law of rare events.
[Figure: Poisson probability (0 to 0.45) plotted for x = 0 to 12, with curves for λ = 1, 2, 3, and 4]
Figure 12.5 Poisson probability density function (PDF).
[Figure: Cumulative probability (0 to 1.2) plotted for x = 0 to 12, with curves for λ = 1, 2, 3, and 4]
Figure 12.6 Poisson cumulative distribution function (CDF).
A biographical note on Poisson, the inventor of this distribution, appears in
Box 12.3. Poisson seems to have touched upon a universal law: the Poisson distribution
and its extensions are actively studied by researchers in many domains, including
software engineering.
A Historic Poisson Analysis: Deaths of Prussian Cavalrymen
In the historic data analysis done by von Bortkiewicz in 1898, deaths of Prussian
cavalrymen due to horse kicks were fitted to a Poisson distribution. We can look at the
data made available in Statistics: The Poisson Distribution [2], where the mean value of
deaths per corps is given as p = 0.5434. Substituting this value for λ in Equation 12.5
and treating x as the number of deaths, we can construct a Poisson distribution as follows:
$$P(x, 0.5434) = \frac{e^{-0.5434}\,(0.5434)^{x}}{x!} \qquad (12.6)$$
where x is the number of deaths in a single corps.
Figure 12.7 shows a plot of this Poisson distribution.
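The bars of Figure 12.7 can be tabulated directly from Equation 12.6. This minimal sketch evaluates the distribution for 0 to 5 deaths, using the parameter 0.5434 exactly as stated in the text; the constant and helper names are ours.

```python
import math

LAMBDA = 0.5434  # parameter substituted into Equation 12.5 to give Equation 12.6

def poisson_pmf(x, lam):
    """Equation 12.5: probability of exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# x = 0..5 deaths per corps, the range shown in Figure 12.7
deaths_pmf = {x: poisson_pmf(x, LAMBDA) for x in range(6)}
for x, p in deaths_pmf.items():
    print(f"P({x} deaths) = {p:.4f}")
```

Because λ is well below 1, the probabilities fall steeply from x = 0 onward, giving the strongly skewed shape of the plot.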
[Figure: Poisson probability (0 to 0.7) plotted against deaths (0 to 5)]
Figure 12.7 Poisson distribution of Prussian cavalrymen deaths.
Using the above Poisson distribution, Russian mathematician von Bortkiewicz
predicted that “over the 200 years observed 109 years would be with zero deaths.”
It turned out that 109 is exactly the number of years in which the Prussian data
recorded no deaths from horse kicks. The match between expected and actual
values is not merely good; it is perfect.
Analysis of Module Defects Based on Poisson Distribution
Before release, software defects are triggered by tests according to the Poisson
distribution. Defect counts in modules during User Acceptance Tests are an example
of rare events. If the average number of defects per module is 0.3 and there are 100
modules in a release, the defects are distributed across the modules according to the
Poisson distribution. The modules are unlikely to have equal numbers of defects: a few
may have more, and the count tapers off among the remaining modules. The distribution
follows Equation 12.5. The plot of the Poisson distribution is shown in Figure 12.8.
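To make the module example concrete, the sketch below multiplies the Equation 12.5 probabilities by the number of modules, using the 0.3 defects per module and 100 modules from the text. The function name is our own, and it assumes, as the text does, that per-module defect counts follow the Poisson distribution.

```python
import math

def expected_modules(mean_defects, n_modules, max_defects=4):
    """Expected number of modules with exactly k defects (k = 0..max_defects),
    assuming per-module defect counts follow Equation 12.5."""
    return {
        k: n_modules * math.exp(-mean_defects) * mean_defects ** k / math.factorial(k)
        for k in range(max_defects + 1)
    }

spread = expected_modules(mean_defects=0.3, n_modules=100)
for k, n in spread.items():
    print(f"{k} defects: {n:.1f} modules")
```

With these numbers, roughly 74 of the 100 modules would be expected to show no defects, about 22 one defect, and about 3 two defects, which is the tapering pattern described above.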
The mean of the distribution is known as the rate parameter, and it is the only
parameter of the equation. The variance of the distribution is equal to the mean,
so the standard deviation is √λ. Hence, the statistical limits follow from simple formulas:
$$\mathrm{UCL} = \lambda + 3\sqrt{\lambda} \qquad (12.7)$$
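Equation 12.7 is easy to evaluate once the rate is known. This minimal sketch applies it to the 0.3 defects per module used above; the function name is ours.

```python
import math

def poisson_ucl(lam):
    """Equation 12.7: mean plus three standard deviations,
    where the standard deviation of a Poisson variable is sqrt(lambda)."""
    return lam + 3 * math.sqrt(lam)

ucl = poisson_ucl(0.3)
print(f"UCL = {ucl:.2f} defects per module")
```

For λ = 0.3 the limit is about 1.94, so a module with two or more defects would lie above it.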
Box 12.4 Analogy: Bad Apples
A truck delivering apples unloads at a warehouse. Most cartons have apples in
good condition, but some apples are damaged. Typically, “damaged apples”
is a rare event; only cartons in some part of the truck might be damaged. The
occurrence of damaged apples is a Poisson process, and the defects are distributed
in the spatial domain. The mean number of bad apples per unit volume is the
Poisson parameter.
Likewise, a software product is shipped to the customer. When usage
begins, some part of the product is found to have defects. Such defects are rare
events. Across the code structure, defects are spatially distributed. However,
defect discovery during software usage is a rare event in the temporal domain.
Hence, people use the term defect arrival rate. The number of defects arriving
in unit time (e.g., a week) can be measured from defect counts over time.
The defect arrival rate follows the Poisson distribution.
Tests prior to release also discover defects in a similar manner. Broadly speaking,
defects arrive according to the Poisson distribution, and change requests follow
suit. Each development project has its own style of managing defect discovery;
accordingly, the observed distribution varies in structure and departs from the
simple classic Poisson equation. There are several variants of the Poisson distribution
to accommodate the different styles of defect management.
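Because real projects can depart from the classic Poisson equation, a quick check is useful before applying Equation 12.5 or 12.7 to arrival data. The sketch below uses hypothetical weekly defect counts invented for illustration. It estimates the arrival rate as the sample mean and computes the ratio of sample variance to mean (the name dispersion_index is our own label). Since a Poisson variable has variance equal to its mean, a ratio near 1 is consistent with the classic model; a ratio well above 1 (overdispersion) is a common sign that a variant such as the negative binomial may fit better.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Ratio of sample variance to sample mean; close to 1 for Poisson counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m

# Hypothetical weekly defect arrival counts (illustrative only).
weekly_counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]
rate = mean(weekly_counts)
print(f"estimated arrival rate = {rate:.1f} defects/week")
print(f"dispersion index = {dispersion_index(weekly_counts):.2f}")
```

For these made-up counts the rate is 4.0 defects per week and the index is about 0.56, below 1.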