324 Simple Statistical Methods for Software Engineering
Gumbel Maximum: Complexity Analysis
Data maxima follow the Gumbel maximum PDF defined as follows:
\[
F(x) = \frac{1}{\beta}\, e^{-\frac{x-\mu}{\beta}}\, e^{-e^{-\frac{x-\mu}{\beta}}} \tag{20.3}
\]
where μ is the location parameter and β is the scale parameter.
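As a minimal sketch, Equation 20.3 can be evaluated directly in Python. The function name and the parameter values below are hypothetical, chosen only for illustration; they are not taken from the text.

```python
import math

def gumbel_max_pdf(x, mu, beta):
    """Gumbel maximum PDF with location mu and scale beta (beta > 0)."""
    if beta <= 0:
        raise ValueError("scale parameter beta must be positive")
    z = (x - mu) / beta  # standardized distance from the location
    return (1.0 / beta) * math.exp(-z) * math.exp(-math.exp(-z))

# Hypothetical parameters, for illustration only.
mu, beta = 10.0, 2.0
for x in (6.0, 10.0, 14.0):
    print(f"x = {x:5.1f}  density = {gumbel_max_pdf(x, mu, beta):.5f}")
```

The density peaks at x = μ, where it equals 1/(βe), and decays more slowly to the right of the peak than to the left.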
Let us consider the case of extremely large complexity in some modules. The
specification limit on cyclomatic complexity is 70. Higher values are dubbed outliers
and examined one by one. We wish to treat these outliers collectively as a group
by constructing a PDF exclusively for them. We separate these data from
the database, form a special group of outliers, and obtain the following statistics:
Data mean = 219.2
Data mode = 169
Thus, the model parameters, obtained by applying the moment equations, are as
follows:
Location = 169
Scale = 87
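The text does not spell out which moment equations were applied. One set of relations that reproduces these values, assumed here, is the standard pair for the Gumbel maximum distribution: mode = μ and mean = μ + γβ, where γ ≈ 0.5772 is the Euler–Mascheroni constant. A short sketch under that assumption (the function name is hypothetical):

```python
EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def gumbel_max_params_from_mean_mode(mean, mode):
    """Estimate Gumbel maximum parameters from the data mean and mode.

    Assumes mode = mu and mean = mu + EULER_GAMMA * beta.
    """
    location = mode
    scale = (mean - mode) / EULER_GAMMA
    return location, scale

location, scale = gumbel_max_params_from_mean_mode(mean=219.2, mode=169.0)
print(f"Location = {location:.0f}, Scale = {scale:.0f}")
# prints: Location = 169, Scale = 87
```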
Using these parameters, the PDF is constructed and shown in Figure 20.3.
It shows the distribution of complexity maxima in software development. This
model can be used to manage technical risks in development projects.
Figure 20.2 Gumbel minimum of CSAT minimum scores. (Analysis of minimum scores: Gumbel minimum PDF, location = 3, scale = 1; horizontal axis: customer satisfaction index, 0–10 scale; vertical axis: probability.)
Gumbel Distribution for Extreme Values 325
The biggest problem we now have with the whole evolution of
the risk is the fat-tailed problem, which is really creating very large
conceptual difficulties. Because as we all know, the assumption
of normality enables us to drop off the huge amount of com-
plexity in our equation. Because once you start putting in non-
normality assumptions, which is unfortunately what characterizes
the real world, then these issues become extremely difficult.
Alan Greenspan (1997)
The Gumbel extreme value PDF addresses this problem. It lets us see risk
directly and objectively, rather than through the inadequate expressions of
conventional statistical analysis.
Conventional models produce a good fit in regions where most of the data fall,
potentially at the expense of the tails. In extreme value analysis, only the tail data
are used.
Minima Maxima Comparisons
We proceed to compare Gumbel distributions for minima and maxima, given the
same location and scale parameters. This comparison gives us insight
into modeling. A comparison is illustrated in Figure 20.4.
We have kept the location parameter at 5 and scale parameter at 3 and con-
structed the Gumbel PDFs for minima and maxima using Equations 20.1 and 20.2.
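The comparison can be sketched numerically. Equations 20.1 and 20.2 are not reproduced in this section, so the code below assumes they are the standard Gumbel minimum and maximum PDFs; the function names are hypothetical.

```python
import math

def gumbel_max_pdf(x, mu, beta):
    """Standard Gumbel maximum PDF (assumed form of the maxima equation)."""
    z = (x - mu) / beta
    return math.exp(-z - math.exp(-z)) / beta

def gumbel_min_pdf(x, mu, beta):
    """Standard Gumbel minimum PDF (assumed form of the minima equation)."""
    z = (x - mu) / beta
    return math.exp(z - math.exp(z)) / beta

mu, beta = 5.0, 3.0  # location and scale used in the text
for x in range(0, 21, 5):
    print(f"x = {x:2d}  minima = {gumbel_min_pdf(x, mu, beta):.4f}  "
          f"maxima = {gumbel_max_pdf(x, mu, beta):.4f}")
```

Under these forms both curves peak at the location parameter, and each is the mirror image of the other about x = μ: the minima curve has its long tail on the left, the maxima curve on the right.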
Figure 20.3 Gumbel maximum of extreme values of cyclomatic complexity. (Gumbel maximum PDF, location = 169, scale = 87; vertical axis: probability.)
Box 20.4 How to Choose the Right Extreme Value Distribution
There are three types of extreme value distributions. The most common is
type I, the Gumbel distribution, which is unbounded and falls off exponentially
or faster. Type II, the Fréchet distribution, has a lower bound, falls
off slowly according to a power law, and has a fat tail. It is used to model
maximum values. Type III, the reversed Weibull distribution, has an upper
bound and is used to model minimum values.
Type I (Gumbel)
\[
G(x) = e^{-e^{-\frac{x-b}{a}}}, \qquad -\infty < x < \infty \tag{20.4}
\]
Type II (Fréchet)
\[
G(x) =
\begin{cases}
e^{-\left(\frac{x-b}{a}\right)^{-\alpha}}, & x > b \\
0, & x \le b
\end{cases} \tag{20.5}
\]
Figure 20.4 Comparison of Gumbel minimum and maximum. (Gumbel distribution, location = 5, scale = 3; curves for minima and maxima; horizontal axis: x; vertical axis: G(x).)
Type III (Weibull)
\[
G(x) =
\begin{cases}
e^{-\left(-\frac{x-b}{a}\right)^{\alpha}}, & x < b \\
1, & x \ge b
\end{cases} \tag{20.6}
\]
Although the behavior of the three laws is completely different, they can
be combined into a single parameterization containing one parameter ξ that
controls the “heaviness” of the tail, called the shape parameter:
GEV:
\[
G(x) = e^{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}} \tag{20.7}
\]
The location parameter μ determines where the distribution is concentrated.
The scale parameter σ determines its width. The shape parameter ξ
determines the rate of tail decay (the larger ξ, the heavier the tail), with the
following:
ξ > 0 indicating the heavy-tailed (Fréchet) case.
ξ = 0 indicating the light-tailed (Gumbel) case.
ξ < 0 indicating the truncated distribution (Weibull) case.
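The effect of the shape parameter can be illustrated with a small sketch of the generalized extreme value distribution function. The function name and the parameter values are hypothetical; ξ = 0 is handled as the Gumbel limit, and points outside the support return 0 or 1 as appropriate.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """Generalized extreme value distribution function G(x)."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case, the limit of the general form as xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Hypothetical setting: upper-tail probability three scale units above the location.
mu, sigma = 0.0, 1.0
x = mu + 3.0 * sigma
for xi in (0.5, 0.0, -0.5):
    print(f"xi = {xi:+.1f}  P(X > x) = {1.0 - gev_cdf(x, mu, sigma, xi):.4f}")
```

At the same point, the upper-tail probability is largest for ξ > 0 and smallest for ξ < 0; in the ξ = −0.5 case the point lies beyond the upper bound, so the tail probability is zero.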
The extreme value distributions have been adopted differently by different
users. Each type of distribution offers certain advantages over the others for
specific cases.
In earthquake modeling, Zimbidis et al. [4] preferred to use the type III
extreme value distribution (Weibull). They analyzed the annual maximum
magnitude of earthquakes in Greece during the period 1966–2005. The plot
of mean excess over a threshold indicates a very short tail, and the researchers
chose Weibull accordingly.
In worst-case execution time analysis of real-time embedded systems, Lu
et al. [5] used the Gumbel distribution after selecting the data very carefully
using special sampling techniques. Their predictions agree closely with
observed data.
However, in the probabilistic minimum interarrival time analysis of
embedded systems, Maxim et al. [6] found that the Weibull extreme value
distribution fits better.
During the analysis of wave data, Caires [7] found the general extreme
value model more suitable.
The choice depends on the data.
Analyzing Extreme Problems
Instead of seeing problems as tails of some parent distribution, extreme value
analysis using Gumbel distributions allows us to look at the problem squarely in
the eye.
The Gumbel distribution can be used to analyze several extreme problems in
addition to the two we have discussed so far. For example, we can do a simple
schedule variance analysis by collecting data, as shown in Figure 20.5.
This shows the distribution of maximum values of schedule variances in a
development project. The PDF is built with a location parameter of 20 and a scale
parameter of 12.
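One way such a model might be put to use is to estimate how likely a maximum schedule variance is to exceed a given level. The sketch below assumes the standard Gumbel maximum distribution function, exp(−exp(−(x − μ)/β)), which the text does not state explicitly; the function names and the threshold of 50 are hypothetical.

```python
import math

def gumbel_max_cdf(x, mu, beta):
    """Gumbel maximum distribution function P(X <= x)."""
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_max_quantile(p, mu, beta):
    """Value x such that P(X <= x) = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - beta * math.log(-math.log(p))

mu, beta = 20.0, 12.0   # location and scale from the text
threshold = 50.0        # hypothetical level of concern

exceedance = 1.0 - gumbel_max_cdf(threshold, mu, beta)
p95 = gumbel_max_quantile(0.95, mu, beta)
print(f"P(maximum schedule variance > {threshold:.0f}) = {exceedance:.3f}")
print(f"95th percentile of maximum schedule variance = {p95:.1f}")
```

The exceedance probability gives a direct risk figure for a chosen threshold, while the quantile answers the reverse question of which level is exceeded only rarely.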
Likewise, we can easily analyze extremely error-prone modules, extremely costly
effort escalations, extreme volatility of requirements, and so on. There is a great
opportunity for such modern and innovative analysis.
There are a few cautions to be taken before we do extreme value analysis.
First, data collection needs care. Data must be drawn from samples that
are independent and identically distributed (the iid criterion). Extreme values
in a single organization approximately meet the requirement of identical
distribution, assuming similar processes run in all projects. Data samples can
also easily be made independent (one sample does not influence another). Doing
extreme value analysis across distinctly different processes is not recommended.
Figure 20.5 Gumbel maximum of extreme values of schedule variance. (Gumbel maximum PDF, location = 20, scale = 12; vertical axis: probability.)