Chapter 20 - Gumbel Distribution for Extreme Values (2/3)


324 ◾ Simple Statistical Methods for Software Engineering

Gumbel Maximum: Complexity Analysis

Data maxima follow the Gumbel maximum PDF, defined as follows:

\[
f(x) = \frac{1}{\beta}\, e^{-(x-\mu)/\beta}\, e^{-e^{-(x-\mu)/\beta}} \tag{20.3}
\]

where μ is the location parameter and β is the scale parameter.

Let us consider the case of extremely large complexity in some modules. The specification limit on cyclomatic complexity is 70. Higher values are dubbed outliers and examined one by one. We wish to use these outliers collectively as a group by constructing an exclusive PDF for them. We separate these data from the database to form a special group of outliers and obtain the following statistics:

Data mean = 219.2

Data mode = 169

Thus, the model parameters, obtained by applying the moment equations, are as follows:

Location = 169

Scale = 87
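The parameter values above can be reproduced with a short calculation. The sketch below assumes the standard moment relations for the Gumbel maximum distribution, namely that the mode equals the location μ and the mean equals μ + γβ, where γ ≈ 0.5772 is the Euler–Mascheroni constant. The function name is illustrative and does not come from the book.

```python
# Euler–Mascheroni constant, which links the Gumbel mean to its scale.
EULER_GAMMA = 0.5772156649015329


def gumbel_max_params_from_mean_mode(mean, mode):
    """Estimate (location, scale) of a Gumbel maximum distribution.

    Assumes the standard relations: mode = mu, mean = mu + gamma * beta.
    """
    location = mode
    scale = (mean - mode) / EULER_GAMMA
    if scale <= 0:
        raise ValueError("mean must exceed mode for a Gumbel maximum")
    return location, scale


# Statistics of the outlier group quoted in the text.
location, scale = gumbel_max_params_from_mean_mode(mean=219.2, mode=169.0)
print(f"Location = {location:.0f}, Scale = {scale:.0f}")
# prints: Location = 169, Scale = 87
```

The scale comes out as (219.2 − 169) / 0.5772 ≈ 86.97, which rounds to the value of 87 used in the text.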

Using these parameters, the PDF is constructed and shown in Figure 20.3. It shows the distribution of complexity maxima in software development. This model can be used to manage technical risks in development projects.
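One way such a model could support risk management is by estimating how likely an outlier module is to exceed a given complexity. The sketch below assumes the standard Gumbel maximum CDF, F(x) = exp(−exp(−(x − μ)/β)), together with the fitted location and scale from the text. The threshold of 400 is a hypothetical value chosen only for illustration.

```python
import math


def gumbel_max_cdf(x, location, scale):
    """CDF of the Gumbel maximum distribution: exp(-exp(-(x - mu) / beta))."""
    z = (x - location) / scale
    return math.exp(-math.exp(-z))


def exceedance_probability(x, location, scale):
    """Probability that an extreme value is larger than x."""
    return 1.0 - gumbel_max_cdf(x, location, scale)


# Fitted parameters from the text; the threshold is a hypothetical example.
LOCATION, SCALE = 169.0, 87.0
threshold = 400.0
p = exceedance_probability(threshold, LOCATION, SCALE)
print(f"P(complexity > {threshold:.0f}) = {p:.3f}")
```

Under these assumptions, only a small fraction of the outlier group is expected to exceed the chosen threshold, which is the kind of statement a project risk review can act on.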

Figure 20.2 Gumbel minimum of CSAT minimum scores. (Plot titled "Analysis of minimum scores": probability versus customer satisfaction index on a 0–10 scale.)


The biggest problem we now have with the whole evolution of the risk is the fat-tailed problem, which is really creating very large conceptual difficulties. Because as we all know, the assumption of normality enables us to drop off the huge amount of complexity in our equation. Because once you start putting in non-normality assumptions, which is unfortunately what characterizes the real world, then these issues become extremely difficult.

Alan Greenspan (1997)

The Gumbel extreme value PDF addresses this problem: it lets us see risk directly and objectively, rather than through the inadequate expressions of conventional statistical analysis.

Conventional models produce a good fit in regions where most of the data fall, potentially at the expense of the tails. In extreme value analysis, only the tail data are used.

Minima Maxima Comparisons

We proceed to compare Gumbel distributions for minima and maxima, given the same location and scale parameters. This comparison allows us to gain an insight into modeling. A comparison is illustrated in Figure 20.4.

We have kept the location parameter at 5 and the scale parameter at 3 and constructed the Gumbel PDFs for minima and maxima using Equations 20.1 and 20.2.
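The comparison can be reproduced numerically. Equations 20.1 and 20.2 are not reproduced in this section, so the sketch below assumes the standard Gumbel density forms for minima and maxima, evaluated with location 5 and scale 3. The function names are illustrative.

```python
import math


def gumbel_min_pdf(x, location, scale):
    """Standard Gumbel minimum density: (1/b) * exp(z - exp(z))."""
    z = (x - location) / scale
    return math.exp(z - math.exp(z)) / scale


def gumbel_max_pdf(x, location, scale):
    """Standard Gumbel maximum density: (1/b) * exp(-z - exp(-z))."""
    z = (x - location) / scale
    return math.exp(-z - math.exp(-z)) / scale


# Parameters used in the text for the comparison.
LOCATION, SCALE = 5.0, 3.0

print(" x    minima   maxima")
for x in range(0, 21, 5):
    f_min = gumbel_min_pdf(x, LOCATION, SCALE)
    f_max = gumbel_max_pdf(x, LOCATION, SCALE)
    print(f"{x:2d}   {f_min:.4f}   {f_max:.4f}")
```

With these forms, both curves peak at the location with height 1/(βe) ≈ 0.123, and each is the mirror image of the other about x = 5: the minimum density carries its long tail to the left, the maximum density to the right.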

Figure 20.3 Gumbel maximum of extreme values of cyclomatic complexity. (Plot titled "Gumbel maximum": probability versus cyclomatic complexity over the range 0 to 600.)


Box 20.4 How to Choose the Right Extreme Value Distribution

There are three types of extreme value distributions. The most common is type I, the Gumbel distribution, which is unbounded and falls off exponentially or faster. Type II, the Fréchet distribution, has a lower bound, falls off slowly according to a power law, and has a fat tail. This is used to model maximum values. Type III, the reversed Weibull distribution, has an upper bound and is used to model minimum values.

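Type I (Gumbel)

\[
G(x) = \exp\left\{-\exp\left[-\frac{x-b}{a}\right]\right\}, \qquad -\infty < x < \infty \tag{20.4}
\]

where b is the location parameter and a > 0 is the scale parameter.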

Type II (Fréchet)

\[
G(x) =
\begin{cases}
\exp\left\{-\left(\dfrac{x-b}{a}\right)^{-\alpha}\right\}, & x > b \\
0, & x \le b
\end{cases} \tag{20.5}
\]

Figure 20.4 Comparison of Gumbel minimum and maximum. (Plot titled "Gumbel distribution, location = 5, scale = 3": G(x) versus x over the range 0 to 20, with one curve for minima and one for maxima.)


Type III (Weibull)

\[
G(x) =
\begin{cases}
\exp\left\{-\left(-\dfrac{x-b}{a}\right)^{\alpha}\right\}, & x < b \\
1, & x \ge b
\end{cases} \tag{20.6}
\]

Although the behavior of the three laws is completely different, they can be combined into a single parameterization containing one parameter ξ that controls the "heaviness" of the tail, called the shape parameter:

\[
\text{GEV:}\quad G(x) = \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\} \tag{20.7}
\]

The location parameter μ determines where the distribution is concentrated. The scale parameter σ determines its width. The shape parameter ξ determines the rate of tail decay (the larger ξ, the heavier the tail), with the following:

ξ > 0 indicating the heavy-tailed (Fréchet) case.

ξ = 0 indicating the light-tailed (Gumbel) case.

ξ < 0 indicating the truncated distribution (Weibull) case.
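The effect of the shape parameter can be checked numerically. The sketch below is a minimal implementation of the GEV CDF in the parameterization of Equation 20.7, treating ξ = 0 as the Gumbel limit. The parameter values (μ = 0, σ = 1, evaluation point x = 4, and the three ξ values) are hypothetical and chosen only to show how the tail probability changes with ξ.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the generalized extreme value distribution.

    xi > 0: Frechet type; xi = 0: Gumbel type (limit); xi < 0: Weibull type.
    """
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Hypothetical parameters chosen only to illustrate tail heaviness.
MU, SIGMA, X = 0.0, 1.0, 4.0
for xi in (-0.3, 0.0, 0.3):
    tail = 1.0 - gev_cdf(X, MU, SIGMA, xi)
    print(f"xi = {xi:+.1f}: P(X > {X:.0f}) = {tail:.5f}")
```

For the negative shape value, the evaluation point lies beyond the upper bound μ − σ/ξ, so the tail probability is exactly zero; for the positive shape value it is several times larger than in the Gumbel case.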

The extreme value distributions have been differently adopted by different users. Each type of distribution offers certain advantages over the others for specific cases.

In earthquake modeling, Zimbidis et al. [4] preferred to use the type III extreme value distribution (Weibull). They analyzed the annual maximum magnitude of earthquakes in Greece during the period 1966–2005. The plot of mean excess over a threshold indicates a very short tail, and the researchers have chosen Weibull accordingly.

In worst-case execution time analysis of real-time embedded systems, Lu et al. [5] used the Gumbel distribution after selecting the data very carefully using special sampling techniques. Their predictions agree closely with observed data.

However, in the probabilistic minimum interarrival time analysis of embedded systems, Maxim et al. [6] found that the Weibull extreme value distribution fits better.

During the analysis of wave data, Caires [7] found the general extreme value model more suitable.

The choice depends on the data.


Analyzing Extreme Problems

Instead of seeing problems as tails of some parent distribution, extreme value

analysis using Gumbel distributions allows us to look at the problem squarely in

the eye.

The Gumbel distribution can be used to analyze several extreme problems in addition to the two we have discussed so far. For example, we can do a simple schedule variance analysis by collecting data, as shown in Figure 20.5.

This shows the distribution of maximum values of schedule variances in a development project. The PDF is built with a location parameter of 20 and a scale parameter of 12.
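A fitted model of this kind can also be read in the other direction, from a probability level to a schedule variance value. The sketch below assumes the standard inverse CDF of the Gumbel maximum, x = μ − β ln(−ln p), with the location and scale quoted above. The probability levels are illustrative choices, not values from the book.

```python
import math


def gumbel_max_quantile(p, location, scale):
    """Inverse CDF of the Gumbel maximum: x = mu - beta * ln(-ln(p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return location - scale * math.log(-math.log(p))


# Parameters quoted in the text for the schedule variance model.
LOCATION, SCALE = 20.0, 12.0

for p in (0.50, 0.90, 0.95):
    x = gumbel_max_quantile(p, LOCATION, SCALE)
    print(f"{p:.0%} of maxima are expected to fall below {x:.1f}")
```

Quantiles like these give a planning allowance: a project that budgets for the 95% level is, under the model's assumptions, exposed to a larger extreme only about one time in twenty.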

Likewise, we can easily analyze extremely error-prone modules, extremely costly effort escalations, extreme volatility of requirements, and so on. There is a great opportunity for such modern and innovative analysis.

There are a few cautions to be taken before we do extreme value analysis.

First, data collection needs care. Data must be drawn from samples that are independent and identically distributed (the iid criterion). Extreme values in a single organization approximately meet the identical-distribution requirement, assuming similar processes run in all projects. Data samples can also easily be made independent (one sample does not influence another). Doing extreme value analysis across distinctly different processes is not recommended.

Figure 20.5 Gumbel maximum of extreme values of schedule variance. (Plot titled "Gumbel maximum": probability versus schedule variance over the range 0 to 100.)
