Chapter 20
Gumbel Distribution
for Extreme Values
A Science of Outliers
Convention has it that outliers must be marked, studied, and analyzed for root
causes. In process management, outliers represent high cost, poor quality, and
rework. The temptation is to attach a stigma to outliers and build probability
density functions (PDFs) for the remaining data. A scientific way would be
to treat outliers statistically and even predict their occurrence. These outliers can be
called extreme values and be subjected to treatment by the science of extreme value
theory, invented by Fréchet (see Box 20.1). The behavior of extremes can be modeled
by extreme value distributions.
Cláudia Neves et al. [1] summarized the characteristics of extreme value distri-
butions as follows:
A distribution function that belongs to the Fréchet domain
of attraction is called a heavy-tailed distribution, the Weibull
domain encloses light-tailed distributions with finite right
endpoint and the particularly interesting case of the Gumbel
domain embraces a great variety of tail distribution functions
ranging from light to moderately heavy, whether detaining
finite right endpoint or not.
320 Simple Statistical Methods for Software Engineering
Of the three types of extreme value distributions, the more popular one is the
Gumbel distribution (see Box 20.4). There are different notations corresponding
to the application of the Gumbel distribution. We follow the notation used in the
NIST Handbook, where this is known as type I extreme value distribution [2].
The presence of extremes in process data may be seen in the box plot presentation
of data (see Chapter 4). Beyond the thresholds called fences, we can see extreme
values on either end of typical box plots. On the right, we have extremes known
as “maxima,” and on the left, we have extremes known as “minima.” Both
extremes can have a significant effect on the process. Gumbel distributions can be
used to model both the maxima and the minima.
Box 20.1 Five People and Extreme Value Theory
Five people have contributed to extreme value theory. Fréchet proposed an
extreme value distribution in 1927. Fisher and Tippett refined it in 1928 and
proposed three types of extreme value distributions. In 1948, Gnedenko formulated
the Fisher–Tippett–Gnedenko theory (generalized extreme value
theory). Gumbel worked on type I extreme value distribution (called the
Gumbel distribution after him) and provided simpler derivation and proof
in 1958.
Fréchet—Maurice René Fréchet (1878–1973), a French mathematician
who made several important contributions to the field of statistics and
probability.
Fisher—Ronald Aylmer Fisher (1890–1962), an English statistician who
created the foundations for modern statistical science.
Gumbel—Emil Julius Gumbel (1891–1966), a German mathematician
and political writer who derived and analyzed the probability distribu-
tion that is now known as the Gumbel distribution in his honor.
Tippett—Leonard Henry Caleb Tippett (1902–1985), an English statisti-
cian who pioneered extreme value theory along with R. A. Fisher and
Emil Gumbel.
Gnedenko—Boris Vladimirovich Gnedenko (1912–1995), a Soviet mathematician
who was a leading member of the Russian school of probability
theory and statistics.
Gumbel Distribution for Extreme Values 321
Gumbel Minimum PDF
Extreme minimum values follow the Gumbel distribution defined as follows:
$$ f(x) = \frac{1}{\beta}\, e^{\frac{x-\mu}{\beta}}\, e^{-e^{\frac{x-\mu}{\beta}}} \qquad (20.1) $$
where μ is the location parameter and β is the scale parameter.
We have two plots of the Gumbel PDF in Figure 20.1, with a common location
parameter (5) and two scale parameters (2 and 3).
Note that both curves decline sharply on the right because they
represent limiting distributions of minimum values.
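Equation 20.1 can be evaluated directly. The following is a minimal sketch, assuming the location and scale values quoted for Figure 20.1; the function name `gumbel_min_pdf` is our own and does not come from the text.

```python
import math


def gumbel_min_pdf(x, mu, beta):
    """Gumbel minimum PDF of Equation 20.1 (mu = location, beta = scale)."""
    z = (x - mu) / beta
    # (1/beta) * e^z * e^(-e^z), written as a single exponential
    return math.exp(z - math.exp(z)) / beta


# Parameters quoted for Figure 20.1: common location 5, scales 2 and 3.
for beta in (2.0, 3.0):
    print(f"location 5, scale {beta}")
    for x in range(0, 16, 3):
        print(f"  x = {x:2d}  density = {gumbel_min_pdf(x, 5.0, beta):.4f}")
```

At the mode x = μ the density equals 1/(βe), so a larger scale gives a lower, wider curve.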
Box 20.2 World Record: 100-Meter Sprint
In a study of world records for the 100-meter sprint from 1991 to 2008,
extreme value theory was applied [3] to predict the ultimate world record.
The researchers predicted that the best possible time that could be achieved in the
near future is 9.51 seconds for men and 10.33 seconds for women. The world
records during the study period were 9.69 and 10.49 seconds for men and women,
respectively.
They used a generalized extreme value distribution, as follows:
$$ G_\gamma(x) = e^{-(1+\gamma x)^{-1/\gamma}} \quad \text{for } 1 + \gamma x > 0 \qquad (20.2) $$
where γ is the extreme value index.
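Equation 20.2 can be sketched in code as follows. This is an illustration only: the function name `gev_cdf` and the value γ = −0.2 are hypothetical, and the γ = 0 branch uses the standard Gumbel limit exp(−e^(−x)), which Equation 20.2 does not state explicitly.

```python
import math


def gev_cdf(x, gamma):
    """Generalized extreme value distribution function of Equation 20.2."""
    if gamma == 0.0:
        # Limit of Equation 20.2 as gamma -> 0 (Gumbel case); an assumption
        # added here, since the equation itself requires gamma != 0.
        return math.exp(-math.exp(-x))
    t = 1.0 + gamma * x
    if t <= 0.0:
        # Outside the support: below the left end point when gamma > 0,
        # at or beyond the right end point when gamma < 0.
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))


gamma = -0.2  # hypothetical extreme value index, chosen only for illustration
print("right end point:", -1.0 / gamma)
for x in (0.0, 2.0, 4.0, 4.9):
    print(f"G({x}) = {gev_cdf(x, gamma):.6f}")
```

With a negative index the distribution function reaches 1 at x = −1/γ, which is the finite right end point that makes an "ultimate record" meaningful.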
[Figure: Gumbel minimum PDF curves for location 5 with scale 2 and for location 5 with scale 3; horizontal axis x (0 to 15), vertical axis Gumbel probability (0 to 0.2).]
Figure 20.1 Gumbel minimum.
Gumbel Parameter Extraction: A Simple Approach
We can use the method of moments to extract the two Gumbel parameters, location
and scale. Consider the equations relating the data mode, mean, median, and
standard deviation to the Gumbel parameters (http://en.wikipedia.org/wiki/Gumbel_distribution):
Data mode = μ
Data mean = μ + 0.5772β
Data median = μ − β ln(ln(2))
Data SD = βπ/√6
Solving these equations yields the Gumbel parameters. The mean and median
relations above are for the maximum form; for the minimum form of Equation 20.1,
the mean is μ − 0.5772β and the median is μ + β ln(ln(2)).
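The moment equations can be solved in two steps: the standard deviation gives the scale, and the mean then gives the location. The sketch below assumes this approach; the function name `gumbel_from_moments` and the sample values are hypothetical. The equations listed above are those of the maximum form; for the minimum form the sign of the 0.5772β term is reversed, which the `minimum` flag handles.

```python
import math
import statistics

EULER_GAMMA = 0.5772  # Euler-Mascheroni constant, rounded as in the text


def gumbel_from_moments(sample, minimum=False):
    """Estimate (location, scale) from the sample mean and standard deviation."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    beta = sd * math.sqrt(6.0) / math.pi      # from SD = beta * pi / sqrt(6)
    if minimum:
        mu = mean + EULER_GAMMA * beta        # minimum form: mean = mu - 0.5772 * beta
    else:
        mu = mean - EULER_GAMMA * beta        # maximum form: mean = mu + 0.5772 * beta
    return mu, beta


# Hypothetical minimum values, for illustration only.
minima = [2.1, 3.4, 2.8, 3.9, 1.7, 3.2, 2.5, 3.6]
mu, beta = gumbel_from_moments(minima, minimum=True)
print(f"location = {mu:.3f}, scale = {beta:.3f}")
```

The mode and median equations can serve as a cross-check on the fitted location.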
Box 20.3 Gumbel Distribution, Tippett, and Cotton Thread
The evolution of the Gumbel distribution is associated with the story of cotton
thread failure in the textile industry.
Leonard Henry Caleb Tippett, after graduating from Imperial College in
1923, was awarded a studentship by the British Cotton Industry Research
Association (the Shirley Institute) to study statistics under Professor Karl
Pearson. Later, he also worked with the great Sir Ronald Fisher.
As the sprint-record researchers studied the ultimate world records, they were interested in the
right end point of the distribution. The end point is finite if γ < 0 and infinite
if γ > 0. Moreover, for γ < 0, γ = 0, or γ > 0, G_γ
reduces to the Weibull, Gumbel, or Fréchet distribution function, respectively.
The researchers used the reversed Weibull form of the extreme
value distribution.
To build the model, the researchers collected the fastest personal best times.
Thus, each athlete appeared only once on their list. The sample size is 762 for
men and 479 for women. The estimates of γ are −0.18 for women and −0.19
for men.
The prediction is sensitive to the data window. If records up to 2005 were
used, the researchers find, the predictions of ultimate sprint records would be
9.29 and 10.11 seconds.
Gumbel Minimum: Analyzing Low CSAT Scores
Customer satisfaction (CSAT) scores were traditionally measured on a Likert
scale ranging from 1 to 5. Recently, effort is being made to measure CSAT on a
0 to 10 continuous scale. The latter scale allows detailed analysis. In both scales,
the problem area in CSAT lies in the minimum values, which correspond to
deep dissatisfaction. The minimum values on a 0 to 10 scale follow the Gumbel
distribution.
This analysis is very different from the typical control charts that many plot on
mean CSAT scores. The mean values are too neutral to reveal customer dissatisfaction.
Preparing to plot the Gumbel PDF means that we collect minimum values of CSAT.
This by itself is a paradigm shift in CSAT measurement.
We find the mode of the gathered minimum values and use it as the location
parameter of the PDF. The scale parameter is approximately equal to 1,
applying the appropriate moment equation (Data SD = βπ/√6). Thus, the model
parameters are as follows:
Location = 3
Scale = 1
The Gumbel minimum PDF is constructed with these parameters and is shown in
Figure 20.2.
The Gumbel PDF of CSAT is an eloquent problem statement. All the low-valued
outliers in CSAT data are represented in this plot.
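The CSAT model can be tabulated with a few lines of code. This is a minimal sketch using the location 3 and scale 1 given above; the function names are our own, and the cumulative form F(x) = 1 − exp(−e^((x−μ)/β)) is the standard Gumbel minimum distribution function, which the text does not state.

```python
import math

MU, BETA = 3.0, 1.0  # location and scale of the CSAT model in the text


def gumbel_min_pdf(x, mu=MU, beta=BETA):
    """Gumbel minimum PDF (Equation 20.1)."""
    z = (x - mu) / beta
    return math.exp(z - math.exp(z)) / beta


def gumbel_min_cdf(x, mu=MU, beta=BETA):
    """P(minimum score <= x); standard Gumbel minimum CDF (assumed, not from the text)."""
    z = (x - mu) / beta
    return 1.0 - math.exp(-math.exp(z))


# Tabulate the model over the low end of the 0 to 10 scale.
for score in range(0, 7):
    print(f"score {score}: pdf = {gumbel_min_pdf(score):.4f}, "
          f"P(min <= {score}) = {gumbel_min_cdf(score):.4f}")
```

The cumulative column gives the modeled chance that a minimum CSAT score falls at or below a chosen threshold. Note that the model's support extends beyond the 0 to 10 scale, so values near the ends of the scale are approximate.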
Tippett spent the next 40 years working at the Shirley Institute. He put statistics
to work in a variety of industrial problems, such as the problem that
looms in weaving sheds were idle approximately 30% of the time, the
problem of yarn breakage rates in weaving, the problem of the relationship
between the length of a test specimen of a yarn and its strength, and the prob-
lem of thickness variation along the length of a yarn. He conducted factorial
experiments on yarn.
The strength of a yarn is that of its weakest part. Tippett saw this
as an “extreme” situation. He studied the occurrence of extremes and identified
three forms of extremes. While working with Fisher, he created the
distributions known as the Fisher–Tippett distributions. Later, Gumbel took
up a special case represented by one of the three equations, simplified it, and
created the Gumbel distribution.
Tippett was a role model for industrial statisticians. As a result of his work
in the textile industry, he was awarded the Shewhart Medal of the American
Society for Quality Control.