Chapter 18
Gamma Distribution: Making Use of Minimal Data
The gamma distribution is a more general version of the exponential distribution.
It retains the advantages of the exponential distribution: it can model arrival
times, it has a fat tail, and it can characterize failure data. The gamma distribution
has an extra advantage: it has a prominent mode, and we are free to place that mode
wherever we want by adjusting the shape parameter. That the gamma distribution
retains the fat tail of the exponential distribution is not surprising, because a
gamma variable with an integer shape α is the sum of α independent exponential
variables that share a common scale.
The gamma distribution has two parameters, the shape parameter α and the scale
parameter β. The probability density function (PDF) is given by the following:
$$G(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0 \tag{18.1}$$
where α is the shape parameter, β is the scale parameter, and Γ(α) is the gamma
function, which equals (α − 1)! when α is a positive integer.
The PDF is plotted in Figure 18.1 to show how the shape of the distribution
changes when the shape parameter is set to 1.2, 2, and 3. The scale parameter is
kept constant at 10.
286 Simple Statistical Methods for Software Engineering
The plots have been made using the Excel function GAMMA.DIST. This function
returns either the PDF or the cumulative distribution function (CDF), depending
on its last argument.
The Excel syntax is as follows:
PDF(x) = GAMMA.DIST (x, shape parameter, scale parameter, 0)
CDF(x) = GAMMA.DIST (x, shape parameter, scale parameter, 1)
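For readers working outside Excel, the two calls above can be approximated with the Python standard library. This is a minimal sketch, not the book's method: the function names `gamma_pdf` and `gamma_cdf` are illustrative, and the CDF uses a plain power series for the regularized lower incomplete gamma function, which is an assumption suited to moderate values of x/scale.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma PDF of Equation 18.1; returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    norm = (scale ** shape) * math.gamma(shape)
    return x ** (shape - 1) * math.exp(-x / scale) / norm


def gamma_cdf(x, shape, scale, terms=200):
    """Gamma CDF from the series z^a e^(-z) / Gamma(a) * sum z^n / (a (a+1) ... (a+n)).

    Assumption: 200 terms are enough, which holds for moderate z = x / scale.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= z / (shape + n)  # next term from the previous one
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


if __name__ == "__main__":
    # Same settings as Figure 18.1: scale fixed at 10, shape 1.2, 2, and 3.
    for shape in (1.2, 2, 3):
        for x in (10, 20, 40):
            print(shape, x, gamma_pdf(x, shape, 10), gamma_cdf(x, shape, 10))
```

With shape = 1 both functions reduce to the exponential distribution, which gives a quick sanity check on the sketch.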
Box 18.1 Similarity between Gamma and Log-Normal: Making the Choice
Gamma and log-normal distributions look alike.
Kundu and Manglick [1] compared the gamma and log-normal distributions
and found them remarkably similar. They used the bearing failure data of
Lawless [2] for this study. Let us develop some ideas around this analysis.
For the Lawless data, we obtain gamma shape = 3.7138 and scale = 19.4489,
relating to Equation 18.1. Using Excel GAMMA.DIST (x, shape, scale, 0),
the gamma curve can be plotted.
For the same data, we can take logarithms and find the log-normal parameters:
the mean and the standard deviation of the natural logarithms of the data.
The log-normal curve can be plotted by using Excel LOGNORM.DIST
(x, mean of Ln, standard deviation of Ln, 0).
Figure 18.1 Gamma distribution. (PDF of Equation 18.1 plotted for x from 0 to 100; curves: shape = 1.2, shape = 2, shape = 3.)
Both curves are shown in Figure 18.2.
The curves look similar. Kundu and Manglick used the maximum
likelihood estimation (MLE) technique to obtain the parameters, and they obtained
slightly different values but nearly identical distributions. (The Kolmogorov–
Smirnov (K-S) distance between the empirical distribution function
and the fitted log-normal distribution function is 0.09, and the K-S distance
between the empirical distribution function and the fitted gamma
distribution function is 0.12.)
These distributions are close to one another, and the log-normal is nearer
to the empirical data based on the K-S distance analysis.
The similarity is superficial. There is a difference in the approach and
assumptions used in constructing the two distributions.
Hence, we face the question, which distribution should be used? Are there
preferences?
The gamma distribution may be used for shape-based decisions, where
experts judge the shape and the mean value.
The log-normal distribution may be used for more rigorous numerical
treatment based on parameter extraction from data alone.
The gamma distribution has a definite advantage: it can quickly convert visual
judgment to a mathematical model.
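The parameter extraction described in this box can be sketched in Python. The sample below is hypothetical and is not the Lawless bearing data. The gamma parameters are matched by moments (shape = mean²/variance, scale = variance/mean), which is an assumption made for brevity and not the MLE procedure of Kundu and Manglick. All function names are illustrative.

```python
import math
import statistics


def lognormal_params(data):
    """Mean and sample standard deviation of the natural logs of the data."""
    logs = [math.log(v) for v in data]
    return statistics.mean(logs), statistics.stdev(logs)


def lognormal_pdf(x, mu, sigma):
    """Log-normal PDF; mu and sigma describe ln(x)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def gamma_moment_fit(data):
    """Moment-matched gamma (shape, scale); an assumption, not MLE."""
    m = statistics.mean(data)
    v = statistics.variance(data)
    return m * m / v, v / m


def gamma_pdf(x, shape, scale):
    """Gamma PDF of Equation 18.1; returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (scale ** shape * math.gamma(shape))


if __name__ == "__main__":
    # HYPOTHETICAL lifetimes for illustration only; NOT the Lawless data.
    sample = [20.0, 35.0, 50.0, 60.0, 75.0, 90.0, 120.0, 160.0]
    mu, sigma = lognormal_params(sample)
    shape, scale = gamma_moment_fit(sample)
    for x in (25, 50, 100, 200):
        print(x, gamma_pdf(x, shape, scale), lognormal_pdf(x, mu, sigma))
```

Printing the two densities side by side at a few points gives a rough numerical counterpart to the visual comparison in Figure 18.2.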
Figure 18.2 Comparison of gamma and log-normal distributions of bearing life. (x-axis: life of bearing, 0 to 250; curves: log-normal, gamma.)
Gamma Curves for Clarification Time Data
We can model clarification time data with gamma distribution. In software main-
tenance, clarification time depends mostly on the customer and is not under the
direct influence of the project team. Let us consider data with descriptive statistics
shown in Data 18.1.
Data 18.1 shows that the mean is 36.1622, the mode is 11, and the standard
deviation is 39.3173.
We wish to mention two properties of the gamma distribution:
$$\text{Mean} = \text{scale} \times \text{shape} = \alpha\beta \tag{18.2}$$
$$\text{Variance} = \alpha\beta^{2} \tag{18.3}$$
To select the shape parameter, let us consult the histogram of clarification time
data, as shown in Figure 18.3.
The shape of the histogram is closest to the first curve in Figure 18.1, with a
shape factor of 1.2.
Substituting the values of the mean (36.1622) and the shape (1.2) in Equation 18.2, we
obtain the value of the scale as follows:
$$\text{Scale} = \frac{\text{Mean}}{\text{Shape}} = \frac{36.1622}{1.2} = 30.1352$$
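The arithmetic above, together with a quick consistency check, can be sketched in Python. The standard deviation implied by Equation 18.3 and the gamma mode (α − 1)β, valid for α ≥ 1, are added here as a cross-check and are not part of the original calculation. The variable names are illustrative.

```python
import math

mean = 36.1622   # sample mean from Data 18.1
shape = 1.2      # chosen by eye from the histogram

scale = mean / shape                     # Equation 18.2 rearranged
implied_sd = math.sqrt(shape) * scale    # square root of Equation 18.3
implied_mode = (shape - 1) * scale       # gamma mode, valid for shape >= 1

print(round(scale, 4))         # 30.1352
print(round(implied_sd, 2))    # 33.01
print(round(implied_mode, 2))  # 6.03
```

The implied standard deviation and mode differ somewhat from the sample values in Data 18.1 (39.3173 and 11), which is not unexpected when the shape is picked by visual judgment instead of being estimated from the data.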
Data 18.1 Descriptive Statistics of Clarification Time Data

Statistic             Clarification Days
Mean                  36.16227
Standard error        4.451811
Median                21.30242
Mode                  11
Standard deviation    39.31733
Sample variance       1545.852
Kurtosis              2.071553
Skewness              1.533407
Range                 171.3028
Minimum               −7.30285
Maximum               164
Sum                   2820.657
Count                 78
A gamma distribution is fitted to the data with a shape of 1.2 and a scale of 30.1352.
A plot of the fitted gamma PDF is shown in Figure 18.4.
This is the model for clarification time in software maintenance. From the
model, one can make several judgments, including the following:
The PDF practically vanishes beyond 150 days. Therefore, the data cluster beyond
150 represents extreme values or outliers. A root cause analysis must be conducted
for this excessive delay, and corrective and preventive action must be initiated.
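The claim that the PDF practically ends at 150 can be checked by computing the tail probability of the fitted model. The sketch below is an illustration under stated assumptions: `gamma_cdf` is a hypothetical helper that uses a power series for the regularized lower incomplete gamma function, and the expected count simply multiplies the tail probability by the 78 observations in Data 18.1.

```python
import math


def gamma_cdf(x, shape, scale, terms=200):
    """Gamma CDF via a power series; assumes moderate x / scale."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= z / (shape + n)
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


shape, scale = 1.2, 30.1352   # fitted model from the text
count = 78                    # number of observations in Data 18.1

tail = 1.0 - gamma_cdf(150, shape, scale)   # P(clarification time > 150 days)
expected_beyond_150 = count * tail

print(tail, expected_beyond_150)
```

Under these assumptions the tail probability should come out on the order of one percent, so the model expects roughly one observation beyond 150 days in a sample of this size. A visible cluster beyond 150 is therefore more than the model predicts, which is consistent with treating it as a set of outliers.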
Figure 18.4 Gamma distribution of clarification time with a shape of 1.2 and a scale of 30.1352. (x-axis: clarification time in days; y-axis: probability.)
Figure 18.3 Histogram of clarification time. (x-axis: clarification time in days; y-axis: frequency.)