205
Chapter 13
Grand Social Law:
The Bell Curve
Most of us have been initiated into statistical thinking through normal distribu-
tion, with its well-known bell-shaped curve. e normal distribution was invented
from the binomial distribution.
e binomial distribution is discrete, the normal distribution is continuous.
de Moivre invented normal distribution in 1756. It is also called the Gaussian dis-
tribution because Gauss was the first to apply this equation (1809). Popularly, this
distribution is known simply as the bell curve (see Box 13.1 for a brief history).
is is widely used in science, engineering, economics, management, and a host of
disciplines.
e basic form of the normal distribution, known as the standard normal curve,
is defined in Equation 13.1, and the graph is shown in Figure 13.1.
y e
z
=
1
2
2
2
π
(13.1)
e distribution peaks at the mean, is symmetric, and spreads from –∞ to +∞.
e equation for normal distribution is shown in Equation 13.2. It is defined
by two parameters, mean μ and standard deviation σ. e mean is known as the
location parameter because it controls the location of the distribution. e standard
deviation is known as the scale parameter because it controls the scale (width) of
206 Simple Statistical Methods for Software Engineering
Box 13.1 origins of a social law
Normal distribution has cast its influence in almost every field of life and
research. It has gained the status of a social law.
French-born British mathematician Abraham de Moivre (1667–1754)
published A Doctrine of Chance: A Method of Calculating the Probabilities of
Events in Play in 1718, wherein he addressed the gambling problem. e third
edition appeared in 1756; it contained the approximation to the binomial
distribution by the normal distribution.
de Moivre actually had written the equation down in
1708; obtained it as a limit of coins tossing or binomial
distribution. We think of a coin being tossed n’ times,
and note the proportion of k heads. After many k-fold tri-
als, we obtain a graph showing the number of occasions
on which we get 0 heads, 1 head, 2heads,n heads.
The curve will peak around the probability of getting
heads with the coin. As the number of tosses n’ grows
without a bound, a normal distribution results [1].
de Moivre’s concern was with games of chance, and his discovery showed
the power of sampling to determine patterns in a population by examining only
a few members. He spent the last part of his life by solving problems of chance
for gamblers as the resident statistician of Slaughters Coffee House in London.
In 1809, German mathematician and astronomer Johann Carl Friedrich
Gauss (1777–1855) showed that errors of measurement made in astronomi-
cal observations followed a symmetric distribution called normal distribu-
tion. Gauss was also the first to develop the utility of the normal distribution
curve, which had been discovered earlier by de Moivre. is distribution is
now often called Gaussian.
The curve was developed by observational astronomers
who used the ideas of normal distribution to verify the
accuracy of measurements. They measured a distance
many times and graphed the results. If most measure-
ments clustered around the mean, then the average of the
results could be considered reliable. Outliers or deviant
measurements could be discounted as inaccurate [2].
Grand Social Law 207
the distribution. ere is no separate shape parameter because the shape is fixed: it
is a bell shape.
F x e
x
( , , )
( )
µ σ
πσ
µ
σ
=
1
2
2
2
2
(13.2)
where μ is the mean (location parameter), and σ is the standard deviation (scale
parameter).
Mean and standard deviation are part of descriptive statistics, discussed in
Chapter 1. For any data set, we can estimate these two parameters. e equation is
a natural sequel.
e normal distribution has been studied under various names for nearly
300 years. To the historically inclined, it is Laplace’s second law, Gaussian
law, or LaplaceGaussian curve. e names law of deviation and error curve
could make more sense to experimenters. Pearson, Fisher, and Galton have
called it the normal curve, the name greatly favored by statisticians.
Today, in statistics books, we tend to call this the normal distribution. In
the world of science, the favored name is Gaussian distribution.
–5 –4 –3 –2 –1 0 1 2 3 4 5
0
0.05
0.10
0.15
0.20
y
0.25
0.30
0.35
0.40
0.45
z
y =
e
1
√2π
z
2
2
Figure 13.1 Standard normal curve.
208 Simple Statistical Methods for Software Engineering
e statistical properties of this distribution are as follows:
Mean =
μ
Mode = μ
Median = μ
Kurtosis = 3
Relative kurtosis = 0
Skew = 0
Variance = σ
2
Standard deviation = σ
2
Range = − to +
e mean code productivity in LOC per person-day and its standard deviation
can be easily calculated from data and the corresponding normal distribution graph
can be plotted.
In Figure 13.2, the assumed normal distribution of productivity is plotted for
four different standard deviations. We have to assume normal distribution because
productivity data would be seen as nonnormal had we plotted a histogram. However,
we proceed with normal approximation. If dispersion decreases, it is a good sign; it
indicates that the process becomes better. Figure 13.2 shows that as the standard
deviation decreases, the height of the curve increases while its width decreases.
Real-world process improvement consists of reduction in variation and a simul-
taneous favorable shift in the mean. Figure 13.3 shows the bell curves for produc-
tivity improvement.
0
0
0.05
0.10
0.15
0.20
0.25
10 20 30 40
Productivity LOC/person day
Mean = 40
Probability
F(x, µ, σ)
50
F(x, µ, σ) = e
1
2πσ
(x–µ)
2
2
60 70 80
SD = 2
SD = 5
SD = 7
SD = 9
Figure 13.2 Gaussian probability density function (PDF) of productivity.
Grand Social Law 209
e best performance is where the mean is 49 and the standard deviation 2.
is gets closer to the oft spoken about rule of thumb of 50 LOC per person-day.
e curves are still hypothetical, at best approximate. e bell curves in Figure 13.3
portray a story of improvement captured from a Gaussian lens.
First-Order Approximation of Variation
If that enabled us to predict the succeeding situation with the
same approximation, that is all we require, and we should say
that the phenomenon had been predicted, that it is governed
by the laws.
Henri Poincare
Building a Gaussian is rather easy, from just two parameters, mean and stan-
dard deviation. ese two can be obtained by expert judgment as well if data were
not accessible. If we can guess optimistic and pessimistic values, we can “estimate”
the Gaussian mean and standard deviation. e difference between the maximum
and the minimum values is the estimated range. e rule of thumb we use to find
standard deviation is given as follows:
Standard deviation
Range
=
6
(13.3)
0
0
0.05
0.10
0.15
0.20
0.25
10 20 30 40
Productivity LOC/person day
Probability
F(x, µ, σ)
50 60 70 80
Mean = 49, SD = 2
Mean = 46, SD = 5
Mean = 43, SD = 7
Mean = 40, SD = 9
F(x, µ, σ) = e
1
2πσ
(x–µ)
2
2
Figure 13.3 Gaussian model for productivity improvement.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.132.6