Chapter 13 - Grand Social Law: The Bell Curve (1/6)


Chapter 13

Grand Social Law:

The Bell Curve

Most of us have been initiated into statistical thinking through the normal distribution, with its well-known bell-shaped curve. The normal distribution was derived from the binomial distribution.

The binomial distribution is discrete; the normal distribution is continuous.

de Moivre's derivation of the normal distribution appeared in the 1756 edition of his book. It is also called the Gaussian distribution because Gauss was the first to apply this equation (1809). Popularly, this distribution is known simply as the bell curve (see Box 13.1 for a brief history). It is widely used in science, engineering, economics, management, and a host of other disciplines.
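The link between the discrete binomial and the continuous normal distribution can be checked numerically. The sketch below is only an illustration under stated assumptions: a fair coin (p = 0.5), a normal curve matched to the binomial through mean np and standard deviation sqrt(np(1 − p)), and helper names (`binomial_pmf`, `normal_pdf`, `max_gap`) that are ours, not the book's.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses with head probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def max_gap(n: int, p: float = 0.5) -> float:
    """Largest absolute gap between the binomial PMF and the matched normal PDF."""
    mu = n * p                           # mean of the binomial
    sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial
    return max(
        abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    # The gap shrinks as the number of tosses grows.
    for n in (10, 100, 1000):
        print(f"n = {n:5d}  max gap = {max_gap(n):.6f}")
```

The printed gap should shrink as n grows, which is the sense in which the binomial distribution approaches the normal curve.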

The basic form of the normal distribution, known as the standard normal curve, is defined in Equation 13.1, and the graph is shown in Figure 13.1.

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2} \qquad (13.1)$$

The distribution peaks at the mean, is symmetric, and spreads from −∞ to +∞.

The equation for the normal distribution is shown in Equation 13.2. It is defined by two parameters, mean μ and standard deviation σ. The mean is known as the location parameter because it controls the location of the distribution. The standard deviation is known as the scale parameter because it controls the scale (width) of

206 ◾ Simple Statistical Methods for Software Engineering

Box 13.1 Origins of a Social Law

The normal distribution has cast its influence in almost every field of life and research. It has gained the status of a social law.

French-born British mathematician Abraham de Moivre (1667–1754) published The Doctrine of Chances: or, A Method of Calculating the Probabilities of Events in Play in 1718, wherein he addressed gambling problems. The third edition appeared in 1756; it contained the approximation to the binomial distribution by the normal distribution.

de Moivre actually had written the equation down in 1708; obtained it as a limit of coins tossing or binomial distribution. We think of a coin being tossed ‘n’ times, and note the proportion of k heads. After many k-fold trials, we obtain a graph showing the number of occasions on which we get 0 heads, 1 head, 2 heads, … n heads. The curve will peak around the probability of getting heads with the coin. As the number of tosses ‘n’ grows without a bound, a normal distribution results [1].

de Moivre’s concern was with games of chance, and his discovery showed the power of sampling to determine patterns in a population by examining only a few members. He spent the last part of his life solving problems of chance for gamblers as the resident statistician of Slaughter’s Coffee House in London.

In 1809, German mathematician and astronomer Johann Carl Friedrich Gauss (1777–1855) showed that errors of measurement made in astronomical observations followed a symmetric distribution, now called the normal distribution. Gauss was also the first to develop the utility of the normal distribution curve, which had been discovered earlier by de Moivre. This distribution is now often called Gaussian.

The curve was developed by observational astronomers who used the ideas of normal distribution to verify the accuracy of measurements. They measured a distance many times and graphed the results. If most measurements clustered around the mean, then the average of the results could be considered reliable. Outliers or deviant measurements could be discounted as inaccurate [2].


the distribution. There is no separate shape parameter because the shape is fixed: it is a bell shape.

$$F(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \qquad (13.2)$$

where μ is the mean (location parameter), and σ is the standard deviation (scale parameter).

Mean and standard deviation are part of descriptive statistics, discussed in Chapter 1. For any data set, we can estimate these two parameters. Once they are estimated, Equation 13.2 follows as a natural sequel.
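As a minimal sketch of that sequel, the two parameters can be estimated from a data set and plugged into the normal density. The sample values below are hypothetical, invented purely for illustration; the function name `normal_pdf` is ours, and the sample standard deviation (n − 1 divisor) is assumed as the estimator.

```python
import math
import statistics


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with location mu and scale sigma (sigma > 0)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical productivity sample in LOC per person-day (illustration only).
sample = [38, 42, 35, 47, 40, 44, 36, 41, 39, 38]

mu_hat = statistics.mean(sample)      # location estimate
sigma_hat = statistics.stdev(sample)  # scale estimate (sample standard deviation)

if __name__ == "__main__":
    print(f"mean = {mu_hat}, standard deviation = {sigma_hat:.3f}")
    # Density at the mean and three units to either side (symmetric about the mean).
    for x in (mu_hat - 3, mu_hat, mu_hat + 3):
        print(f"F({x}) = {normal_pdf(x, mu_hat, sigma_hat):.4f}")
```

Setting `mu = 0` and `sigma = 1` reduces `normal_pdf` to the standard normal curve.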

The normal distribution has been studied under various names for nearly 300 years. To the historically inclined, it is Laplace’s second law, Gaussian law, or Laplace–Gaussian curve. The names law of deviation and error curve could make more sense to experimenters. Pearson, Fisher, and Galton have called it the normal curve, the name greatly favored by statisticians. Today, in statistics books, we tend to call this the normal distribution. In the world of science, the favored name is Gaussian distribution.

Figure 13.1 Standard normal curve. (x axis from −5 to 5; y axis from 0.05 to 0.45.)


The statistical properties of this distribution are as follows:

Mean = μ
Mode = μ
Median = μ
Kurtosis = 3
Relative kurtosis = 0
Skew = 0
Variance = σ²
Standard deviation = σ
Range = −∞ to +∞
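These properties can be sanity-checked by numerical integration of the density. The sketch below is one possible check, not a method from the book: it assumes a simple Riemann sum over μ ± 10σ, and uses μ = 40 and σ = 9 only as example values. The function names are ours.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def numeric_moments(mu: float, sigma: float, steps: int = 20001, span: float = 10.0) -> dict:
    """Approximate mean, variance, skew, and kurtosis by a Riemann sum over mu +/- span*sigma."""
    lo = mu - span * sigma
    dx = 2.0 * span * sigma / (steps - 1)
    xs = [lo + i * dx for i in range(steps)]
    weights = [normal_pdf(x, mu, sigma) * dx for x in xs]  # probability mass per grid cell

    mean = sum(x * w for x, w in zip(xs, weights))
    variance = sum((x - mean) ** 2 * w for x, w in zip(xs, weights))
    sd = math.sqrt(variance)
    skew = sum(((x - mean) / sd) ** 3 * w for x, w in zip(xs, weights))
    kurtosis = sum(((x - mean) / sd) ** 4 * w for x, w in zip(xs, weights))
    return {"mean": mean, "variance": variance, "skew": skew, "kurtosis": kurtosis}


if __name__ == "__main__":
    for name, value in numeric_moments(40.0, 9.0).items():
        print(f"{name:9s} = {value:.6f}")
```

The variance should come out as the square of the standard deviation, the skew near 0, and the kurtosis near 3, so the relative (excess) kurtosis is near 0.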

The mean code productivity in LOC per person-day and its standard deviation can be easily calculated from data, and the corresponding normal distribution graph can be plotted.

In Figure 13.2, the assumed normal distribution of productivity is plotted for four different standard deviations. Normality here is an assumption: had we plotted a histogram, the productivity data would have looked nonnormal. We nevertheless proceed with the normal approximation. A decrease in dispersion is a good sign; it indicates that the process is becoming better. Figure 13.2 shows that as the standard deviation decreases, the height of the curve increases while its width decreases.
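The trade-off between height and width follows directly from Equation 13.2: at x = μ the exponent vanishes, so the peak height is 1/(σ√(2π)). The sketch below tabulates this for the four standard deviations in Figure 13.2. As a width measure it uses the full width at half maximum, 2√(2 ln 2)·σ, which is our choice for illustration and not a quantity used in the text; the function names are also ours.

```python
import math


def peak_height(sigma: float) -> float:
    """Value of the normal density at its mean: 1 / (sigma * sqrt(2*pi))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


def full_width_half_max(sigma: float) -> float:
    """Width of the bell at half of its peak height: 2 * sqrt(2 * ln 2) * sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


if __name__ == "__main__":
    # Standard deviations taken from Figure 13.2 (LOC per person-day).
    for sd in (2, 5, 7, 9):
        print(
            f"SD = {sd}: peak = {peak_height(sd):.4f}, "
            f"width at half maximum = {full_width_half_max(sd):.2f}"
        )
```

Halving the standard deviation doubles the peak and halves the width, because the area under the curve must remain 1.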

Real-world process improvement consists of a reduction in variation and a simultaneous favorable shift in the mean. Figure 13.3 shows the bell curves for productivity improvement.

Figure 13.2 Gaussian probability density function (PDF) of productivity. (x axis: productivity, LOC/person-day; y axis: probability F(x, μ, σ); mean = 40; curves for SD = 2, 5, 7, and 9.)


The best performance is where the mean is 49 and the standard deviation is 2. This comes close to the often-cited rule of thumb of 50 LOC per person-day. The curves are still hypothetical, at best approximate. The bell curves in Figure 13.3 portray a story of improvement seen through a Gaussian lens.

First-Order Approximation of Variation

If that enabled us to predict the succeeding situation with the

same approximation, that is all we require, and we should say

that the phenomenon had been predicted, that it is governed

by the laws.

Henri Poincare

Building a Gaussian is rather easy, from just two parameters, mean and standard deviation. These two can be obtained by expert judgment as well if data are not accessible. If we can guess optimistic and pessimistic values, we can “estimate” the Gaussian mean and standard deviation. The difference between the maximum and the minimum values is the estimated range. The rule of thumb we use to find the standard deviation is given as follows:

$$\text{Standard deviation} = \frac{\text{Range}}{\ldots} \qquad (13.3)$$
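A range-based estimate of this kind can be sketched in a few lines. Two points here are assumptions rather than statements from the text: the divisor defaults to 6, a common convention based on the range spanning roughly ±3 standard deviations, and the mean is taken as the midpoint of the two guesses. The function name `gaussian_from_range` and its parameters are hypothetical.

```python
def gaussian_from_range(pessimistic: float, optimistic: float, divisor: float = 6.0):
    """Estimate (mean, standard deviation) from two expert guesses.

    Assumptions (not taken from the source text):
      - the mean is the midpoint of the two guesses;
      - the range covers `divisor` standard deviations (6 by common convention).
    """
    low, high = sorted((pessimistic, optimistic))
    estimated_range = high - low              # maximum minus minimum
    mean = (low + high) / 2.0                 # midpoint assumption
    standard_deviation = estimated_range / divisor
    return mean, standard_deviation


if __name__ == "__main__":
    # Hypothetical guesses: 20 to 80 LOC per person-day.
    mean, sd = gaussian_from_range(20, 80)
    print(f"mean = {mean}, standard deviation = {sd}")
```

Passing a different `divisor` lets the same sketch follow whichever range convention is in use.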

Figure 13.3 Gaussian model for productivity improvement. (x axis: productivity, LOC/person-day; y axis: probability F(x, μ, σ); curves for mean = 40, SD = 9; mean = 43, SD = 7; mean = 46, SD = 5; mean = 49, SD = 2.)
