Chapter 13 - Grand Social Law: The Bell Curve (2/6)


210 ◾ Simple Statistical Methods for Software Engineering

The mean is recalled from the central tendency of the process. The ability of the Gaussian distribution to connect easily with approximate data makes it a "social law."

The Gaussian distribution lets us see both sides of the truth, tendency and dispersion, and so facilitates fair judgment.

For example, from a remembered mean effort variance of 5% and a range of 30%, we can construct the Gaussian distribution shown in Figure 13.4. The central tendency, seen pictorially, reveals the problem: if planning and estimation practices were perfect, the central tendency would be zero. A nonzero central tendency is a comment on project management.
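A minimal Python sketch of turning a remembered mean and range into a Gaussian follows. The text does not say how the range is converted into a standard deviation; this sketch assumes the common rule of thumb that a remembered range spans about six standard deviations (mean ± 3σ), so σ ≈ 30/6 = 5%. The function name `gaussian_from_mean_and_range` is illustrative, and `statistics.NormalDist` is Python's standard-library normal distribution.

```python
from statistics import NormalDist


def gaussian_from_mean_and_range(mean, value_range, sigmas_in_range=6.0):
    """Build a normal distribution from a remembered mean and range.

    Assumption (rule of thumb, not from the text): the remembered range
    covers roughly mean +/- 3 sigma, i.e. about six standard deviations.
    """
    if value_range <= 0:
        raise ValueError("range must be positive")
    sigma = value_range / sigmas_in_range
    return NormalDist(mu=mean, sigma=sigma)


# Remembered effort variance: mean = 5%, range = 30%
effort_variance = gaussian_from_mean_and_range(5.0, 30.0)

# Tabulate the density over the x-axis span of Figure 13.4 (-25% to 25%)
for x in range(-25, 30, 5):
    print(f"{x:>4}%  {effort_variance.pdf(x):.4f}")

print("sigma:", effort_variance.stdev)                      # sigma: 5.0
print("peak density:", round(effort_variance.pdf(5.0), 4))  # peak density: 0.0798
```

The peak sits at 5% rather than at zero, which is exactly the nonzero central tendency the text points to. If your own experience suggests the range covers fewer standard deviations, change `sigmas_in_range`.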

Box 13.2 Is the Bell Curve Fair?

The power of the bell curve is linked to the central limit theorem (CLT): sample means tend to be normally distributed as the sample size N becomes large.

French mathematician Pierre-Simon Laplace rescued the CLT from the nearly forgotten work of Abraham de Moivre and published it in his monumental work Théorie Analytique des Probabilités. In 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved precisely how it worked mathematically [3].

Figure 13.4 Gaussian distribution of effort variance. (x-axis: effort variance %, from -25 to 25.)


Estimation Error

The variance metric is a double-edged sword. On one side, it measures how well a plan is executed; on the other, it measures how well a project is estimated. When applied to estimation, this metric can be renamed the percentage of estimation error.
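As an illustration, the Python sketch below computes the percentage of estimation error for a handful of projects and summarizes the errors by their mean and standard deviation, the two parameters of the Gaussian. It assumes the common definition error % = (actual − estimate) / estimate × 100; the helper name and the project figures are hypothetical, not data from the book.

```python
from statistics import mean, stdev


def estimation_error_pct(estimate, actual):
    """Percentage estimation error, assuming (actual - estimate) / estimate * 100."""
    if estimate == 0:
        raise ValueError("estimate must be non-zero")
    return (actual - estimate) / estimate * 100.0


# Hypothetical (estimated effort, actual effort) pairs in person-hours
projects = [(400, 420), (250, 240), (800, 830), (120, 126), (600, 594)]

errors = [estimation_error_pct(est, act) for est, act in projects]
print("errors %:", [round(e, 2) for e in errors])
print("mean error %:", round(mean(errors), 2))
print("SD of error %:", round(stdev(errors), 2))
```

A mean near zero would indicate unbiased estimation; the standard deviation describes how widely the estimates scatter around it.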

When processes mature, estimation errors tend to follow the curve shown in Figure 13.5. Estimation errors are measurement errors; they resemble the astronomical measurement errors that Gauss studied when he discovered a pathbreaking application of the normal distribution. This holds for size estimation, schedule estimation, effort estimation, and even defect count estimation; errors in all of these tend to be Gaussian.

Sir Francis Galton described the CLT as follows [4]:

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error." Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.

The actual term central limit theorem was first used by George Pólya in 1920 in the title of a paper. Pólya referred to the theorem as central because of its importance in probability theory [5].

According to Le Cam, the French school of probability interprets the word central in the sense that "it describes the behaviour of the centre of the distribution as opposed to its tails" [6].

Between 1870 and 1913, Markov, Chebyshev, and Lyapunov contributed to the CLT. Between 1920 and 1937, Lindeberg, Feller, and Lévy perfected it [7].

The CLT sets the context for a bell curve paradigm. The science of measurement presents another truth. Whatever we measure, we repeat the measurement to make it credible, and the repetitions trace out a bell curve of the measured parameter. The limit, or peak, of the bell curve is the truth; the tails denote errors. Criticism of the bell curve as a grading curve (by some educationalists) is ill-founded. The bell curve represents data and cannot be made responsible for a hypothesis.

To sum it up, the bell curve represents truth better than isolated data.
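The CLT claim in Box 13.2 can be checked with a short simulation. This Python sketch draws repeated samples from a strongly skewed exponential distribution (an arbitrary choice for illustration), averages each sample, and shows that the sample means cluster around the true mean with a spread close to σ/√N, as a bell curve would. The seed, sample size, and number of repetitions are arbitrary assumptions.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

TRUE_MEAN = 10.0     # exponential distribution: mean 10, standard deviation also 10
N = 50               # size of each sample
REPETITIONS = 5000   # number of samples (repeated "measurements")


def sample_mean(n):
    """Average of n draws from a skewed exponential distribution."""
    return sum(random.expovariate(1.0 / TRUE_MEAN) for _ in range(n)) / n


means = [sample_mean(N) for _ in range(REPETITIONS)]
m, s = mean(means), stdev(means)

print("mean of sample means:", round(m, 2))                  # close to 10
print("SD of sample means:", round(s, 2))                    # close to sigma / sqrt(N)
print("sigma / sqrt(N):", round(TRUE_MEAN / N ** 0.5, 2))    # 1.41

# A normal curve puts about 68% of its area within +/- 1 SD of the mean
within_one_sd = sum(abs(x - m) <= s for x in means) / len(means)
print("fraction within 1 SD:", round(within_one_sd, 3))
```

Even though individual draws are far from normal, the averages pile up symmetrically around the true value, echoing the box's point that the peak of the bell curve is the truth and the tails are errors.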


Viewing Requirement Volatility

At the beginning of a project, managers consider the risk of scope creep and plan strategies to handle it. There may be no objective evidence of potential scope creep, but approximate models based on benchmark data can be used to construct a Gaussian model that guides strategic planning. In certain projects, as a rule of thumb, requirement volatility is believed to have a mean of 4% and a standard deviation of approximately 3.3%.

A rule of thumb is merely an expression of one's experience.

With practice in statistical thinking, we can easily convert such knowledge into Gaussian parameters.
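One way to make this conversion concrete is to phrase experience as two remembered percentiles and solve for μ and σ. The Python sketch below assumes, purely for illustration, that experience says "volatility is typically 4%" (taken as the median) and "it exceeds about 8.2% in only 1 project in 10." These statements and the helper name `gaussian_from_percentiles` are hypothetical; they were chosen so that the result lands close to the rule-of-thumb values quoted above.

```python
from statistics import NormalDist


def gaussian_from_percentiles(p1, x1, p2, x2):
    """Solve for the normal (mu, sigma) whose p1- and p2-quantiles are x1 and x2."""
    if not (0.0 < p1 < p2 < 1.0) or x2 <= x1:
        raise ValueError("need 0 < p1 < p2 < 1 and x1 < x2")
    z1 = NormalDist().inv_cdf(p1)   # standard normal quantiles
    z2 = NormalDist().inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return NormalDist(mu, sigma)


# Hypothetical experience: median volatility 4%, 90th percentile about 8.2%
volatility = gaussian_from_percentiles(0.50, 4.0, 0.90, 8.2)
print(f"mean = {volatility.mean:.2f}%, SD = {volatility.stdev:.2f}%")  # mean = 4.00%, SD = 3.28%
```

Any two remembered statements of the form "p% of projects stay below x" can be plugged in the same way.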

Figure 13.6 Gaussian distribution of requirements volatility. (x-axis: requirements volatility, from -25 to 25; a threshold is marked.)

Figure 13.5 Gaussian distribution of estimation errors. (Estimation error model with mean = 0 and SD = 3.2; x-axis: estimation error %; y-axis: probability.) The curve is the Gaussian density

$$F(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
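The Gaussian density behind Figure 13.5, F(x, µ, σ) = exp(−(x − µ)² / (2σ²)) / (√(2π)·σ), can be coded directly. This Python sketch evaluates it for the estimation error model (mean = 0, SD = 3.2) and cross-checks each value against the standard library's `statistics.NormalDist.pdf`. The function name `gaussian_pdf` is illustrative.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """F(x, mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


MU, SIGMA = 0.0, 3.2  # estimation error model from Figure 13.5
reference = NormalDist(MU, SIGMA)

for x in range(-10, 11, 2):
    f = gaussian_pdf(x, MU, SIGMA)
    assert math.isclose(f, reference.pdf(x), rel_tol=1e-9)  # agrees with the library
    print(f"{x:>4}%  {f:.4f}")
```

The peak value, 1/(√(2π)·3.2) ≈ 0.125, is reached at zero error, the center of a mature estimation process.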


A plot of the Gaussian version of the above rule of thumb is shown in Figure 13.6. The tolerance limit is marked on the graph.

Figure 13.6 is a pictorial model for understanding the challenge of requirement volatility in the context of a constraining limit. It provides a clear visualization of a process together with its constraint (or goal).

Traditional ways to deal with information—reading, listening,

writing, talking—are painfully slow in comparison to “viewing

the big picture.” Those who survive information overload will

be those who search for information with broadband thinking

but apply it with a single-minded focus.

Kathryn Alesandrini

Survive Information Overload: The 7 Best Ways to Manage

Your Workload by Seeing the Big Picture

Risk Measurement

We can use the Gaussian curve to measure risk, and this is often done in software project management. For example, in Figure 13.6, the tolerance limit marks off a tail whose area indicates risk. In Figure 13.7, we show the Gaussian with the tail area marked in black.

Figure 13.7 Representing risk on Gaussian distribution of requirement volatility. (x-axis: requirements volatility %; USL = 10%; risk = 4.7%; capability = 95.3%.)


Without referring to Gaussian tables, we can compute the tail area with the following Excel expression:

Right tail = 1 - NORMDIST(USL, mean, standard deviation, 1)

Substituting our values, we get right tail = 0.04697, or 4.697%.
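Outside Excel, the same right-tail calculation can be sketched in Python with `statistics.NormalDist.cdf`, which plays the role of NORMDIST(..., 1). The helper name `right_tail_risk` is illustrative. Note that with the rule-of-thumb values quoted earlier (mean 4%, SD 3.3%), this sketch returns a right tail of roughly 3.5% rather than the 4.697% given above, so the curve in Figure 13.7 appears to have been drawn with somewhat different parameters; substitute whichever mean and SD describe your own process.

```python
from statistics import NormalDist


def right_tail_risk(usl, mean, sd):
    """Probability of exceeding the upper specification limit (USL).

    Equivalent to the Excel expression 1 - NORMDIST(USL, mean, sd, 1).
    """
    return 1.0 - NormalDist(mean, sd).cdf(usl)


USL = 10.0            # tolerance limit for requirements volatility, %
MEAN, SD = 4.0, 3.3   # rule-of-thumb parameters quoted in the text

risk = right_tail_risk(USL, MEAN, SD)
capability = 1.0 - risk  # capability and risk are complementary
print(f"risk = {risk:.2%}, capability = {capability:.2%}")
```

Because the two areas always sum to one, computing either one gives the other for free.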

You can measure opportunity with the same yardstick

that measures the risk involved. They go together.

Earl Nightingale

The remaining area under the Gaussian measures the probability of meeting the goal, or "capability." In our case, requirement volatility capability is 95.3%, as marked in Figure 13.7.

Capability and risk are complementary: together they make up the whole area under the curve, so where one is absent, the other steps in.

As an extension of the risk calculation procedure, we can calculate tail risks based on their distances from the mean. As an example, tail areas for a few useful distances from the mean are given in Table 13.1.
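Tail areas of this kind depend only on the distance from the mean measured in standard deviations, so they can be generated from the standard normal distribution. The Python sketch below prints one-tailed areas for a few distances; the distances chosen here are illustrative and are not necessarily the ones listed in Table 13.1.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tail_area(k):
    """Area beyond mean + k * sigma (equal, by symmetry, to the area below mean - k * sigma)."""
    return 1.0 - STANDARD_NORMAL.cdf(k)


for k in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{k:.1f} sigma from mean: tail area = {one_tail_area(k):.5f}")
```

For any process, the distance of a specification limit from the mean is (limit − mean) / SD, which can then be looked up in such a table.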

Table 13.1 contains the solution to the one-tailed problem: it gives the probability of a process exceeding a given specification limit. Examples of one-tailed problems are the probability of defect density exceeding an upper specification limit and the probability of productivity falling below a lower specification limit.
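Both examples can be sketched with one normal model per metric. In this Python sketch, the defect density and productivity figures (means, SDs, and limits) are hypothetical values chosen only to show the upper-limit and lower-limit cases, and the helper names are illustrative.

```python
from statistics import NormalDist


def exceed_upper(limit, mean, sd):
    """P(X > limit): one-tailed risk against an upper specification limit."""
    return 1.0 - NormalDist(mean, sd).cdf(limit)


def fall_below_lower(limit, mean, sd):
    """P(X < limit): one-tailed risk against a lower specification limit."""
    return NormalDist(mean, sd).cdf(limit)


# Hypothetical process figures, for illustration only
defect_risk = exceed_upper(limit=1.2, mean=0.8, sd=0.2)             # defects per KLOC
productivity_risk = fall_below_lower(limit=7.0, mean=10.0, sd=2.0)  # size units per person-month

print(f"P(defect density > USL) = {defect_risk:.2%}")
print(f"P(productivity < LSL)   = {productivity_risk:.2%}")
```

The only difference between the two cases is which side of the limit counts as failure.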

There are also several two-tailed problems, in which a process has both an upper specification limit and a lower specification limit. For the effort variance metric in a certain enhancement project, the specification limits are ±20%. The actual performance is characterized by a normal distribution with mean = 14% and standard deviation = 15%. The two specification limits define two tails.

The Excel syntax for the previous computation is as follows:

Left tail = NORMDIST(LSL, mean, standard deviation, 1)

Right tail = 1 - NORMDIST(USL, mean, standard deviation, 1)

Total risk in the process = left tail + right tail

The calculations are shown in Data 13.1.
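The same two-tailed computation can be sketched in Python; this is an illustration, not a reproduction of Data 13.1. The function name and the weights `w_left` and `w_right` are hypothetical; the weights anticipate the weighted sum mentioned in the text, and with both set to 1 the total is the plain sum.

```python
from statistics import NormalDist


def two_tailed_risk(lsl, usl, mean, sd, w_left=1.0, w_right=1.0):
    """Return (left tail, right tail, total risk) for a normal process.

    left  = NORMDIST(LSL, mean, sd, 1)
    right = 1 - NORMDIST(USL, mean, sd, 1)
    total = w_left * left + w_right * right  (plain sum when both weights are 1)
    """
    dist = NormalDist(mean, sd)
    left = dist.cdf(lsl)
    right = 1.0 - dist.cdf(usl)
    return left, right, w_left * left + w_right * right


# Effort variance in the enhancement project: limits +/-20%, mean 14%, SD 15%
left, right, total = two_tailed_risk(lsl=-20.0, usl=20.0, mean=14.0, sd=15.0)
print(f"left tail = {left:.4f}, right tail = {right:.4f}, total risk = {total:.4f}")
```

Because the mean (14%) sits close to the upper limit, almost all of the risk comes from the right tail.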

The left tail involves process compliance risk: when teams save effort, there is a risk that they adopt shortcuts, which might later boomerang as product failure. The right tail carries a plain cost risk. The total risk in the project could be the sum of the two tail areas. Sometimes the two tails can be given different weights, for a "weighted" sum calculation of total risk. We have used a plain summation in Data 13.1 with the following result:
