210 Simple Statistical Methods for Software Engineering
The mean is recalled from the central tendency of the process. The ability of the Gaussian distribution to connect easily with approximate data makes it a social law.
The Gaussian distribution lets us see both sides of the truth, tendency and dispersion, and so facilitates fair judgment.
For example, from a remembered mean effort variance of 5% and a range of 30%, we can construct the Gaussian distribution shown in Figure 13.4. The central tendency, seen pictorially, reveals the problem. If planning and estimation practices were perfect, the central tendency would be zero. A nonzero tendency is a remark on project management.
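A minimal sketch of how such a curve can be built from two remembered numbers. It assumes, as a common rule of thumb that the text does not state, that the remembered range spans about six standard deviations (±3σ), so σ ≈ 30/6 = 5%. The helper name `gaussian_pdf` is illustrative.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal probability density F(x, mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Remembered values for effort variance (percent)
mean_ev = 5.0
range_ev = 30.0
# Assumption: the remembered range covers roughly +/-3 standard deviations
sigma_ev = range_ev / 6.0

# Tabulate the curve over the plotted span, -25% to +25% in steps of 5
for x in range(-25, 30, 5):
    print(f"{x:>4}%  {gaussian_pdf(x, mean_ev, sigma_ev):.4f}")
```

Under these assumptions the curve peaks at +5% rather than at zero, with a height of about 0.08; that offset is the nonzero central tendency the paragraph reads as a remark on project management.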
Box 13.2 Is the Bell Curve Fair?
The power of the bell curve is linked to the central limit theorem (CLT): sample means tend to be normally distributed as the sample size N becomes large.
French mathematician Pierre-Simon Laplace rescued the CLT from the nearly forgotten work of Abraham de Moivre and published it in his monumental work Théorie Analytique des Probabilités. In 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved precisely how it worked mathematically [3].
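The CLT claim can be checked with a short simulation. This sketch assumes a uniform parent distribution on [0, 1), chosen only because it is plainly non-Gaussian, plus 10,000 repetitions and a fixed seed. As N grows, the share of sample means within one standard deviation of their average should move from about 0.577 (uniform) toward about 0.683 (Gaussian), and their spread should track sqrt(1/12)/sqrt(N).

```python
import random
import statistics

def sample_means(n: int, repetitions: int = 10_000, seed: int = 1) -> list[float]:
    """Means of `repetitions` samples of size n drawn from Uniform[0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(repetitions)]

def share_within_one_sd(values: list[float]) -> float:
    """Fraction of values within one standard deviation of their mean
    (about 0.683 for a Gaussian, about 0.577 for a uniform)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)

for n in (1, 2, 5, 30):
    means = sample_means(n)
    theory_sd = (1 / 12) ** 0.5 / n ** 0.5   # SD of the mean of n uniforms
    print(f"N={n:>2}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.stdev(means):.4f} (theory {theory_sd:.4f})  "
          f"within 1 SD={share_within_one_sd(means):.3f}")
```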
Figure 13.4 Gaussian distribution of effort variance. (Plot: probability density against effort variance % from –25 to 25.)
Grand Social Law 211
Estimation Error
The variance metric is a double-edged sword. On one side, it measures how well a plan is executed; on the other, it measures how well a project is estimated. When applied to estimation, this metric can be renamed the percentage of estimation error.
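As a sketch, assuming the common definition of variance as (actual − planned) / planned × 100, so that a positive value means the estimate was too low. The function name and the sample figures are illustrative.

```python
def estimation_error_pct(estimate: float, actual: float) -> float:
    """Percentage estimation error read from the variance metric:
    100 * (actual - estimate) / estimate. Positive means underestimation."""
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    return 100.0 * (actual - estimate) / estimate

# Illustrative figures: 400 person-hours estimated, 460 spent
print(estimation_error_pct(400, 460))   # 15.0
```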
When processes mature, estimation errors tend to follow the curve shown in Figure 13.5. Estimation errors are measurement errors; they resemble the astronomical measurement errors used by Gauss when he discovered a path-breaking application of the normal distribution. This holds for size estimation, schedule estimation, effort estimation, and even defect count estimation; errors in all of these are Gaussian.
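Using the parameters printed on Figure 13.5 (mean = 0, SD = 3.2), the standard library's `statistics.NormalDist` gives the chance that an estimate lands within a chosen band. The bands below (±1 SD, ±5%, ±2 SD) are illustrative choices, and `prob_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Estimation error model read from Figure 13.5 (percent)
error_model = NormalDist(mu=0.0, sigma=3.2)

def prob_within(model: NormalDist, band: float) -> float:
    """Probability that the error falls between -band and +band."""
    return model.cdf(band) - model.cdf(-band)

for band in (3.2, 5.0, 6.4):
    print(f"P(|error| <= {band}%) = {prob_within(error_model, band):.3f}")
```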
Sir Francis Galton described the CLT as follows [4]:
I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error.” Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.
The actual term central limit theorem was first used by George Pólya in 1920 in the title of a paper. Pólya referred to the theorem as central because of its importance in probability theory [5].
According to Le Cam, the French school of probability interprets the word
central in the sense that “it describes the behaviour of the centre of the distri-
bution as opposed to its tails” [6].
Between 1870 and 1913, Markov, Chebyshev, and Lyapunov contributed to the CLT. From 1920 to 1937, Lindeberg, Feller, and Lévy perfected it [7].
The CLT sets the context for the bell curve paradigm. The science of measurement presents another truth. Whatever we measure, we repeat the measurement to make it credible, and the repetitions trace out a bell curve of the measured parameter. The limit or peak of the bell curve is the truth; the tails denote errors. Criticism of the bell curve as a grading curve (by some educationalists) is ill-founded. The bell curve represents data and cannot be held responsible for a hypothesis.
To sum it up, the bell curve represents truth better than isolated data.
Viewing Requirement Volatility
At the beginning of a project, managers do consider the risk of scope creep and plan strategies to handle it. There may be no objective evidence of potential scope creep, but approximate models based on benchmark data can be used to construct a Gaussian model that guides strategic planning. In certain projects, requirement volatility is believed, as a rule of thumb, to have a standard deviation of approximately 3.3% and a mean value of 4%.
A thumb rule is merely an expression of one’s experience. With practice in statistical thinking, we can easily convert such knowledge into Gaussian parameters.
Figure 13.6 Gaussian distribution of requirements volatility. (Plot: probability density against requirements volatility % from –25 to 25, with the threshold marked.)
Figure 13.5 Gaussian distribution of estimation errors. (Plot of the estimation error model: probability F(x, µ, σ) against estimation error % from –11 to 11, with mean = 0 and SD = 3.2.)
$$F(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
A plot of the Gaussian version of the above rule of thumb is shown in Figure 13.6, with the tolerance limit marked on the graph.
Figure 13.6 is a pictorial model for understanding the challenge of requirement volatility in the context of a constraining limit. It provides a clear visualization of a process together with its constraint (or goal).
Traditional ways to deal with information—reading, listening,
writing, talking—are painfully slow in comparison to “viewing
the big picture.” Those who survive information overload will
be those who search for information with broadband thinking
but apply it with a single-minded focus.
Kathryn Alesandrini
Survive Information Overload: The 7 Best Ways to Manage
Your Workload by Seeing the Big Picture
Risk Measurement
We can use the Gaussian curve to measure risk, and this is often carried out in soft-
ware project management. For example, in Figure 13.6, the tolerance limit marks
off a tail whose area indicates risk. In Figure 13.7, we show the Gaussian with the
tail area marked in black.
Figure 13.7 Representing risk on Gaussian distribution of requirement volatility. (Plot: probability density against requirements volatility % from –6 to 18; USL = 10%, risk = 4.7% shown as the black right tail, capability = 95.3%.)
Without referring to Gaussian tables, we can compute the tail area using the Excel NORMDIST function in the following expression:
Right tail = 1 − NORMDIST(USL, mean, standard deviation, 1)
Substituting our values, we get right tail = 0.04697, or 4.697%.
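A Python counterpart of the Excel expression, sketched with `statistics.NormalDist`, whose `cdf` plays the role of NORMDIST with its last argument set to 1 (cumulative). Capability is computed as 1 − risk. The quoted 4.697% is reproduced with mean = 3.3% and SD = 4%; taking the rule-of-thumb values in the order stated earlier (mean 4%, SD 3.3%) gives about 3.45%, so the loop prints both.

```python
from statistics import NormalDist

def right_tail(usl: float, mean: float, sd: float) -> float:
    """Counterpart of 1 - NORMDIST(USL, mean, sd, 1): area above the USL."""
    return 1.0 - NormalDist(mean, sd).cdf(usl)

usl = 10.0  # upper specification limit for requirement volatility, percent
for mean, sd in ((3.3, 4.0), (4.0, 3.3)):
    risk = right_tail(usl, mean, sd)
    capability = 1.0 - risk   # the two areas are complementary
    print(f"mean={mean}, sd={sd}: risk={risk:.5f}, capability={capability:.5f}")
```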
You can measure opportunity with the same yardstick
that measures the risk involved. They go together.
Earl Nightingale
The remaining area under the Gaussian measures the probability of meeting the goal, or “capability.” In our case, requirement volatility capability is 95.3%, as marked in Figure 13.7.
Capability and risk are complementary: the two areas add up to 1, so capability = 1 − risk. If one is absent, the other steps in.
As an extension of the risk calculation procedure, we can calculate the risks for tails based on their distances from the mean. As an example, the tail areas for a few useful distances from the mean are given in Table 13.1.
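The same idea can be tabulated by distance from the mean, measured in standard deviations. This sketch uses the standard normal from `statistics.NormalDist`; the distances chosen are illustrative and need not match the rows of Table 13.1. By symmetry, the area below −k SD equals the area above +k SD, so one column serves both upper and lower limits.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def tail_beyond(k: float) -> float:
    """One-tailed area beyond k standard deviations from the mean."""
    return 1.0 - STANDARD_NORMAL.cdf(k)

for k in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{k:.1f} SD: {tail_beyond(k):.5f}")
```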
Table 13.1 contains the solution to the one-tailed problem: it presents the probability of a process exceeding a given specification limit. Examples of one-tailed problems are the probability of defect density exceeding an upper specification limit and the probability of productivity falling below a lower specification limit.
There are also several two-tailed problems, in which a process has both an upper specification limit and a lower specification limit. For the effort variance metric in a certain enhancement project, the specification limits are ±20%. The actual performance is characterized by a normal distribution with mean = 14% and standard deviation = 15%. The two specification limits define two tails.
The Excel syntax for this computation is as follows:
Left tail = NORMDIST(LSL, mean, standard deviation, 1)
Right tail = 1 − NORMDIST(USL, mean, standard deviation, 1)
Total risk in the process = left tail + right tail
The calculations are shown in Data 13.1.
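The same formulas in Python, applied to the enhancement-project figures (LSL = −20%, USL = +20%, mean = 14%, SD = 15%). This is a sketch; `two_tail_risk` and its optional weights `w_left` and `w_right` are hypothetical, and leaving both weights at 1 gives the plain sum.

```python
from statistics import NormalDist

def two_tail_risk(lsl: float, usl: float, mean: float, sd: float,
                  w_left: float = 1.0, w_right: float = 1.0) -> tuple[float, float, float]:
    """Return (left tail, right tail, total risk) for a normal process.
    w_left and w_right are hypothetical weights; 1.0 and 1.0 give the plain sum."""
    dist = NormalDist(mean, sd)
    left = dist.cdf(lsl)            # counterpart of NORMDIST(LSL, mean, sd, 1)
    right = 1.0 - dist.cdf(usl)     # counterpart of 1 - NORMDIST(USL, mean, sd, 1)
    return left, right, w_left * left + w_right * right

# Enhancement-project figures from the text (effort variance, percent)
left, right, total = two_tail_risk(lsl=-20.0, usl=20.0, mean=14.0, sd=15.0)
print(f"left tail = {left:.4f}, right tail = {right:.4f}, total risk = {total:.4f}")
```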
The left tail involves process compliance risk: when teams save effort, there is a risk of adopting shortcuts, which might later boomerang as product failure. The right tail carries a plain cost risk. The total risk in the project could be the sum of the two tail areas. Sometimes, the two tails can attract different weights, giving a “weighted” sum as the total risk. We have used a plain summation in Data 13.1, with the following result: