40 Simple Statistical Methods for Software Engineering
Table 3.1 Average Deviation and Average Absolute Deviation of Bug Repair Time
Bug Repair Time (days)  Mean (days)  Deviation from Mean (days)  Absolute Deviation from Mean (days)
16 20.517 –4.517 4.517
23 20.517 2.483 2.483
45 20.517 24.483 24.483
20 20.517 –0.517 0.517
13 20.517 –7.517 7.517
13 20.517 –7.517 7.517
58 20.517 37.483 37.483
9 20.517 –11.517 11.517
7 20.517 –13.517 13.517
29 20.517 8.483 8.483
13 20.517 –7.517 7.517
12 20.517 –8.517 8.517
32 20.517 11.483 11.483
31 20.517 10.483 10.483
31 20.517 10.483 10.483
33 20.517 12.483 12.483
6 20.517 –14.517 14.517
31 20.517 10.483 10.483
26 20.517 5.483 5.483
21 20.517 0.483 0.483
31 20.517 10.483 10.483
19 20.517 –1.517 1.517
18 20.517 –2.517 2.517
18 20.517 –2.517 2.517
21 20.517 0.483 0.483
39 20.517 18.483 18.483
14 20.517 –6.517 6.517
11 20.517 –9.517 9.517
11 20.517 –9.517 9.517
9 20.517 –11.517 11.517
25 20.517 4.483 4.483
Data Dispersion 41
25 20.517 4.483 4.483
20 20.517 –0.517 0.517
17 20.517 –3.517 3.517
13 20.517 –7.517 7.517
13 20.517 –7.517 7.517
13 20.517 –7.517 7.517
24 20.517 3.483 3.483
12 20.517 –8.517 8.517
7 20.517 –13.517 13.517
7 20.517 –13.517 13.517
28 20.517 7.483 7.483
29 20.517 8.483 8.483
12 20.517 –8.517 8.517
49 20.517 28.483 28.483
20 20.517 –0.517 0.517
21 20.517 0.483 0.483
49 20.517 28.483 28.483
14 20.517 –6.517 6.517
15 20.517 –5.517 5.517
13 20.517 –7.517 7.517
6 20.517 –14.517 14.517
28 20.517 7.483 7.483
21 20.517 0.483 0.483
23 20.517 2.483 2.483
13 20.517 –7.517 7.517
16 20.517 –4.517 4.517
10 20.517 –10.517 10.517
14 20.517 –6.517 6.517
14 20.517 –6.517 6.517
Average Deviation 0.000
Average Absolute Deviation 8.669
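The two summary rows of Table 3.1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the 60 repair times are transcribed from the table in the order shown; the variable names are ours and not part of the original text.

```python
from statistics import mean

# Bug repair times in days, transcribed from Table 3.1 (60 values).
repair_days = [
    16, 23, 45, 20, 13, 13, 58, 9, 7, 29,
    13, 12, 32, 31, 31, 33, 6, 31, 26, 21,
    31, 19, 18, 18, 21, 39, 14, 11, 11, 9,
    25, 25, 20, 17, 13, 13, 13, 24, 12, 7,
    7, 28, 29, 12, 49, 20, 21, 49, 14, 15,
    13, 6, 28, 21, 23, 13, 16, 10, 14, 14,
]

center = mean(repair_days)

# Signed deviations cancel out, so their average is zero (up to rounding).
deviations = [x - center for x in repair_days]
avg_deviation = sum(deviations) / len(deviations)

# Taking absolute values removes the sign problem.
avg_abs_deviation = sum(abs(d) for d in deviations) / len(deviations)

print(f"mean                       = {center:.3f} days")
print(f"average deviation          = {abs(avg_deviation):.3f} days")
print(f"average absolute deviation = {avg_abs_deviation:.3f} days")
```

The average of the signed deviations is zero for any data set, which is why it carries no information about spread.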
deviation. A robust method is to consider the median as the center. Absolute
deviations from the median are then computed. Next we take the median of these
absolute deviations, that is, the median absolute deviation (MAD). For the bug
repair time data, the MAD is 6.000 days. The calculation is shown in Data 3.3.
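The MAD calculation can be sketched in Python as follows, assuming the same 60 repair times transcribed from Table 3.1. The names `repair_days`, `abs_dev`, and `mad` are illustrative.

```python
from statistics import median

# Bug repair times in days, transcribed from Table 3.1 (60 values).
repair_days = [
    16, 23, 45, 20, 13, 13, 58, 9, 7, 29,
    13, 12, 32, 31, 31, 33, 6, 31, 26, 21,
    31, 19, 18, 18, 21, 39, 14, 11, 11, 9,
    25, 25, 20, 17, 13, 13, 13, 24, 12, 7,
    7, 28, 29, 12, 49, 20, 21, 49, 14, 15,
    13, 6, 28, 21, 23, 13, 16, 10, 14, 14,
]

# Step 1: use the median, not the mean, as the center.
center = median(repair_days)

# Step 2: absolute deviation of every point from the median.
abs_dev = [abs(x - center) for x in repair_days]

# Step 3: the median of those absolute deviations is the MAD.
mad = median(abs_dev)

print(f"median = {center:.3f} days")
print(f"MAD    = {mad:.3f} days")
```

Because both steps use a median, a few very long repair times (such as the 58-day bug) have little influence on the result.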
Sum of Squares and Variance
There is another way to avoid the sign problem. We can square the deviations and
take the average. In some statistical contexts, we stop at an intermediate stage
and report the sum of squares. If two data sets have the same number of data
points, the sum of squares can be used to compare dispersion. If the number of data
points varies, we should take the average, known as variance.
For bug repair time, sum of squares and variance calculations are shown in Data 3.4.
Data 3.3 Absolute Deviation of Bug Repair Time from the Median
16 18.000 2.000
23 18.000 5.000
45 18.000 27.000
20 18.000 2.000
13 18.000 5.000
13 18.000 5.000
58 18.000 40.000
9 18.000 9.000
7 18.000 11.000
29 18.000 11.000
13 18.000 5.000
12 18.000 6.000
32 18.000 14.000
31 18.000 13.000
31 18.000 13.000
33 18.000 15.000
6 18.000 12.000
31 18.000 13.000
26 18.000 8.000
21 18.000 3.000
31 18.000 13.000
19 18.000 1.000
18 18.000 0.000
18 18.000 0.000
21 18.000 3.000
39 18.000 21.000
14 18.000 4.000
11 18.000 7.000
11 18.000 7.000
9 18.000 9.000
25 18.000 7.000
Columns: Bug Repair Time (days), Median (days), Absolute Deviation (days)
25 18.000 7.000
20 18.000 2.000
17 18.000 1.000
13 18.000 5.000
13 18.000 5.000
13 18.000 5.000
24 18.000 6.000
12 18.000 6.000
7 18.000 11.000
7 18.000 11.000
28 18.000 10.000
29 18.000 11.000
12 18.000 6.000
49 18.000 31.000
20 18.000 2.000
21 18.000 3.000
49 18.000 31.000
14 18.000 4.000
15 18.000 3.000
13 18.000 5.000
6 18.000 12.000
28 18.000 10.000
21 18.000 3.000
23 18.000 5.000
13 18.000 5.000
16 18.000 2.000
10 18.000 8.000
14 18.000 4.000
14 18.000 4.000
Median of Absolute Deviations (MAD) 6.000
The sum of squares is 7512.983. Dividing by the number of data points (60), the
variance is found to be 125.216. Variance is a good measure for comparing data
sets. However, the unit is days^2, a squared entity. One cannot make an intuitive
assessment of dispersion as we are able to do with average absolute deviation.
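The sum of squares and the variance can be checked with a short Python sketch, again assuming the 60 repair times transcribed from Table 3.1. Here the variance divides by n; `statistics.pvariance` from the standard library is used only as a cross-check.

```python
from statistics import mean, pvariance

# Bug repair times in days, transcribed from Table 3.1 (60 values).
repair_days = [
    16, 23, 45, 20, 13, 13, 58, 9, 7, 29,
    13, 12, 32, 31, 31, 33, 6, 31, 26, 21,
    31, 19, 18, 18, 21, 39, 14, 11, 11, 9,
    25, 25, 20, 17, 13, 13, 13, 24, 12, 7,
    7, 28, 29, 12, 49, 20, 21, 49, 14, 15,
    13, 6, 28, 21, 23, 13, 16, 10, 14, 14,
]

center = mean(repair_days)

# Squaring removes the sign of each deviation; the unit becomes days^2.
squared_dev = [(x - center) ** 2 for x in repair_days]
sum_of_squares = sum(squared_dev)

# Averaging over n makes data sets of different sizes comparable.
variance = sum_of_squares / len(repair_days)

print(f"sum of squares = {sum_of_squares:.3f} days^2")
print(f"variance       = {variance:.3f} days^2")
print(f"pvariance      = {pvariance(repair_days):.3f} days^2  (library cross-check)")
```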
BOX 3.2 ICEBERG ANALOGY
Data are like an iceberg. The peak contains only 15% of ice. The remaining
85% is beneath the water level, unseen by the onlooker. The unseen ice details
could do great harm to ships. Likewise, the central values constitute just the
tip. The real process behavior is in the spread of data. The true behavior of a
process is understood when the spread is also recognized.
Data 3.4 Sum of Squares and Variance of Bug Repair Time
16 20.517 20.400
23 20.517 6.167
45 20.517 599.434
20 20.517 0.267
13 20.517 56.500
13 20.517 56.500
58 20.517 1405.000
9 20.517 132.634
7 20.517 182.700
29 20.517 71.967
13 20.517 56.500
12 20.517 72.534
32 20.517 131.867
31 20.517 109.900
31 20.517 109.900
33 20.517 155.834
6 20.517 210.734
31 20.517 109.900
26 20.517 30.067
21 20.517 0.234
31 20.517 109.900
19 20.517 2.300
18 20.517 6.334
18 20.517 6.334
21 20.517 0.234
39 20.517 341.634
14 20.517 42.467
11 20.517 90.567
11 20.517 90.567
9 20.517 132.634
25 20.517 20.100
25 20.517 20.100
20 20.517 0.267
17 20.517 12.367
13 20.517 56.500
13 20.517 56.500
13 20.517 56.500
24 20.517 12.134
12 20.517 72.534
7 20.517 182.700
7 20.517 182.700
28 20.517 56.000
29 20.517 71.967
12 20.517 72.534
49 20.517 811.300
20 20.517 0.267
21 20.517 0.234
49 20.517 811.300
14 20.517 42.467
15 20.517 30.434
13 20.517 56.500
6 20.517 210.734
28 20.517 56.000
21 20.517 0.234
23 20.517 6.167
13 20.517 56.500
16 20.517 20.400
10 20.517 110.600
14 20.517 42.467
14 20.517 42.467
Sum of Squares 7512.983
Variance 125.216
Columns: Bug Repair Time (days), Mean (days), Squared Deviation (days^2)
Standard Deviation
If we take the square root of the variance of bug repair time, we obtain 11.190
days. This is the standard deviation, SD, of bug repair time, the most commonly
used measure of dispersion. This is larger than the average absolute deviation
of 8.669 days. The standard deviation is never smaller than the average absolute
deviation; the two are equal only when every data point lies at the same
distance from the mean.
The exact formula for standard deviation, SD, has a small correction for sample
size. Instead of using n as the number of data points, the exact calculation uses n−1,
the degrees of freedom; that is,
$$\mathrm{SD} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \tag{3.3}$$
The corrected value of standard deviation for bug repair time is 11.284 days.
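Both versions of the standard deviation are available in Python's standard library: `statistics.pstdev` divides by n and `statistics.stdev` divides by n−1. The sketch below assumes the 60 repair times transcribed from Table 3.1 and compares both values with the average absolute deviation.

```python
from statistics import mean, pstdev, stdev

# Bug repair times in days, transcribed from Table 3.1 (60 values).
repair_days = [
    16, 23, 45, 20, 13, 13, 58, 9, 7, 29,
    13, 12, 32, 31, 31, 33, 6, 31, 26, 21,
    31, 19, 18, 18, 21, 39, 14, 11, 11, 9,
    25, 25, 20, 17, 13, 13, 13, 24, 12, 7,
    7, 28, 29, 12, 49, 20, 21, 49, 14, 15,
    13, 6, 28, 21, 23, 13, 16, 10, 14, 14,
]

sd_n = pstdev(repair_days)         # divides the sum of squares by n
sd_n_minus_1 = stdev(repair_days)  # divides by n - 1 (degrees of freedom)

center = mean(repair_days)
avg_abs_deviation = sum(abs(x - center) for x in repair_days) / len(repair_days)

print(f"SD (n)                     = {sd_n:.3f} days")
print(f"SD (n - 1)                 = {sd_n_minus_1:.3f} days")
print(f"average absolute deviation = {avg_abs_deviation:.3f} days")
```

With 60 data points the correction is small, under one percent; it matters more for small samples.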
Process dispersion can be defined in terms of standard deviation, sigma. It is
a tradition dating back to the 1920s to take process variation as ±3 sigma. The
normal distribution beyond ±3 sigma is disregarded. Mathematically speaking, the
normal distribution runs from minus infinity to plus infinity. We trim the tails and
take the span from −3 sigma to +3 sigma as the process dispersion. The trimming
rules are associated with confidence level. The ±3 sigma trimming rule is associated
with a confidence level of 99.73%.
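The fraction of a normal distribution that lies within ±k sigma of the mean is erf(k/√2). The sketch below evaluates it for a few trimming rules using `math.erf` from the standard library; the helper name `normal_coverage` is ours.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/-{k} sigma covers {normal_coverage(k) * 100:.2f}% of a normal distribution")
```

This coverage holds only if the process data are approximately normal; skewed data such as repair times may leave more or less than this outside the trimmed span.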
BOX 3.3 THUMB RULES
With experience, people develop thumb rules about using dispersion measures.
Although the rules depend on the individual person, here is an illustrative
example. The following table shows three ways of applying dispersion.
SNo Purpose Range Considered Confidence Level
1 Business decisions Interquartile 50%
2 Process decisions 3–97 percentile 94%
3 Risk avoidance Max–min 100%
Business decisions are customarily taken to accommodate IQR of variation.
To accommodate more would need an unrealistic budget. Process decisions
are made with expectations of reasonably stringent discipline. Risk
avoidance involves understanding and accommodating extreme values. You
can form your own rules of thumb to manage dispersion.
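As a rough illustration, the three ranges in the table can be computed for the bug repair data with `statistics.quantiles`. This is a sketch under stated assumptions: the data are the 60 repair times transcribed from Table 3.1, and the percentiles use the library's default interpolation method. Other percentile conventions give slightly different cut points.

```python
from statistics import quantiles

# Bug repair times in days, transcribed from Table 3.1 (60 values).
repair_days = [
    16, 23, 45, 20, 13, 13, 58, 9, 7, 29,
    13, 12, 32, 31, 31, 33, 6, 31, 26, 21,
    31, 19, 18, 18, 21, 39, 14, 11, 11, 9,
    25, 25, 20, 17, 13, 13, 13, 24, 12, 7,
    7, 28, 29, 12, 49, 20, 21, 49, 14, 15,
    13, 6, 28, 21, 23, 13, 16, 10, 14, 14,
]

# Quartiles: three cut points; the outer two bound the interquartile range.
q1, _, q3 = quantiles(repair_days, n=4)

# Percentiles: 99 cut points; index k - 1 holds the k-th percentile.
percentiles = quantiles(repair_days, n=100)
p3, p97 = percentiles[2], percentiles[96]

ranges = {
    "Business decisions (interquartile)": (q1, q3),
    "Process decisions (3-97 percentile)": (p3, p97),
    "Risk avoidance (max-min)": (min(repair_days), max(repair_days)),
}

for purpose, (low, high) in ranges.items():
    print(f"{purpose}: {low:.2f} to {high:.2f} days (width {high - low:.2f})")
```

Each rule widens the range to be accommodated, which is the trade-off the box describes.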