Data Dispersion ◾ 49
4 are robust, without any influence from extreme values. The true capability of the
bug repair process is indicated by estimates 2 and 4.
Application Contexts
The statistic “dispersion measure” is highly sensitive to context. Measures of
dispersion can be applied in three prominent contexts: process control, experiments,
and risk management.
Variation is unavoidable in software processes. In the manufacturing context,
variation is least in machine-controlled processes. Manual processes of hardware
production have a few orders of magnitude more variation. Software processes
have several orders of magnitude more variation still. Software processes first have
human variation; next, most software processes are of a problem-solving nature and
thus reflect variation in the complexity of the problem. Hence, Shewhart’s common
and special cause variations do not completely represent software process variation.
In software processes, variation has subtler components, including genetic variation
of agents and entropy of the problem scenario. We would rather attempt to
understand variation before we classify it, in tune with the philosophy of Deming
[1], who propounded that understanding variation is part of profound knowledge.
Categorizing variation into types is divisive, whereas finding a numerical
expression for dispersion is integrative. One numerical expression robust enough
to deal with nonnormal data is MAD, which can be used as a measure of process
performance in performance scorecards. For instance, in the case of bug repair, the
following two values represent the process:
Median 18 days
MAD 6 days
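As an illustration of how such a pair of values might be computed, the following is a minimal sketch in Python. It assumes that MAD here denotes the median absolute deviation (the median of the absolute deviations from the median). The function name `median_and_mad` and the repair times are hypothetical, chosen only so that the result matches the two values above; they are not data from the study.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, MAD), taking MAD as the median absolute deviation."""
    center = median(values)
    # Absolute distance of each observation from the median
    deviations = [abs(v - center) for v in values]
    return center, median(deviations)


# Hypothetical bug repair times in days, including one extreme value (90)
repair_days = [6, 12, 14, 18, 18, 24, 30, 36, 90]

center, mad = median_and_mad(repair_days)
print(f"Median {center} days")
print(f"MAD {mad} days")
```

In this sketch the extreme value of 90 days does not shift either number, which is the robustness property that makes the pair usable on a scorecard.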
If we study variation in experimental data, we have a different context. In
experiments, variation is treated as error. Truth is in the center. The standard
deviation is a good measure to represent error. If the measured values are positive,
we will benefit from using the coefficient of standard deviation. When we do an
experiment to measure productivity, we can express the experimental result as a
mean ± % RSD (relative standard deviation or coefficient of standard deviation,
that is, the standard deviation expressed as a percentage of the mean). For example,
a mean productivity of 120 LOC per day ±30% RSD could be a good expression of
an experimental study.
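The mean ± % RSD form of reporting can be sketched as follows. This is a minimal illustration in Python that assumes the sample standard deviation (n − 1 in the denominator) is the intended one. The function name `mean_and_rsd` and the productivity readings are hypothetical and are not taken from any experiment in the text.

```python
from statistics import mean, stdev


def mean_and_rsd(values):
    """Return (mean, RSD in percent) for strictly positive measurements."""
    avg = mean(values)
    if avg <= 0:
        # RSD divides by the mean, so it is meaningful only for positive data
        raise ValueError("RSD requires a positive mean")
    rsd_percent = 100 * stdev(values) / avg  # sample standard deviation
    return avg, rsd_percent


# Hypothetical productivity readings in LOC per day
loc_per_day = [80, 100, 120, 140, 160]

avg, rsd = mean_and_rsd(loc_per_day)
print(f"{avg:.0f} LOC per day ±{rsd:.0f}% RSD")
```

Because RSD is unitless, results measured on different scales can be compared directly, which is the benefit of dividing by the mean when the measured values are positive.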
Risk managers need a mathematical expression for variation. Of all the options,
the standard deviation is a close enough approximation that works well for risk
assessments.
The measures of dispersion given in this chapter provide a basic entry into the
subject. For a cohesive understanding, variation should be modeled by methods
given in Section II of this book.