22 ◾ Simple Statistical Methods for Software Engineering
Once the true error has been determined, estimation can be calibrated as a measurement
process.
It is customary to take sample data and treat the mean of the sample as the true
observation. It makes no statistical sense to judge on the basis of a single observation.
We need to think in terms of the “sample mean,” not stray single points. The sample
mean is more reliable than any individual observation, and it dominates statistical
analysis.
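To make the reliability claim concrete, the sketch below simulates a hypothetical process whose effort variance is normally distributed with mean 10% and standard deviation 8% (both values are assumptions chosen for illustration). It compares how widely single observations scatter with how widely the means of ten observations scatter.

```python
import random
import statistics

# Assumed process: effort variance (%) ~ Normal(mean=10, sd=8). Illustrative only.
TRUE_MEAN, TRUE_SD = 10.0, 8.0
rng = random.Random(42)  # fixed seed so the run is repeatable


def observe():
    """Draw one hypothetical effort-variance observation."""
    return rng.gauss(TRUE_MEAN, TRUE_SD)


# 2000 single observations versus 2000 sample means of 10 observations each.
singles = [observe() for _ in range(2000)]
means_of_10 = [statistics.fmean(observe() for _ in range(10)) for _ in range(2000)]

# Standard deviation measures how far a typical value strays from the center.
sd_single = statistics.stdev(singles)
sd_mean = statistics.stdev(means_of_10)

print(f"scatter of single observations: {sd_single:.2f}")
print(f"scatter of sample means (n=10): {sd_mean:.2f}")
```

The sample means should scatter roughly three times less than single points, which is why a judgment based on a sample mean is safer than one based on a stray observation.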
Uncertainty in Mean: Standard Error
The term “sample mean” deserves a closer look; it simply refers to the mean of
observed data. Say we collect data about effort variance from several releases in a
development project. These data form a sample from which we can compute the mean
effort variance in the project. Individual effort variance data are used to measure
and control events; the sample mean is used to measure and control central tendency.
Central tendency is used to judge process capability.
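The split between controlling events and judging central tendency can be illustrated with a few release figures. The effort variance values and the 10% alert limit below are hypothetical, invented only for this sketch: individual values are checked against the limit, while the sample mean summarizes the project as a whole.

```python
import statistics

# Hypothetical effort variance (%) for six releases of one project.
# Positive values mean actual effort exceeded the estimate.
release_effort_variance = [12.0, -3.5, 8.0, 15.5, 4.0, 9.0]

# Individual data points: used to spot and control specific events.
# The 10% limit is an illustrative assumption, not a standard threshold.
LIMIT = 10.0
flagged = [(i + 1, v) for i, v in enumerate(release_effort_variance) if v > LIMIT]

# Sample mean: summarizes the central tendency of the project's process.
project_mean = statistics.mean(release_effort_variance)

print("releases over limit (release no., value):", flagged)
print(f"project mean effort variance: {project_mean:.2f}%")
```

Releases 1 and 4 are flagged as events worth investigating, while the project's central tendency is a mean effort variance of 7.5%.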
Now the Software Engineering Process Group (SEPG) would be interested in
estimating process capability from an organizational perspective. They can collect
sample means from several projects and construct a grand mean. The grand mean can
also be called the population mean. Here population refers to the collective
experience of all projects in the organization. The population mean represents the
true capability of the organization.
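A minimal sketch of how an SEPG might form a grand mean from project sample means. The project names and values are hypothetical, and the plain (unweighted) average follows the description above; if projects contribute very different numbers of data points, weighting each mean by its sample size is a common alternative.

```python
import statistics

# Hypothetical sample means of effort variance (%), one per project.
project_means = {
    "Project A": 7.5,
    "Project B": 4.0,
    "Project C": 11.0,
    "Project D": 6.5,
}

# Grand mean: the average of the project sample means, used here as the
# estimate of the organization's population mean.
grand_mean = statistics.mean(project_means.values())

for name, m in project_means.items():
    print(f"{name}: {m:+.1f}% (difference from grand mean {m - grand_mean:+.2f})")
print(f"grand (population) mean: {grand_mean:.2f}%")
```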
If we return to the term truth, we find several discoveries of truth: each project
discovers its effort variance through its sample mean, while the organization
discovers truth through the population mean.
Now we can estimate the population mean (the central tendency of the organizational
process) from the sample mean of one project (the central tendency of the local
process). We cannot pinpoint the population mean, but we can fix a band of values
within which the population mean is likely to lie. There is an uncertainty associated
with this estimation, customarily expressed by a statistic called the standard error.
Let us look further into this concept.
It is known that the mean values gathered from different projects (the sample means)
vary approximately according to a normal distribution, provided each sample has a
reasonably large number of data points. The theorem that establishes this is the
central limit theorem. The standard deviation of this normal distribution is known
as the standard error.
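The central limit theorem can be checked with a short simulation. The sketch below assumes a deliberately skewed (exponential) parent distribution with mean 10 and standard deviation 10, draws 5000 hypothetical projects of 30 data points each, and compares the observed spread of their sample means with the standard result σ/√n, where σ is the parent standard deviation. All parameters are illustrative assumptions.

```python
import math
import random
import statistics

# Assumed parent process: exponential (strongly skewed, clearly not normal).
PARENT_MEAN = 10.0
PARENT_SD = 10.0          # for an exponential distribution, sd equals the mean
N = 30                    # data points per hypothetical project
PROJECTS = 5000           # number of simulated projects

rng = random.Random(7)    # fixed seed for a repeatable run


def sample_mean(n):
    """Mean of n draws from the skewed parent distribution."""
    return statistics.fmean(rng.expovariate(1 / PARENT_MEAN) for _ in range(n))


means = [sample_mean(N) for _ in range(PROJECTS)]

# Standard error: the standard deviation of the distribution of sample means.
empirical_se = statistics.stdev(means)
theoretical_se = PARENT_SD / math.sqrt(N)

# If the sample means are roughly normal, about 68% fall within +/-1 SE
# of the parent mean and about 95% within +/-2 SE.
within_1 = sum(abs(m - PARENT_MEAN) <= theoretical_se for m in means) / PROJECTS
within_2 = sum(abs(m - PARENT_MEAN) <= 2 * theoretical_se for m in means) / PROJECTS

print(f"empirical SE   : {empirical_se:.3f}")
print(f"sigma / sqrt(n): {theoretical_se:.3f}")
print(f"within 1 SE: {within_1:.3f}, within 2 SE: {within_2:.3f}")
```

Even though the individual values are strongly skewed, the fractions of sample means within one and two standard errors should come out close to the normal-distribution figures of 68% and 95%.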
If we have just collected sample data from one project with n data points and a
standard deviation s, then we can estimate the standard error with reasonable
accuracy using the relation