Software Size Growth ◾ 273
It may be seen in Figure 17.5 that the curve with MOM parameters matches
the histogram, with a tail finishing at 15. The other curves either stop short or
overshoot.
It may also be seen that we have made no attempt to make the log-normal model
represent the outliers in the box plot of Figure 17.1. Our emphasis has remained
on the body of the log-normal, not on its tail, and that emphasis reflects our
strategy of modeling the body rather than the tail. Had we wished to emphasize the
tail, we would have attached a Pareto tail instead. To pursue this idea, we need
to make our judgment of fit more precise, and we do so by evaluating errors in
prediction. For nine percentiles, ranging from 0.1 to 0.9 in steps of 0.1, the
data values are first computed with the percentile function in Excel. The same
percentiles are then interpreted as probabilities of the cumulative log-normal
distribution, and the design complexities corresponding to these nine percentiles
are predicted by an inverse calculation with LOGNORMINV in Excel. The difference
between the predicted value and the data value is taken as the error in prediction.
To obtain a meaningful average error, we take the absolute value of each error,
which removes its sign, and then average the nine absolute errors. Prediction
errors are calculated in this way for all the candidate log-normal models.
The calculations are shown in Table 17.1.
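The error calculation described above can be cross-checked outside the spreadsheet. The following is a minimal sketch, not the workbook behind Table 17.1: `percentile_inc` assumes Excel's inclusive linear-interpolation percentile rule (rank p(n − 1)), `lognorm_inv` plays the role of LOGNORMINV via the standard library's `statistics.NormalDist`, and the sample values are hypothetical, chosen only for illustration.

```python
from math import exp
from statistics import NormalDist


def percentile_inc(data, p):
    """Percentile by linear interpolation at rank p*(n-1) (assumed Excel PERCENTILE.INC rule)."""
    xs = sorted(data)
    pos = p * (len(xs) - 1)          # 0-based fractional rank
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def lognorm_inv(p, scale, shape):
    """Inverse log-normal CDF, analogous to LOGNORMINV(p, scale, shape): exp(scale + shape * z_p)."""
    return exp(scale + shape * NormalDist().inv_cdf(p))


def mean_abs_error(data, scale, shape):
    """Average of |predicted - observed| over the nine percentiles 0.1, 0.2, ..., 0.9."""
    probs = [k / 10 for k in range(1, 10)]
    errors = [lognorm_inv(p, scale, shape) - percentile_inc(data, p) for p in probs]
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical design-complexity sample, for illustration only.
sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 15]
print(f"MAE = {mean_abs_error(sample, 0.693, 0.930):.3f}")
```

Taking `abs` before averaging is what keeps positive and negative errors from cancelling, which is the point of removing the sign in the text.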
In Model A, the parameters obtained directly by MOM are used. In Model B,
the scale is estimated from the median. In Model C, the scale is the same as
in Model B, but the shape has been perturbed until the mean absolute error (MAE)
converged to a minimum.
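One way the three candidate models could be produced is sketched below. It assumes that "MOM parameters" means the standard log-normal method-of-moments estimates, σ² = ln(1 + v/m²) and μ = ln m − σ²/2, computed from the sample mean m and the sample variance v (n − 1 denominator). Keeping the MOM shape in Model B, and replacing the manual perturbation of Model C's shape with a plain grid search, are further assumptions of this sketch; the function names and the data are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, median, quantiles, variance


def mom_lognormal(data):
    """Method-of-moments (scale, shape) from the sample mean and sample variance."""
    m, v = mean(data), variance(data)
    shape_sq = log(1 + v / m ** 2)
    return log(m) - shape_sq / 2, sqrt(shape_sq)


def decile_mae(data, scale, shape):
    """Mean absolute error between model and data at the nine deciles 0.1 ... 0.9."""
    # method="inclusive" interpolates at rank p*(n-1), like Excel's PERCENTILE.INC.
    observed = quantiles(data, n=10, method="inclusive")
    predicted = [exp(scale + shape * NormalDist().inv_cdf(k / 10)) for k in range(1, 10)]
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def fit_models(data, step=0.001, max_shape=3.0):
    """Return {'A': ..., 'B': ..., 'C': ...}, each a (scale, shape) pair."""
    scale_a, shape_a = mom_lognormal(data)
    scale_b = log(median(data))  # the median of a log-normal is exp(scale)
    # Grid search over the shape: a stand-in for perturbing it by hand.
    grid = [step * i for i in range(1, int(round(max_shape / step)) + 1)]
    shape_c = min(grid, key=lambda s: decile_mae(data, scale_b, s))
    return {"A": (scale_a, shape_a), "B": (scale_b, shape_a), "C": (scale_b, shape_c)}


sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 15]  # hypothetical data
for name, (scale, shape) in fit_models(sample).items():
    mae = decile_mae(sample, scale, shape)
    print(f"Model {name}: scale={scale:.3f} shape={shape:.3f} MAE={mae:.3f}")
```

A grid search is crude but transparent. A one-dimensional optimizer such as golden-section search would reach the same minimum with fewer evaluations, provided the MAE curve has a single minimum over the searched range.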
For further discussion, let us choose the optimized Model C, which has the
minimum error. The optimized model has a scale of 0.693 and a shape of 0.930;
Excel's LOGNORM.DIST takes these as the mean and the standard deviation of ln x,
respectively. We can plot the log-normal cumulative distribution function (CDF)
of Model C using the following Excel function:
CDF = LOGNORM.DIST(x, scale, shape, 1).
After substitution, the expression becomes
CDF = LOGNORM.DIST(x, 0.693, 0.930, 1).
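To check individual values of this Excel expression outside a spreadsheet, the same CDF can be written with Python's standard library, because a log-normal CDF is the normal CDF evaluated at ln x. This is a sketch built on that identity; the function names are hypothetical. One deliberate difference: Excel's LOGNORM.DIST returns #NUM! for x ≤ 0, whereas this sketch returns 0.0 there.

```python
from math import exp, log
from statistics import NormalDist

SCALE, SHAPE = 0.693, 0.930  # Model C parameters quoted in the text


def lognorm_cdf(x, scale=SCALE, shape=SHAPE):
    """Counterpart of LOGNORM.DIST(x, scale, shape, 1): Phi((ln x - scale) / shape)."""
    if x <= 0:
        return 0.0  # Excel would return #NUM! here
    return NormalDist(scale, shape).cdf(log(x))


def lognorm_inv(p, scale=SCALE, shape=SHAPE):
    """Counterpart of LOGNORM.INV / LOGNORMINV: the x at which lognorm_cdf(x) == p."""
    return exp(NormalDist(scale, shape).inv_cdf(p))


for x in (1, 2, 5, 10, 15):
    print(f"CDF({x}) = {lognorm_cdf(x):.3f}")
```

Since exp(0.693) is about 2, the CDF passes 0.5 at a design complexity of about 2, which is the median implied by Model C's scale.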
For reference, we can also plot the probability density function (PDF) using
the following Excel function:
PDF = LOGNORM.DIST(x, scale, shape, 0).
After substitution, the expression becomes
PDF = LOGNORM.DIST(x, 0.693, 0.930, 0).
In both cases, x is the design complexity. The plots are shown in Figure 17.6.
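To generate the data for plots like those in Figure 17.6 without a spreadsheet, both curves can be tabulated over a grid of design complexities. This sketch assumes the standard identity that the log-normal density equals the normal density of ln x divided by x. The x grid from 0.5 to 15 is an arbitrary choice, and the function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

SCALE, SHAPE = 0.693, 0.930  # Model C parameters quoted in the text


def lognorm_pdf(x, scale=SCALE, shape=SHAPE):
    """Counterpart of LOGNORM.DIST(x, scale, shape, 0): normal density of ln x, divided by x."""
    if x <= 0:
        return 0.0  # Excel would return #NUM! here
    return NormalDist(scale, shape).pdf(log(x)) / x


def lognorm_cdf(x, scale=SCALE, shape=SHAPE):
    """Counterpart of LOGNORM.DIST(x, scale, shape, 1)."""
    if x <= 0:
        return 0.0
    return NormalDist(scale, shape).cdf(log(x))


def lognorm_mode(scale=SCALE, shape=SHAPE):
    """Location of the PDF's peak: exp(scale - shape**2)."""
    return exp(scale - shape ** 2)


def tabulate(xs):
    """Rows of (x, PDF, CDF), ready to paste into a plotting tool."""
    return [(x, lognorm_pdf(x), lognorm_cdf(x)) for x in xs]


if __name__ == "__main__":
    grid = [0.5 * k for k in range(1, 31)]  # x = 0.5, 1.0, ..., 15.0
    for x, pdf, cdf in tabulate(grid):
        print(f"{x:5.1f}  {pdf:.4f}  {cdf:.4f}")
```

With these parameters the density peaks at exp(0.693 − 0.930²) ≈ 0.84, below the median of about 2. This ordering is expected for a right-skewed log-normal.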