62 ◾ Simple Statistical Methods for Software Engineering
If results due to innovation show improvement, one or more of the following
visual clues may be present:
◾ The overall length of the box plot would have decreased
◾ Outliers might have disappeared
◾ The central line might show a favorable shift
◾ The box might have shrunk
◾ The box might be relocated in a favorable region
◾ The unfavorable whisker might have diminished
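The visual clues listed above can also be checked numerically. The following is a minimal sketch, not a procedure from the text: it assumes the common 1.5 × IQR whisker rule, assumes that values closer to zero are favorable (as with estimation errors), and uses hypothetical data. The names `box_summary` and `improvement_clues` are illustrative.

```python
from statistics import quantiles


def box_summary(values, k=1.5):
    """Summarize one group the way a box plot would draw it.

    Whiskers end at the most extreme points inside the k * IQR fences;
    k = 1.5 is the usual convention and is an assumption here.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


def improvement_clues(before, after):
    """Check each visual clue, assuming values near zero are favorable."""
    b, a = box_summary(before), box_summary(after)

    def length(s):
        return s["whisker_high"] - s["whisker_low"]

    def box_center(s):
        return abs((s["q1"] + s["q3"]) / 2)

    def worst_whisker(s):
        return max(abs(s["whisker_low"]), abs(s["whisker_high"]))

    return {
        "overall_length_decreased": length(a) < length(b),
        "fewer_outliers": len(a["outliers"]) < len(b["outliers"]),
        "median_shifted_favorably": abs(a["median"]) < abs(b["median"]),
        "box_shrunk": a["iqr"] < b["iqr"],
        "box_relocated_favorably": box_center(a) < box_center(b),
        "unfavorable_whisker_diminished": worst_whisker(a) < worst_whisker(b),
    }


# Hypothetical estimation errors (percent) before and after an innovation.
before = [-30, -12, -5, 2, 5, 9, 14, 25, 60]
after = [-4, -2, 0, 1, 1, 2, 3, 5, 6]

clues = improvement_clues(before, after)
for name, present in clues.items():
    print(f"{name}: {present}")
```

With this made-up data every clue is present. On real data, a mixed result (some clues present, others absent) is a prompt to look at the plot itself rather than a verdict.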
If an improvement is not visible in a box plot, it may not be an improvement in
the first place. The question of looking for significance does not arise.
However, in most cases, people take pains to go through lengthy procedures
to execute significance tests to check differences, without box plot visual
checks. In some cases, even after box plot rejections, people go through the
ritual of significance tests.
Holistic Test
The twin box plot test is a holistic approach; it can compare two populations (two
groups) in a complete, balanced fashion that no other test can offer. The price we
pay for completeness is loss of rigor. It so happens that rigorous tests have narrower
scope than robust tests; approximate analysis can sweep more terrain than precise
analysis. We need such a holistic test before we go into more sophisticated tests.
The twin box plot shown in Figure 4.4 offers a holistic comparison described in
the following paragraphs.
First, it compares the median values. The median of the first estimate is 4.67%,
and the median of the second estimate is 1.27%. Comparing medians is more
robust than comparing means and makes sense even with nonnormal data. This
is a comment on central tendency.
Then dispersion is compared at two levels. The first IQR is 23.45, and the
second, improved value is 8.54. It is evident that the core of the estimation process,
covering 50% of results, shows far less dispersion: the new IQR is roughly
one-third the old. The old whisker-to-whisker range is 86.48,
whereas the new whisker-to-whisker range is 20.32, about four times smaller. It is evident
that the dispersion is reduced in the new estimation technique; it is more reliable.
The box plot provides an order-of-magnitude test before we resort to p values for
judgment.
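The ratios quoted above can be verified directly from the reported figures. A minimal sketch; the helper name `reduction_factor` is illustrative and not from the text.

```python
def reduction_factor(old, new):
    """Return how many times smaller the new value is than the old one."""
    return old / new


# Figures reported in the text for the first and second estimation processes.
old_median, new_median = 4.67, 1.27
old_iqr, new_iqr = 23.45, 8.54
old_range, new_range = 86.48, 20.32

median_factor = reduction_factor(old_median, new_median)
iqr_factor = reduction_factor(old_iqr, new_iqr)
range_factor = reduction_factor(old_range, new_range)

print(f"Median: {median_factor:.2f} times smaller")
print(f"IQR:    {iqr_factor:.2f} times smaller")
print(f"Range:  {range_factor:.2f} times smaller")
```

The IQR shrinks by a factor of about 2.7 (the new value is roughly one-third of the old) and the whisker-to-whisker range by a factor of about 4.3.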
The box plot identifies outliers in the second group; not every estimate has been
well performed. The best practice must spread. The second process has philosophical
problems called statistical outliers. However, in a practical sense, even the
outliers are better than the first process.
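One way to read the last claim is that the outliers of the second group still fall inside the whisker-to-whisker range of the first. The sketch below checks that reading; it is an interpretation rather than the author's stated method, it assumes the conventional 1.5 × IQR fences, and the data are hypothetical.

```python
from statistics import quantiles


def fences(values, k=1.5):
    """Return the lower and upper fences; k = 1.5 is assumed, not from the text."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whisker_ends(values, k=1.5):
    """Most extreme data points that still lie inside the fences."""
    low, high = fences(values, k)
    inside = [x for x in values if low <= x <= high]
    return min(inside), max(inside)


def outliers(values, k=1.5):
    """Data points that lie outside the fences."""
    low, high = fences(values, k)
    return [x for x in values if x < low or x > high]


# Hypothetical estimation errors in percent for the two processes.
old_errors = [-40, -22, -10, -3, 4, 8, 15, 27, 46]
new_errors = [-3, -1, 0, 1, 1, 2, 3, 4, 12]

new_outliers = outliers(new_errors)
old_low, old_high = whisker_ends(old_errors)

# True when every outlier of the new process sits within the old whiskers.
all_inside = all(old_low <= x <= old_high for x in new_outliers)
print(new_outliers, (old_low, old_high), all_inside)
```

Here the single outlier of the new process is flagged only because the new box is so narrow; measured against the old process it would be an unremarkable result.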