Tukey’s Box Plot 63
Application Summary: Twin Box Plot
The twin box plot is a qualitative test and should be performed before any
hypothesis test.
The only way to compare overall performance of data sets is the twin box plot.
We can compare the following aspects of a process using twin box plots:
Quartile-to-quartile distance (IQR)
Whisker-to-whisker distance
Range
Median (central tendency)
Outliers
Skew
Each comparison can provide a unique clue about differences between processes.
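The aspects listed above can be computed numerically before, or alongside, drawing the plot. The following is a minimal sketch, not the book's procedure: the function name `box_summary`, the sample data, and the `box_skew` measure are assumptions made for illustration, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may differ slightly from Tukey's hinges on small samples.

```python
import statistics

def box_summary(data):
    """Return the quantities a box plot displays for one data set."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    # Conventional 1.5 x IQR fences; points beyond them count as outliers.
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {
        "median": med,
        "iqr": iqr,                              # quartile-to-quartile distance
        "whisker_span": inside[-1] - inside[0],  # whisker-to-whisker distance
        "range": xs[-1] - xs[0],
        "outliers": outliers,
        # Positive when the upper half of the box is longer (right skew).
        "box_skew": (q3 - med) - (med - q1),
    }

# Hypothetical effort data, before and after a process change.
before = [12, 15, 14, 18, 20, 16, 13, 35, 17, 15]
after = [10, 11, 12, 11, 13, 12, 10, 11, 12, 11]

for name, data in (("before", before), ("after", after)):
    print(name, box_summary(data))
```

Printing the two summaries side by side gives the same comparison the twin box plot shows graphically.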
We can use a rule of thumb: if boxes overlap, there is no significant shift in
central value.
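The overlap rule of thumb can be sketched in a few lines. This is an illustrative check only: the function names and the sample data are hypothetical, and the quartile convention (`statistics.quantiles`, "inclusive" method) is an assumption that may not match Tukey's hinges exactly.

```python
import statistics

def box(data):
    """Return (Q1, Q3), the lower and upper edges of the box."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q3

def boxes_overlap(a, b):
    """True when the interquartile boxes of a and b share any interval."""
    a_q1, a_q3 = box(a)
    b_q1, b_q3 = box(b)
    return a_q1 <= b_q3 and b_q1 <= a_q3

# Hypothetical data sets.
baseline = [12, 15, 14, 18, 20, 16, 13, 35, 17, 15]
similar = [13, 14, 15, 16, 17, 15, 14, 16]
shifted = [10, 11, 12, 11, 13, 12, 10, 11, 12, 11]

print(boxes_overlap(baseline, similar))  # boxes overlap: no shift suggested
print(boxes_overlap(baseline, shifted))  # boxes apart: a shift is suggested
```

A result of `False` is only a preliminary clue that the central value has shifted; it does not replace a confirmatory test.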
After seeing the twin box plot, we can decide which confirmatory test must
be performed.
If there is a shift in central value, confirm it with a t test.
If dispersion is different, confirm it with an F test.
If outliers are present, cross check them with a control chart.
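As a rough illustration of the first two confirmatory checks, the test statistics can be computed with the standard library alone. The sketch below assumes Welch's unequal-variance form of the t statistic, which is one common choice and not necessarily the variant the text intends. The function names and data are hypothetical, and turning the statistics into p-values would still require t and F distribution tables or a statistics package.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def f_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

# Hypothetical data sets.
set_a = [1, 2, 3, 4, 5]
set_b = [2, 4, 6, 8, 10]

print("t statistic:", round(welch_t(set_a, set_b), 3))
print("F ratio:", f_ratio(set_a, set_b))
```

A large absolute t value points to a shift in central value, and an F ratio well above 1 points to a difference in dispersion; both must be judged against the appropriate critical values.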
We can take preliminary decisions with the box plot, followed by confirma-
tory tests to make the final decision.
When data are nonnormal, twin box plots provide more reliable clues than
conventional tests.
Box 4.2 Evaluating Improvement
The need to evaluate improvement occurs more often than we think in
software projects. In the very first place, we collect data because we wish to
improve performance. We are thus made to check if performance has really
improved after data collection and reporting. To do this, we need two sets
of performance data, before and after improvement. Then we just have to
prepare a twin box plot and compare the results, as described in this chapter.
There are other circumstances when we consciously improve performance
through six sigma and lean; once again, we can use box plots to compare
results before and after improvement. Sometimes we may do special
experiments and invariably use box plots to portray the data as evidence
for improvement. Box plots are widely used as a graphical companion
to experiments. High maturity in software engineering involves continual
improvement, and the box plot is a very valuable tool.
64 Simple Statistical Methods for Software Engineering
Case Study 1: Business Perspectives from CSAT Box Plots
This case study is about managing CSAT across a large organization with four
strategic business units (SBUs). The annual average CSAT index of the organization
has been computed as 3.217, which is far below the target of 4 on a Likert scale of
1–5. The CSAT data are obtained by a survey of the overall satisfaction of
customers. The calculation of the average of ordinal data is a subject of ongoing debate.
Strictly speaking, average is meaningless in ordinal data, but average is taken as an
effective indicator. It is easier to estimate and report. If we use box plots instead of
the mean, we tend to see more details of CSAT. If we plot separate box plots for
different SBUs, we get more information and an easy intercomparison, as shown in
Figure 4.5.
CSAT Analysis across SBUs
In one glance, we are able to take in several details of CSAT. The linear structure of
the box plot accommodates several box plots in one chart. In Figure 4.5, there are lower
whiskers; the lower whiskers touch the floor level, particularly in SBU 1 and SBU 3.
These lower whiskers are the real problems; customers tend to remember negative
results longer. If Kano's model of CSAT is used, the lower whiskers fall in the
zone of asymptotically crashing dissatisfaction.
The key message of CSAT box plots is not in the central values but in the
lower whiskers.
[Figure: box plots of CSAT (vertical axis "Data", scale 1 to 5) for SBU 1, SBU 2, SBU 3, and SBU 4, with the target of 4 marked as the LSL.]
Figure 4.5 Customer satisfaction analysis across SBUs using box plots.
The chart shows SBU 2 to be outstanding. The box reaches the maximum value,
providing customer delight. In SBU 3, customer delight is seen as a rare
achievement and not a repeatable result. The chart is a sort of control chart on CSAT across
the organization. Target 4 can be interpreted as the lower specification limit, and
the chart provides a clear perspective of how CSAT performance meets target.
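Reading the target as a lower specification limit suggests a simple numerical companion to the chart: the share of responses that fall below the target in each unit. The sketch below is illustrative only. The function name, the unit labels, and the Likert responses are hypothetical and are not the data behind Figure 4.5.

```python
def share_below_target(scores, target=4):
    """Fraction of survey responses that fall below the target (LSL)."""
    return sum(1 for s in scores if s < target) / len(scores)

# Hypothetical Likert responses (1-5) for two business units.
responses = {
    "SBU A": [1, 2, 3, 3, 4, 4, 5],
    "SBU B": [4, 4, 5, 5, 5, 5, 4],
}

for unit, scores in responses.items():
    print(unit,
          "lowest score:", min(scores),
          "share below target:", round(share_below_target(scores), 2))
```

Reporting the lowest score next to the share below target keeps attention on the lower tail of each unit rather than on the central value alone.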
Case Study 2: Process Perspectives from CSAT Box Plots
In another case study, we show how multiple dimensions of CSAT can be tracked
using box plots. The CSAT survey has captured customer responses to several other
dimensions of CSAT:
Engineering (ENGG)
Communication (COMM)
Time (TIME)
Price (PRICE)
Responsiveness (RESP)
Quality (QUAL)
These six selected dimensions, or CSAT attributes, captured by the survey
indicate customer responses and provide opportunities for improvement to the
software development organization. The six box plots are available in a single chart, as
in Figure 4.6.
[Figure: box plots of CSAT (vertical axis "Data", scale 1 to 5) for ENGG, COMM, TIME, PRICE, RESP, and QUAL.]
Figure 4.6 Customer satisfaction analysis across attributes using box plots.
CSAT Analysis across Attributes
The chart provides a very easy comparison that we can quickly navigate through.
Performances in ENGG and TIME have earned the lowest scores. The lower
whisker in ENGG touches the floor and provides a red alert to the organization.
COMM and PRICE have earned the best scores, assuring customer delight from
the "within the box" area. If we take the target as 4, then RESP and QUAL still need
improvement. The box plots provide very useful information graphically.
Review Questions
1. What are the elements in a box plot?
2. What are fences?
3. How is the length of a whisker calculated?
4. How are the hinges calculated?
5. How robust are the rules used in detecting outliers in the box plot?
Exercises
1. Draw a box plot using the data provided in Data 4.1, using the macros pro-
vided in Refs. [4] and [5], and interpret the same. You can also download free
Excel box plot plotters from the web.
2. Draw a box plot of the lines of code you have developed for various objects. See
how the box plot helps in statistical thinking.
Box 4.3 The Box of the Box Plot
The box plot has a lean structure. It is remarkably simple and uncluttered. The
earliest version of the box plot was a straight line. Tukey added the box. The
box achieves its purpose by dominating the plot. This is an intended
domination. The box has the median and contains 50% of central evidence. The
first glance makes us recognize the box and other details are subdued; the
box emerges as the primary message. This helps managers to grasp the
essential behavior of processes sans the secondary details. Dispersion beyond the
box is considered secondary. Outliers are tertiary. Moreover, the box is plain
and unpopulated. It is just an outline drawing. For quick decisions regarding
budgeting, the box is all we need to consider. For systematic process
management, we consider the whiskers. For strategic risk management and problem
solving, we consider the outliers. The structure of the box plot helps with this
progression of management decisions.
3. Draw a box plot of defects based on the following module data. Interpret your
findings.
26
23
21
18
18
18
14
14
13
13
12
12
4. Compare the following two sets of rework efforts during testing using two
box plots. Interpret your graphs.
Set 1   Set 2
12      10
12      10
0       9
5       9
7       9
8       7
9       7
0       7
        7
        6
        6
        6
        5
        4
        4
        4
        4
References
1. K. W. Haemer, Range-bar charts, American Statistician, 2(2), 23, 1948.
2. M. E. Spear, Charting Statistics, McGraw-Hill Book Company, Inc., New York, 1952.
3. K. Potter, Methods for presenting statistical information: the box plot, in Visualization
of Large and Unstructured Data Sets, GI-Edition Lecture Notes in Informatics (LNI),
H. Hagen, A. Kerren and P. Dannenmann (eds.), vol. S-4, pp. 97–106, 2006.