54 ◾ Simple Statistical Methods for Software Engineering
In its early form developed by Mary Spear in 1952, the box plot displayed the
five-point summary of data [2]:
Median
Lower quartile
Upper quartile
Smallest data value
Largest data value
The box is drawn from the median and the quartiles and contains 50% of the observations.
The quartiles are the edges of the box, called hinges. The whiskers are lines that begin
at the hinges and end at the smallest and largest data values. The graph is known as
the box-and-whisker plot, or simply the box plot.
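The five-point summary behind the early box plot can be sketched in a few lines of Python. This is only an illustration: the function name and the sample values are made up, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption, since different tools compute quartiles slightly differently.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, lower quartile, median, upper quartile, largest).

    Needs at least two data values. The quartile convention used here
    ("inclusive") is an assumption; other tools may give slightly
    different quartiles for the same data.
    """
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]


# Hypothetical sample, not taken from the book.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))
```

In the early form of the plot, the second and fourth values give the hinges of the box, and the first and last values give the ends of the whiskers.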
The box plot has gone through several changes. A summary of the historical
developments is presented by Kristin [3].
A simple but effective improvement of the box plot came from John Wilder
Tukey, whose changes made the box plot a popular tool. Tukey modified the box plot and
published it in Exploratory Data Analysis [4] in 1977. In the modern version, data
fences are used. The whiskers do not stretch to the smallest or largest data values.
They stretch out from the box only up to trimming points (or fences) that
mark off outliers. The trimming rules have been empirically designed. The markers
are 1.5 times the interquartile range (IQR) away from the box. Whiskers end at the data points
farthest from the box that still lie inside these markers. The markers provide a pragmatic way to
find outliers. Aczel and Sundara Pandian, authors of an Excel tool to plot the box
plot, refer to these markers as fences [5]. Besides these inner fences, the plot authors
have introduced additional markers 3 IQR away from the box. These are referred to
as outer fences. Data that lie beyond the inner fences can be suspected as possible
outliers. Data that fall outside the outer fences are definite outliers.
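The whisker and outlier rules described above can be sketched as follows. The function name and the data are hypothetical, and treating a point that lies exactly on a fence as "inside" is an assumption that the text does not settle.

```python
def whiskers_and_outliers(data, q1, q3):
    """Apply the 1.5 IQR and 3 IQR rules to data with known quartiles.

    Returns (whisker_ends, suspected_outliers, definite_outliers).
    Assumption: a point exactly on a fence counts as inside it.
    """
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    inside = [x for x in data if inner_low <= x <= inner_high]
    # Beyond the inner fences but not beyond the outer fences.
    suspected = [
        x for x in data
        if (x < inner_low or x > inner_high) and outer_low <= x <= outer_high
    ]
    # Beyond the outer fences.
    definite = [x for x in data if x < outer_low or x > outer_high]

    # Whiskers end at the farthest data points still inside the inner fences.
    whisker_ends = (min(inside), max(inside))
    return whisker_ends, suspected, definite


# Hypothetical data with quartiles 10 and 20 assumed for illustration.
print(whiskers_and_outliers([2, 10, 15, 20, 30, 40, 60], q1=10, q3=20))
```

With these made-up values the inner fences fall at -5 and 35 and the outer fences at -20 and 50, so 40 is a suspected outlier, 60 is a definite outlier, and the whiskers end at 2 and 30.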
A typical box plot is shown in Figure 4.1. The following guidelines have been
used in the construction of the graph:
Box central line = Median
Lower hinge (edge) of box = Quartile 1
Upper hinge (edge) of box = Quartile 3
IQR = Quartile 3 − Quartile 1
Right inner fence = Quartile 3 + 1.5 IQR
Left inner fence = Quartile 1 − 1.5 IQR
Right outer fence = Quartile 3 + 3 IQR
Left outer fence = Quartile 1 − 3 IQR
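The guidelines above translate directly into arithmetic. The sketch below uses a hypothetical helper name and takes the quartiles as inputs; the example call uses the quartiles reported in the text for the productivity data (Quartile 1 = 8, Quartile 3 = 34.5).

```python
def box_plot_fences(q1, q3):
    """Compute the IQR and the four fences from the two quartiles."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "left_inner_fence": q1 - 1.5 * iqr,
        "right_inner_fence": q3 + 1.5 * iqr,
        "left_outer_fence": q1 - 3 * iqr,
        "right_outer_fence": q3 + 3 * iqr,
    }


# Quartiles of the productivity data as stated in the text.
for name, value in box_plot_fences(8, 34.5).items():
    print(f"{name}: {value}")
```

For these quartiles the IQR is 26.5, so the inner fences lie at -31.75 and 74.25 and the outer fences at -71.5 and 114. A negative left fence is only a computed boundary, not an attainable productivity value.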
Software productivity data (lines of code/person day) are analyzed by this plot.
The box is constructed from Quartile 1 (productivity = 8) to Quartile 3 (productivity =
34.5). Fifty percent of the data are inside the box. Hence, the core productivity is in