Making a box-and-whisker plot

Do you want to visualize a series of data measurements (or observations) so that one plot shows several properties of the series, such as the median value, the spread of the data, and its distribution? And would you like to do that in a way that lets you visually compare several similar data series? How would you visualize them? Welcome to the box-and-whisker plot! It is probably the best plot type for comparing distributions, provided your audience is used to reading information-dense graphics.

Typical uses of the box-and-whisker plot range from comparing test scores between schools to comparing process parameters before and after a change (such as an optimization).

Getting ready

What are the elements of a box-and-whisker plot? As we see in the following diagram, several elements of the plot carry information. The first is the box, which spans the interquartile range (IQR), from the lower quartile (25th percentile) to the upper quartile (75th percentile). The median value of the data is represented by a line across the box.
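To make the box concrete, here is a minimal sketch that computes the quartiles, the median, and the interquartile range for process A from this recipe's data. It assumes NumPy's default linear interpolation in np.percentile, which, as far as we know, is also what matplotlib uses when it computes boxplot statistics.

```python
import numpy as np

# Error counts for process A, taken from the PROCESSES dictionary in this recipe
process_a = np.array([12, 15, 23, 24, 30, 31, 33, 36, 50, 73])

# Lower quartile, median and upper quartile (NumPy's default linear interpolation)
q1, median, q3 = np.percentile(process_a, [25, 50, 75])

# The box spans the interquartile range (IQR), from q1 to q3
iqr = q3 - q1

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
# prints: Q1=23.25, median=30.5, Q3=35.25, IQR=12.0
```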


The whiskers extend from both sides of the box toward the extremes of the data: the lower whisker starts at the first quartile (25th percentile) and the upper whisker at the third quartile (75th percentile). In matplotlib's default setting, a whisker reaches at most 1.5 times the interquartile range beyond the box and ends at the most extreme data point within that limit. For a normal distribution, the whiskers will therefore cover about 99.3% of the data.
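The 99.3% figure can be checked with a short calculation. The sketch below uses Python's standard-library statistics.NormalDist (our choice for illustration, not part of the recipe) to find the quartiles of a standard normal distribution, place the whisker limits 1.5 IQR beyond them, and measure how much of the distribution lies in between.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Quartiles of the standard normal; Q1 = -Q3 by symmetry
q3 = std_normal.inv_cdf(0.75)   # about 0.6745
q1 = -q3
iqr = q3 - q1                    # about 1.349

# Whisker limits at 1.5 * IQR beyond the box (matplotlib's default whis=1.5)
upper = q3 + 1.5 * iqr           # about 2.698
lower = q1 - 1.5 * iqr

# Fraction of a normally distributed population that lies within the whiskers
coverage = std_normal.cdf(upper) - std_normal.cdf(lower)
print(f"whisker limit = ±{upper:.3f} sigma, coverage = {coverage:.1%}")
# prints: whisker limit = ±2.698 sigma, coverage = 99.3%
```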

Values outside the whisker range are drawn as individual points called fliers (outliers). If there are none, the whiskers cover the total range of the data.
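The sketch below applies this rule to process A, following what matplotlib does with its default whis=1.5: compute fences 1.5 IQR beyond the quartiles, end each whisker at the most extreme data point inside its fence, and treat everything beyond the whisker ends as a flier. It is a simplified illustration and ignores edge cases such as empty or masked data.

```python
import numpy as np

# Error counts for process A from the recipe's PROCESSES dictionary
process_a = np.array([12, 15, 23, 24, 30, 31, 33, 36, 50, 73])

q1, q3 = np.percentile(process_a, [25, 75])
iqr = q3 - q1
whis = 1.5  # whisker reach in multiples of the IQR (matplotlib's default)

# Fences: the furthest a whisker is allowed to reach
low_fence = q1 - whis * iqr    # 5.25
high_fence = q3 + whis * iqr   # 53.25

# Each whisker ends at the most extreme data point still inside its fence
whisker_low = process_a[process_a >= low_fence].min()
whisker_high = process_a[process_a <= high_fence].max()

# Everything beyond the whisker ends is drawn as a flier
fliers = process_a[(process_a < whisker_low) | (process_a > whisker_high)]

print(whisker_low, whisker_high, fliers)
# prints: 12 50 [73]
```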

Optionally, the box can also show a confidence interval around the median, drawn as a notch in the box. Comparing the notches gives an indication of whether two series differ: if the notches of two boxes do not overlap, their medians are likely to be different. However, this is not a rigorous test, only an indication that can be inspected visually.
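By default (no bootstrapping), matplotlib's source computes the notch as median ± 1.57 × IQR / √N, a Gaussian-based approximation. The sketch below applies that formula to processes A and C and checks whether the two intervals overlap; median_notch is a hypothetical helper name used only for illustration.

```python
import numpy as np

def median_notch(values):
    """Return (low, high) notch limits: median +/- 1.57 * IQR / sqrt(N)."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(len(data))
    return median - half_width, median + half_width

# Processes A and C from the recipe's PROCESSES dictionary
notch_a = median_notch([12, 15, 23, 24, 30, 31, 33, 36, 50, 73])
notch_c = median_notch([2, 3, 6, 8, 13, 14, 19, 23, 60, 69])

# Two intervals overlap if each one starts before the other one ends
overlap = notch_a[0] <= notch_c[1] and notch_c[0] <= notch_a[1]

print(f"A: {notch_a[0]:.2f}..{notch_a[1]:.2f}")  # A: 24.54..36.46
print(f"C: {notch_c[0]:.2f}..{notch_c[1]:.2f}")  # C: 5.80..21.20
print("notches overlap:", overlap)               # notches overlap: False
```

Because the intervals for A and C do not overlap, their notches would suggest different medians, with the caveat above that this is only a visual indication.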

How to do it...

In the following recipe, you will learn how to create a box-and-whisker plot using matplotlib.

We will perform the following steps:

  1. Sample some comparative process data, where each integer is the number of errors that occurred during the observed period of the running process.
  2. Read data from the PROCESSES dictionary into DATA.
  3. Read labels from the PROCESSES dictionary into LABELS.
  4. Render the box-and-whisker plot using matplotlib.pyplot.boxplot.
  5. Remove some chart junk from the figure.
  6. Add axes labels.
  7. Show the figure.

The following code implements these steps:

import matplotlib.pyplot as plt
# define data 
PROCESSES = { 
    "A": [12, 15, 23, 24, 30, 31, 33, 36, 50, 73], 
    "B": [6, 22, 26, 33, 35, 47, 54, 55, 62, 63], 
    "C": [2, 3, 6, 8, 13, 14, 19, 23, 60, 69], 
    "D": [1, 22, 36, 37, 45, 47, 48, 51, 52, 69], 
    } 

DATA = list(PROCESSES.values())
LABELS = list(PROCESSES.keys())

plt.boxplot(DATA, notch=False, widths=0.3)

# set tick labels to the process names
plt.gca().xaxis.set_ticklabels(LABELS)

# some clean-up (removing chartjunk)
# turn the spines off
for spine in plt.gca().spines.values():
    spine.set_visible(False)

# turn all ticks for x-axis off 
plt.gca().xaxis.set_ticks_position('none')
# leave left ticks for y-axis on
plt.gca().yaxis.set_ticks_position('left')
# set axes labels 
plt.ylabel("Errors observed over defined period.") 
plt.xlabel("Process observed over defined period.") 

plt.show()

The preceding code generates the following figure:

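The recipe switches notches off with notch=False. To see the median confidence intervals described in Getting ready, the same call accepts notch=True. The sketch below uses the object-oriented API (fig, ax = plt.subplots()) instead of plt.gca() and saves the figure to a file; the Agg backend and the filename boxplot_notched.png are our own assumptions so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() to open a window
import matplotlib.pyplot as plt

PROCESSES = {
    "A": [12, 15, 23, 24, 30, 31, 33, 36, 50, 73],
    "B": [6, 22, 26, 33, 35, 47, 54, 55, 62, 63],
    "C": [2, 3, 6, 8, 13, 14, 19, 23, 60, 69],
    "D": [1, 22, 36, 37, 45, 47, 48, 51, 52, 69],
}

fig, ax = plt.subplots()

# notch=True draws the confidence interval around each median as a notch
bp = ax.boxplot(list(PROCESSES.values()), notch=True, widths=0.3)

# boxplot places the boxes at x = 1, 2, 3, 4; label them with the process names
ax.set_xticks(range(1, len(PROCESSES) + 1))
ax.set_xticklabels(list(PROCESSES.keys()))
ax.set_ylabel("Errors observed over defined period.")

fig.savefig("boxplot_notched.png")  # hypothetical output filename
```

With only ten observations per process, some notches reach past the quartiles, which makes those boxes look folded; that is a consequence of the small sample size rather than an error in the plot.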

How it works...

The box-and-whisker plot is rendered by first computing the quartiles of each data series in DATA.

These quartile values are then used to compute the lines that draw the boxes and whiskers.
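To inspect the numbers behind those lines, you can call matplotlib.cbook.boxplot_stats, which, to our knowledge, is the helper Axes.boxplot uses internally. It returns one dictionary per series with keys such as q1, med, q3, whislo, whishi, and fliers. A minimal sketch with the recipe's data:

```python
from matplotlib.cbook import boxplot_stats

PROCESSES = {
    "A": [12, 15, 23, 24, 30, 31, 33, 36, 50, 73],
    "B": [6, 22, 26, 33, 35, 47, 54, 55, 62, 63],
    "C": [2, 3, 6, 8, 13, 14, 19, 23, 60, 69],
    "D": [1, 22, 36, 37, 45, 47, 48, 51, 52, 69],
}

# One dictionary of statistics per data series, in the same order as the input
stats = boxplot_stats(list(PROCESSES.values()), whis=1.5)

for name, s in zip(PROCESSES, stats):
    # float() and tolist() convert NumPy scalars/arrays to plain Python values for printing
    print(name,
          "box:", float(s["q1"]), "to", float(s["q3"]),
          "median:", float(s["med"]),
          "whiskers:", float(s["whislo"]), "to", float(s["whishi"]),
          "fliers:", s["fliers"].tolist())
```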

We then adjusted the plot by removing all unnecessary lines, the superfluous elements that Edward R. Tufte calls chartjunk in his famous book The Visual Display of Quantitative Information. Those lines carry no information; they only increase the effort a viewer needs to decode the figure before reaching the information that actually matters.
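If you clean up many figures this way, the same steps can be bundled into a function that works on any Axes. The sketch below is one way to do it; remove_chartjunk is a hypothetical helper name, and it simply repeats the recipe's calls on an explicit ax instead of plt.gca().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def remove_chartjunk(ax):
    """Hypothetical helper: hide all spines and x-axis ticks, keep y-axis ticks on the left."""
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.xaxis.set_ticks_position("none")
    ax.yaxis.set_ticks_position("left")

fig, ax = plt.subplots()
# Processes A and B from the recipe's PROCESSES dictionary
ax.boxplot([[12, 15, 23, 24, 30, 31, 33, 36, 50, 73],
            [6, 22, 26, 33, 35, 47, 54, 55, 62, 63]], widths=0.3)
remove_chartjunk(ax)
```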
