Creating a Box and Whisker plot

The Box plot, or Box and Whisker plot as it is popularly known, is a convenient statistical representation of the variation in a statistical population. It summarizes a large number of data points compactly while still showing the outliers and the central tendency of the data.

This visual representation of the distribution within a dataset was first introduced by American mathematician John W. Tukey in 1969. A box plot is significantly easier to plot than, say, a histogram, and it does not require the user to make assumptions about the bin size or the number of bins. Yet it still gives significant insight into the distribution of the dataset.

The box plot primarily consists of four parts:

The median provides the central tendency of our dataset. It is the value that divides our dataset into two halves: the values lower than the median and the values higher than the median. The position of the median within the box indicates the skewness in the data, as it shifts towards either the upper or the lower quartile.

The upper and lower quartiles, which form the box, represent the degree of dispersion or spread of the data between them. The difference between the upper and lower quartile is called the Interquartile Range (IQR), and it indicates the mid-spread within which the middle 50 percent of the points in our dataset lie.

The upper and lower whiskers in a box plot can be plotted either at the maximum and minimum values in the dataset, or at 1.5 times the IQR beyond the upper and lower quartiles (conventionally, at the most extreme data point that lies within that distance). Plotting the whiskers at the maximum and minimum values includes 100 percent of the values in the dataset, outliers included. Plotting the whiskers at 1.5 times the IQR instead leaves the outliers visible as individual points beyond the whiskers.

When the whiskers are plotted at the minimum and maximum values, the points lying between the lower whisker and the lower quartile are the lower 25 percent of values in the dataset, whereas the points lying between the upper quartile and the upper whisker are the upper 25 percent of values in the dataset.
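To make the four parts concrete, here is a minimal Python sketch that computes them for a small list of numbers. The weights are made up for illustration and are not taken from the recipe's Excel file, and the function name `box_plot_summary` is our own. The sketch assumes the common convention of drawing each whisker at the most extreme data point within 1.5 times the IQR of the box, and it uses the "inclusive" quartile method of Python's `statistics` module; Tableau may compute quartiles slightly differently, so the numbers need not match a Tableau tooltip exactly.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return the main parts of a box plot for a list of numbers."""
    data = sorted(values)

    # With n=4, quantiles() returns the three quartile cut points:
    # lower quartile, median, upper quartile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences sit whisker_factor x IQR beyond the box on either side.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr

    # Assumed convention: whiskers end at the most extreme data points
    # that still fall inside the fences; anything outside is an outlier.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
        "min": data[0],   # whisker positions under the min/max convention
        "max": data[-1],
    }


# Hypothetical weights, for illustration only.
weights = [44, 50, 52, 55, 58, 60, 63, 66, 70, 120]
summary = box_plot_summary(weights)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample, the value 120 falls beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 70. Under the minimum and maximum convention, the upper whisker would instead extend all the way to 120.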

For a symmetric distribution, such as a normal distribution, the box plot is symmetric about the median. Most real datasets are not so regular, and the box plot quickly shows the underlying variations and trends in the data and allows for easy comparison between datasets:

[Image: Creating a Box and Whisker plot]

Getting ready

We will create a Box and Whisker plot in a new sheet in our existing workbook.

For this purpose, we will connect to an Excel file named Data for Box plot & Gantt chart, which has been uploaded to https://1drv.ms/f/s!Av5QCoyLTBpnhkGyrRrZQWPHWpcY.

Let us save this Excel file in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder.

The data contains information about customers in terms of their gender and recorded weight. The data contains 100 records, one record per customer. Using this data, let us look at how we can create a Box and Whisker plot.

How to do it…

Once we have downloaded and saved the data from the link provided in the Getting ready section, we will create a new worksheet in our existing workbook and rename it to Box and Whisker plot.

  1. Since we haven't connected to the new dataset yet, establish a new data connection by pressing Ctrl + D on our keyboard.
  2. Select the Excel option and connect to the Data for Box plot & Gantt chart file, which is saved in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder.
  3. Next let us select the table named Box and Whisker plot data by double-clicking on it.
  4. Let us go ahead with the Live option to connect to this data.
  5. Next, let us multi-select the Customer and Gender fields from the Dimensions pane and the Weight field from the Measures pane by holding Ctrl while selecting each one. Refer to the following image:
    [Image: Customer, Gender, and Weight fields selected in the Data pane]
  6. Next let us click on the Show Me! button and select the box-and-whisker plot. Refer to the highlighted section in the following image:
    [Image: Show Me! panel with the box-and-whisker plot option highlighted]
  7. Once we click on the box-and-whisker plot option, we will see the following view:
    [Image: resulting box-and-whisker plot view]

How it works…

In the preceding chart, we get two box and whisker plots: one for each gender. The whiskers mark the maximum and minimum extent of the data. Furthermore, in each category we can see a number of circles, each of which represents a customer. Thus, within each gender category, the graph shows the distribution of customers by their respective weights. When we hover over any of these circles, we can see the details of the customer (name, gender, and recorded weight) in the tooltip. Refer to the following image:

[Image: tooltip showing a customer's name, gender, and recorded weight]

However, when we hover over the box (the gray section), we see the summary details: the median, the lower quartile, the upper quartile, and so on. Refer to the following image:

[Image: tooltip showing the median and the lower and upper quartiles]

Thus, a summary of the box plot that we created is as follows:

[Image: summary of the box plot]

In simpler terms, for the female category, the majority of the population lies in the weight range of 44 to 75, whereas for the male category, the majority of the population lies in the weight range of 44 to 82.

Tip

Please note that in our visualization, even though the Rows shelf displays SUM(Weight), we have Customer in the Detail shelf, so there is only one entry per customer. As a result, SUM(Weight) is actually the same as MIN(Weight), MAX(Weight), or AVG(Weight).
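The point in this tip can be checked outside Tableau with a short Python sketch. The three records below are hypothetical stand-ins for the customer data (one record per customer), and the grouping by customer mimics what placing Customer in the Detail shelf does to the level of detail. It is an illustration of the arithmetic, not of how Tableau computes its aggregates internally.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (customer, gender, weight), one row per customer.
records = [
    ("Customer 1", "Female", 52.0),
    ("Customer 2", "Male", 71.0),
    ("Customer 3", "Female", 60.0),
]

# Group the weights by customer, which is the level of detail of the view.
weights_by_customer = defaultdict(list)
for customer, gender, weight in records:
    weights_by_customer[customer].append(weight)

# Compute the four aggregates for every customer.
aggregates = {
    customer: {
        "sum": sum(weights),
        "min": min(weights),
        "max": max(weights),
        "avg": mean(weights),
    }
    for customer, weights in weights_by_customer.items()
}

# With a single row per customer, all four aggregates coincide.
all_equal = all(len(set(agg.values())) == 1 for agg in aggregates.values())
print(all_equal)
```

If a customer had more than one record, the aggregates would diverge, and the choice between SUM, MIN, MAX, and AVG would start to matter.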
