The Box plot, or Box and Whisker plot as it is popularly known, is a convenient statistical representation of the variation in a statistical population. It is a great way of showing a number of data points as well as showing the outliers and the central tendencies of data.
This visual representation of the distribution within a dataset was first introduced by American mathematician John W. Tukey in 1969. A box plot is significantly easier to plot than, say, a histogram, and it does not require the user to make assumptions regarding the bin sizes and number of bins; yet it gives significant insight into the distribution of the dataset.
The box plot primarily consists of four parts:
The median provides the central tendency of our dataset. It is the value that divides our dataset into two halves: values lower than the median and values higher than the median. The position of the median within the box indicates the skewness in the data, as it shifts towards either the upper or the lower quartile.
The upper and lower quartiles, which form the box, represent the degree of dispersion or spread of the data between them. The difference between the upper and lower quartile is called the Interquartile Range (IQR), and it indicates the mid-spread within which 50 percent of the points in our dataset lie.
The upper and lower whiskers in a box plot can either be plotted at the maximum and minimum values in the dataset, or at 1.5 times the IQR beyond the upper and lower quartiles. Plotting the whiskers at the maximum and minimum values includes 100 percent of the values in the dataset, including all the outliers. Plotting the whiskers at 1.5 times the IQR beyond the quartiles instead leaves the outliers visible as points beyond the whiskers.
The points lying between the lower whisker and the lower quartile are the lower 25 percent of values in the dataset, whereas the points lying between the upper whisker and the upper quartile are the upper 25 percent of values in the dataset.
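The parts described above can be computed by hand, which helps when checking what a charting tool draws. The following is a minimal Python sketch, not Tableau's own calculation: the function name `box_plot_summary`, the sample weights, and the choice of the "inclusive" quartile method are all assumptions made for illustration, and other tools may use slightly different quartile conventions.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return the median, quartiles, IQR, whiskers and outliers of a dataset."""
    data = sorted(values)
    # Quartiles using the "inclusive" method (an assumption; conventions vary).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit 1.5 x IQR beyond the quartiles.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Hypothetical weights, invented purely for this example.
sample = [44, 50, 52, 55, 58, 60, 63, 66, 70, 75, 120]
summary = box_plot_summary(sample)
print(summary)
```

With these made-up values, the single very large weight falls beyond the upper fence and is reported as an outlier, while the upper whisker stops at the largest value inside the fence.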
In a typical normal distribution, the box plot will be symmetric: the median sits in the middle of the box and the two whiskers are of similar length. However, in most cases, the box plot will quickly show the underlying variations and trends in data, and it allows for easy comparison between datasets.
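As a quick check of how the parts of a box plot are laid out for a normal distribution, the theoretical quartiles and 1.5 times IQR fences can be computed with Python's standard library. This is only an illustrative sketch for a standard normal distribution (mean 0, standard deviation 1); the variable names are our own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

q1 = std_normal.inv_cdf(0.25)      # lower quartile
median = std_normal.inv_cdf(0.50)  # median
q3 = std_normal.inv_cdf(0.75)      # upper quartile
iqr = q3 - q1

# Fences located 1.5 x IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Share of the distribution that would fall beyond the fences.
share_outside = std_normal.cdf(lower_fence) + (1 - std_normal.cdf(upper_fence))

print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}")
print(f"fences=({lower_fence:.3f}, {upper_fence:.3f})")
print(f"share beyond fences={share_outside:.4f}")
```

The quartiles are mirror images of each other around the median, and the fences are mirror images as well, so a box plot of normally distributed data looks symmetric and shows very few points beyond the whiskers.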
Create a Box and Whisker plot in a new sheet in our existing workbook.
For this purpose, we will connect to an Excel file named Data for Box plot & Gantt chart, which has been uploaded on https://1drv.ms/f/s!Av5QCoyLTBpnhkGyrRrZQWPHWpcY.
Let us save this Excel file in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder.
The data contains information about customers in terms of their gender and recorded weight. The data contains 100 records, one record per customer. Using this data, let us look at how we can create a Box and Whisker plot.
Once we have downloaded and saved the data from the link provided in the Getting started… section, we will create a new worksheet in our existing workbook and rename it to Box and Whisker plot.
We will then connect to the Excel file saved in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder and select the Box and Whisker plot data table by double-clicking on it.
In the preceding chart, we get two box and whisker plots: one for each gender. The whiskers are the maximum and minimum extent of the data. Furthermore, in each category we can see some circles, each of which represents a customer. Thus, within each gender category, the graph shows the distribution of customers by their respective weights. When we hover over any of these circles, we can see details of the customer in terms of name, gender, and recorded weight in the tooltip. Refer to the following image:
However, when we hover over the box (gray section), we will see the details in terms of median, lower quartile, upper quartile, and so on. Refer to the following image:
Thus, a summary of the box plot that we created is as follows:
In simpler terms, for the female category, the majority of the population lies within the weight range of 44 to 75, whereas for the male category, the majority of the population lies within the weight range of 44 to 82.
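A per-gender summary of this kind can also be reproduced outside Tableau. The sketch below groups records by gender and computes the minimum, quartiles, median, and maximum for each group. The records are hypothetical values invented for illustration (they are not the book's dataset), and the "inclusive" quartile method is an assumption that may differ from Tableau's calculation.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (customer, gender, weight). Not the book's data.
records = [
    ("C01", "Female", 44), ("C02", "Female", 52), ("C03", "Female", 58),
    ("C04", "Female", 63), ("C05", "Female", 75),
    ("C06", "Male", 44), ("C07", "Male", 60), ("C08", "Male", 68),
    ("C09", "Male", 74), ("C10", "Male", 82),
]

# Collect the weights of each gender category.
by_gender = defaultdict(list)
for _customer, gender, weight in records:
    by_gender[gender].append(weight)

# Compute the summary for each category.
summary = {}
for gender, weights in by_gender.items():
    q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    summary[gender] = {
        "min": min(weights),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(weights),
    }

for gender, stats in summary.items():
    print(gender, stats)
```

Comparing the two printed rows side by side mirrors what the pair of box plots shows visually: where each group is centered and how widely its weights are spread.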