Comparison – box plots for groups of feature values

The previous charts described the relationship between the number of days taken to finish reading a book and its page count. Now, we will try to understand the highest page counts in the calendar months in which a book was finished after x reading days. In the first chart, we plotted the number of samples in each group of books that were completed in the same number of days. Many of those groups contain multiple samples. Here, we will plot the distribution of the samples within each group.

We can define a statistic that summarizes the average page count of books completed in the same calendar month as a book that was finished after x days of reading.

We will make a box plot chart as follows:

Parameters to set box plot chart

The data that we are visualizing in this box plot is produced by multiple group by operations, because a box plot shows the median and other percentiles. Look at the following chart:

Box plot made by using multiple group by operations
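The two-stage grouping behind such a chart can be sketched outside Superset with pandas. This is only an illustration under stated assumptions, not a description of how Superset computes the chart internally: the `books` table, its `finished` and `page_count` columns, and all of the values are hypothetical. The first group by yields one average page count per combination of days and calendar month; the second summarizes those monthly averages for each value of days.

```python
import pandas as pd

# Hypothetical reading log; column names and values are illustrative only.
books = pd.DataFrame({
    "finished": pd.to_datetime([
        "2018-01-05", "2018-01-20", "2018-02-10",
        "2018-02-25", "2018-03-03", "2018-03-15",
    ]),
    "days": [2, 2, 2, 3, 2, 3],
    "page_count": [100, 300, 400, 250, 600, 350],
})

# First group by: one average page count per (days, calendar month).
books["month"] = books["finished"].dt.to_period("M")
monthly = (
    books.groupby(["days", "month"])["page_count"]
    .mean()
    .reset_index(name="avg_pages")
)

# Second group by: quartiles of the monthly averages for each days group.
# These are the values a box in the plot would summarize.
box_stats = (
    monthly.groupby("days")["avg_pages"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
box_stats.columns = ["q1", "median", "q3"]
print(box_stats)
```

Each row of `box_stats` corresponds to one box: the median line sits at `median`, and the box spans `q1` to `q3`.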

It is a good idea to look at the transformed data before interpreting this chart. Let's use the Export to .csv format option available in the top right. It downloads the CSV to the machine from which you are accessing your Superset web app:

Export to CSV option available

Sort the CSV by days and _timestamp to see the values plotted in each of the boxes:

Days and timestamp of the boxes