There's more...

It is possible to replicate much more complex pivot tables with groupby aggregations. For instance, take the following result from pivot_table:

>>> flights.pivot_table(index=['AIRLINE', 'MONTH'],
                        columns=['ORG_AIR', 'CANCELLED'],
                        values=['DEP_DELAY', 'DIST'],
                        aggfunc=[np.sum, np.mean],
                        fill_value=0)

To replicate this with a groupby aggregation, follow the same pattern as in the recipe: pass all the columns from the index and columns parameters to the groupby method, and then unstack the ones that came from the columns parameter:

>>> flights.groupby(['AIRLINE', 'MONTH', 'ORG_AIR', 'CANCELLED']) \
           [['DEP_DELAY', 'DIST']] \
           .agg(['mean', 'sum']) \
           .unstack(['ORG_AIR', 'CANCELLED'], fill_value=0) \
           .swaplevel(0, 1, axis='columns')

There are a few differences. In the pandas version used for this book, the pivot_table method does not accept aggregation functions as strings when they are passed as a list, unlike the agg groupby method; you must use NumPy functions instead (more recent pandas releases accept strings in both places). The order of the column levels also differs: pivot_table places the aggregation functions at a level above the columns named in the values parameter. The swaplevel method evens this out by switching the order of the top two levels.
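To see what swaplevel does to the column levels in isolation, here is a minimal sketch on a one-row DataFrame. The frame and its values are made up for illustration; only the level layout mirrors what agg produces on the two value columns.

```python
import pandas as pd

# Hypothetical one-row frame whose columns look like the output of
# .agg(['mean', 'sum']) on two value columns: (value column, aggregation).
cols = pd.MultiIndex.from_product([['DEP_DELAY', 'DIST'], ['mean', 'sum']])
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=cols)

# Swap the two column levels so the aggregation name comes first,
# which is the layout pivot_table uses.
swapped = df.swaplevel(0, 1, axis='columns')

print(df.columns.tolist())
print(swapped.columns.tolist())
```

Note that swaplevel only exchanges the positions of the levels within each label; it does not reorder the columns themselves.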

At the time of writing, there is a bug when unstacking more than one column: the fill_value parameter is ignored (http://bit.ly/2jCPnWZ). To work around this bug, chain .fillna(0) to the end of the code.
