A pivot table, as known from Excel, summarizes data. The data in CSV files that we have seen in this chapter so far has been in flat files. The pivot table aggregates data from a flat file for certain columns and rows. The aggregating operation can be sum, mean, standard deviations, and so on. We will reuse the data generating code from data_aggregation.py
. The pandas API has a top-level pivot_table()
function and corresponding DataFrame method. With the aggfunc
parameter, we can specify the aggregation function to use the NumPy sum()
function, for instance. The cols
parameter tells pandas the column to be aggregated. Create a pivot table on the Food
column as follows:
print pd.pivot_table(df, cols=['Food'], aggfunc=np.sum)
The pivot table we get contains totals for each food item:
Food chocolate icecream soup Number 8.000000 15.000000 19.00000 Price 5.986585 10.440071 13.83338 [2 rows x 3 columns]
The following code can be found in pivot_demo.py
in this book's code bundle:
import pandas as pd from numpy.random import seed from numpy.random import rand from numpy.random import random_integers import numpy as np seed(42) N = 7 df = pd.DataFrame({ 'Weather' : ['cold', 'hot', 'cold', 'hot', 'cold', 'hot', 'cold'], 'Food' : ['soup', 'soup', 'icecream', 'chocolate', 'icecream', 'icecream', 'soup'], 'Price' : 10 * rand(N), 'Number' : random_integers(1, 9, size=(N,))}) print "DataFrame ", df print pd.pivot_table(df, cols=['Food'], aggfunc=np.sum)
3.15.137.59