After working with groupby() and aggregation, you may have wondered why we can't group the data, apply an aggregation, and append the result to the dataframe directly. Is it possible to do all of this in a single step? Yes, it is.
Performing a transformation on a group or a column returns an object that has the same index, and therefore the same length, as the input. Transformation is commonly used in conjunction with groupby(). An aggregation operation has to return a reduced version of the data, whereas a transformation operation returns a transformed version of the full data. Let's take a look:
- Let's begin with a simple transformation that uses a lambda function to increase the price of each car by 10%:
df["price"] = df["price"].transform(lambda x: x + x / 10)
df.loc[:, "price"]
The output of the preceding code is as follows:
0 14844.5
1 18150.0
2 18150.0
3 15345.0
4 19195.0
...
196 18529.5
197 20949.5
198 23633.5
199 24717.0
200 24887.5
Name: price, Length: 201, dtype: float64
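To make the shape-preserving behavior concrete, here is a minimal sketch on a toy Series. The three prices are made-up values, not rows from the automobile dataset, and the names prices, raised, and raised_vectorized are only illustrative:

```python
import pandas as pd

# Hypothetical toy prices; not taken from the automobile dataset.
prices = pd.Series([10000.0, 20000.0, 30000.0], name="price")

# transform applies the function and returns a Series with the same index.
raised = prices.transform(lambda x: x + x / 10)

# A plain vectorized expression gives the same values (up to floating-point
# rounding) without needing a lambda.
raised_vectorized = prices * 1.1

print(raised.tolist())                    # [11000.0, 22000.0, 33000.0]
print(raised.index.equals(prices.index))  # True
```

Note that the statement in the step above writes the result back to df["price"], so running it a second time would raise the prices by another 10%.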
- Let's look at the average price of cars for each combination of body-style and drive-wheels:
df.groupby(["body-style", "drive-wheels"])["price"].transform("mean")
The output of the preceding code is as follows:
0 26344.560000
1 26344.560000
2 15771.555556
3 10792.980000
4 13912.066667
...
196 23883.016667
197 23883.016667
198 23883.016667
199 23883.016667
200 23883.016667
Name: price, Length: 201, dtype: float64
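If you look at the preceding output, you will notice that it has 201 entries, one for every row of the original dataframe, rather than one entry per group as our earlier groupby() aggregations returned. Every car in the same group receives that group's mean price.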
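The difference in size between aggregation and transformation can be sketched side by side. The dataframe below is a small made-up example that only borrows the column names from the text; cars, grouped, group_means, and row_means are hypothetical names used for illustration:

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the ones used in the text.
cars = pd.DataFrame({
    "body-style": ["sedan", "sedan", "hatchback", "hatchback", "sedan"],
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd"],
    "price": [10000.0, 14000.0, 8000.0, 12000.0, 30000.0],
})

grouped = cars.groupby(["body-style", "drive-wheels"])["price"]

# Aggregation: one value per group, indexed by the group keys.
group_means = grouped.mean()

# Transformation: one value per original row, indexed like `cars`.
row_means = grouped.transform("mean")

print(len(group_means))    # 4 (one entry per group)
print(len(row_means))      # 5 (one entry per row)
print(row_means.tolist())  # [12000.0, 12000.0, 8000.0, 12000.0, 30000.0]
```

Because row_means shares its index with cars, it can be assigned directly as a new column, whereas group_means would first have to be merged back on the two grouping keys.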
- Now, create a new column for the average price in the original dataframe:
df["average-price"] = df.groupby(["body-style", "drive-wheels"])["price"].transform("mean")
# Select the body-style, drive-wheels, price, and average-price columns
df.loc[:, ["body-style", "drive-wheels", "price", "average-price"]]
The output of the preceding code is as follows:
The output is straightforward to read: each row shows a car's price next to the average price of its group, where the groups are defined by two categories, body-style and drive-wheels. Next, we are going to discuss how to use pivot tables and cross-tabulation techniques.