Groupby mechanics

When working with pandas DataFrames, our analysis often requires us to split the data by certain criteria. The groupby mechanics partition a dataset into groups on which we can then perform operations such as the following:

  • Grouping by features, hierarchically
  • Aggregating a dataset by groups
  • Applying custom aggregation functions to groups
  • Transforming a dataset groupwise
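As a quick preview, the four operations listed above can be sketched on a toy DataFrame. The data below is made up for illustration; only the column names are borrowed from the automobile dataset used later in this section.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
cars = pd.DataFrame({
    "body-style": ["sedan", "sedan", "wagon", "wagon", "hatchback"],
    "drive-wheels": ["fwd", "rwd", "fwd", "fwd", "rwd"],
    "price": [10000.0, 20000.0, 12000.0, 14000.0, 9000.0],
})

# Grouping by two features gives a hierarchically indexed (MultiIndex) result.
hierarchical = cars.groupby(["body-style", "drive-wheels"])["price"].mean()

# Aggregating by group with built-in functions.
summary = cars.groupby("body-style")["price"].agg(["mean", "max"])

# Custom aggregation: the price range within each group.
price_range = cars.groupby("body-style")["price"].agg(lambda s: s.max() - s.min())

# Groupwise transform: subtract each group's mean price from its rows.
# transform() returns a result with the same length as the input.
centered = cars["price"] - cars.groupby("body-style")["price"].transform("mean")
```

Note that aggregation returns one row per group, whereas transform() returns one value per original row.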

The pandas groupby method performs two essential functions:

  • It splits the data into groups based on some criteria.
  • It applies a function to each group independently.
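To make the two steps concrete, here is a minimal sketch that performs them by hand and then compares the result with a single groupby call. The small DataFrame is hypothetical and stands in for any dataset with a categorical column.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
cars = pd.DataFrame({
    "body-style": ["sedan", "wagon", "sedan", "wagon"],
    "price": [10000.0, 12000.0, 20000.0, 14000.0],
})

manual_means = {}
# Step 1 (split): iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
for key, group in cars.groupby("body-style"):
    # Step 2 (apply): compute a function on each group independently.
    manual_means[key] = group["price"].mean()

# The same computation expressed as one call.
builtin_means = cars.groupby("body-style")["price"].mean()
```

Both approaches should give the same per-group means; the loop is only there to show what happens conceptually.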

To work with groupby functionality, we need a dataset that has both numerical and categorical columns so that we can group by different categories and ranges.

Let's take a look at an automobile dataset that lists different features and attributes of cars, such as symboling, normalized-losses, make, aspiration, body-style, drive-wheels, engine-location, and many others. Let's get started:

  1. Let's start by importing the required Python libraries and datasets:
import pandas as pd
df = pd.read_csv("/content/automobileEDA.csv")
df.head()

Here, we're assuming that you have the dataset stored at the path shown in the code. If you don't, change the path to the correct location. By now, you should be familiar with the appropriate data loading techniques for doing this, so we won't cover them again here.

The output of the preceding code is as follows:

As you can see, there are multiple columns with categorical variables.

  2. The groupby() function lets us group this dataset by the body-style column. The groups attribute maps each group name to the row labels in that group, so its keys are the group names:
df.groupby('body-style').groups.keys()

The output of the preceding code is as follows:

dict_keys(['convertible', 'hardtop', 'hatchback', 'sedan', 'wagon'])

From the preceding output, we know that the body-style column has five unique values: convertible, hardtop, hatchback, sedan, and wagon.
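Besides the keys, a GroupBy object exposes a few other ways to inspect the groups. The following is a minimal sketch on a hypothetical toy DataFrame (not the automobile dataset), using the ngroups attribute and the size() method:

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
cars = pd.DataFrame({
    "body-style": ["sedan", "wagon", "sedan", "convertible"],
    "price": [10000.0, 12000.0, 20000.0, 30000.0],
})
grouped = cars.groupby("body-style")

# Group names, as in the step above.
group_names = sorted(grouped.groups.keys())

# Row labels that belong to one particular group.
sedan_rows = list(grouped.groups["sedan"])

# Number of groups and number of rows in each group.
n_groups = grouped.ngroups
rows_per_group = grouped.size()

# The number of groups matches the number of unique values in the column.
unique_values = cars["body-style"].nunique()
```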

  3. Now that we can group the data by the body-style column, let's print the rows in the group whose body-style value is convertible. This can be done using the following code:
# Group the dataset by the column body-style
style = df.groupby('body-style')

# Get the rows of the group whose body-style is convertible
style.get_group("convertible")

The output of the preceding code is as follows:

In the preceding example, we grouped by a single column, body-style. We can also select a subset of columns. We'll learn how to do this in the next section.
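One way to build intuition for get_group() is to compare it with ordinary Boolean filtering. The sketch below uses a hypothetical toy DataFrame rather than the automobile dataset and checks that both approaches pick out the same rows:

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
cars = pd.DataFrame({
    "body-style": ["convertible", "sedan", "convertible", "wagon"],
    "price": [30000.0, 10000.0, 35000.0, 12000.0],
})

# Select one group from the GroupBy object.
style = cars.groupby("body-style")
convertibles = style.get_group("convertible")

# Select the same rows with a Boolean mask on the original DataFrame.
same_rows = cars[cars["body-style"] == "convertible"]
```

Here, get_group() is convenient when the GroupBy object already exists; for a one-off selection, the Boolean mask is usually enough.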
