Grouping columns

We have already discussed several ways to group columns and rows of a pandas dataframe in Chapter 6, Grouping Dataset. In this section, we will use the same technique to group different columns together:

Let's use the combined dataframe and, for each quality label, select the columns alcohol, density, pH, and quality.
Next, we can apply the dataframe's describe() method to each selection to get the most commonly used descriptive statistics:

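# Columns to summarize; df_wines is the combined dataframe built earlier
subset_attr = ['alcohol', 'density', 'pH', 'quality']

# Descriptive statistics per quality label, rounded to two decimal places
low = round(df_wines[df_wines['quality_label'] == 'low'][subset_attr].describe(), 2)
medium = round(df_wines[df_wines['quality_label'] == 'medium'][subset_attr].describe(), 2)
high = round(df_wines[df_wines['quality_label'] == 'high'][subset_attr].describe(), 2)

# Place the three summaries side by side, with one labeled column block each
pd.concat([low, medium, high], axis=1,
          keys=[' Low Quality Wine',
                ' Medium Quality Wine',
                ' High Quality Wine'])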

In the preceding code snippet, we first created a subset of the attributes that we are interested in. Then, we created three dataframes of descriptive statistics, one each for low-quality, medium-quality, and high-quality wine. Finally, we concatenated them side by side (axis=1), using the keys argument to label each block of columns. The output of the preceding code is given here:

Figure 12.11 - Output of grouping the columns and performing the describe operation

As shown in the preceding screenshot, we have grouped the dataset into three distinct groups: low-quality wine, medium-quality wine, and high-quality wine. Each group shows the four attributes selected in the code: alcohol, density, pH, and quality. Concatenating per-group summaries in this way makes it easy to compare the groups side by side during the data analysis phase.
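The same pattern can be tried on a small, made-up dataframe without loading the wine data. The following is a minimal sketch, not the book's code: df_toy and its values are invented for illustration, and only the column names mirror df_wines. It builds the side-by-side summary with pd.concat and then computes the same statistics with groupby for comparison:

```python
import pandas as pd

# Hypothetical stand-in for df_wines; the values are made up for illustration.
df_toy = pd.DataFrame({
    'alcohol': [9.4, 9.8, 10.5, 11.2, 12.8, 13.0],
    'density': [0.998, 0.997, 0.996, 0.995, 0.992, 0.991],
    'pH': [3.51, 3.20, 3.26, 3.16, 3.30, 3.40],
    'quality': [4, 5, 6, 7, 8, 9],
    'quality_label': ['low', 'low', 'medium', 'medium', 'high', 'high'],
})

subset_attr = ['alcohol', 'density', 'pH', 'quality']
labels = ['low', 'medium', 'high']

# One describe() table per label, built in a loop instead of three statements.
frames = [
    df_toy.loc[df_toy['quality_label'] == label, subset_attr].describe().round(2)
    for label in labels
]

# Side-by-side layout: outer column level is the label, inner is the attribute.
summary = pd.concat(frames, axis=1, keys=labels)

# groupby computes the same statistics in one call, with labels on the rows
# and (attribute, statistic) pairs on the columns.
by_label = df_toy.groupby('quality_label')[subset_attr].describe().round(2)

print(summary)
print(by_label)
```

The two results hold the same numbers in different layouts. The concat version reads well when comparing groups column by column, while the groupby version avoids repeating the filter for each label.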

In the next section, we are going to discuss the univariate analysis for the wine quality dataset.
