Converting into a categorical column

Although the quality column is numerical, here we are interested in treating quality as the class. To make that explicit, let's convert its numerical values into categorical values in this subsection.

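To do so, we need a set of rules. Let's define them: a quality value less than or equal to 5 is labeled low, a value greater than 5 and less than or equal to 7 is labeled medium, and a value greater than 7 is labeled high.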

That sounds doable, right? Of course, it is. Let's check the code, given as follows:

# Map each red wine's quality score to a label: <= 5 low, 6 to 7 medium, > 7 high
df_red['quality_label'] = df_red['quality'].apply(lambda value: 'low' if value <= 5 else ('medium' if value <= 7 else 'high'))
df_red['quality_label'] = pd.Categorical(df_red['quality_label'], categories=['low', 'medium', 'high'])

# Apply the same rules to the white wine data
df_white['quality_label'] = df_white['quality'].apply(lambda value: 'low' if value <= 5 else ('medium' if value <= 7 else 'high'))
df_white['quality_label'] = pd.Categorical(df_white['quality_label'], categories=['low', 'medium', 'high'])

The preceding code should be self-explanatory by now. We used the apply() method of a pandas Series to check each value in the quality column. If the value is less than or equal to 5, we categorized the wine as low quality. Similarly, if the value is greater than 5 and less than or equal to 7, we classified it as medium quality. Finally, any row with a quality value greater than 7 was classified as high quality. The second line for each dataframe converts the new labels into a pandas Categorical with the categories low, medium, and high.
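The same rules can also be expressed as bins. The following is a minimal sketch, not the approach used in this chapter, that compares the conditional mapping with pd.cut() on a small made-up series of quality scores (the quality series here is hypothetical and is not taken from the wine dataset):

```python
import pandas as pd

# Hypothetical toy scores for illustration only, not the wine data.
quality = pd.Series([3, 5, 6, 7, 8, 9], name="quality")

# The conditional mapping described in the text.
via_apply = quality.apply(
    lambda value: "low" if value <= 5 else ("medium" if value <= 7 else "high")
)
via_apply = pd.Categorical(via_apply, categories=["low", "medium", "high"])

# Equivalent right-closed bins: (-inf, 5], (5, 7], (7, inf).
via_cut = pd.cut(
    quality,
    bins=[float("-inf"), 5, 7, float("inf")],
    labels=["low", "medium", "high"],
)

# Both approaches assign the same label to every score.
print(list(via_apply) == list(via_cut))
```

One difference to keep in mind: pd.cut() returns an ordered categorical by default, whereas pd.Categorical() as called in this section produces an unordered one unless ordered=True is passed.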

Let's count the number of values in each category of wine:

print(df_white['quality_label'].value_counts())
print(df_red['quality_label'].value_counts())

And the output of the preceding code is given as follows:

medium 3078
low 1640
high 180
Name: quality_label, dtype: int64

medium 837
low 744
high 18
Name: quality_label, dtype: int64

The first block is for white wine and the second is for red wine. In both cases, medium is the largest category, and high-quality wines are rare: only 180 white wines and 18 red wines fall into that category.
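To put these counts in proportion, here is a small sketch that converts them into percentages. The counts are copied by hand from the preceding output, and the variable names are only illustrative:

```python
import pandas as pd

# Counts copied from the value_counts() output shown above.
white_counts = pd.Series({"medium": 3078, "low": 1640, "high": 180})
red_counts = pd.Series({"medium": 837, "low": 744, "high": 18})

# Share of each category as a percentage, rounded to one decimal place.
white_share = (white_counts / white_counts.sum() * 100).round(1)
red_share = (red_counts / red_counts.sum() * 100).round(1)

print(white_share)
print(red_share)
```

With the dataframes loaded, calling value_counts(normalize=True) on the quality_label column should give the same fractions directly.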
