Loading the dataset

As mentioned in the Technical requirements section, the dataset can be downloaded directly from the UCI website. Let's use the pandas pd.read_csv() function to load the dataset into the Python environment. By now, this operation should be relatively easy and intuitive:

  1. We start by importing the pandas library and creating two dataframes: df_red, which holds the red wine dataset, and df_white, which holds the white wine dataset:
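import pandas as pd

# The UCI wine-quality files separate fields with semicolons, not commas,
# so delimiter=";" is needed for pandas to split each row into columns.
df_red = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", delimiter=";")
df_white = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", delimiter=";")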
  2. We now have two dataframes. Let's check the names of the available columns:
df_red.columns

The output of the preceding code is as follows:

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
'pH', 'sulphates', 'alcohol', 'quality'],
dtype='object')
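Because the UCI wine-quality files separate fields with semicolons rather than commas, the delimiter=";" argument in step 1 is what lets pandas split each row into separate columns (in pd.read_csv, delimiter is an alias for the sep parameter). The following offline sketch illustrates the difference on a small in-memory string laid out like those files. The header uses the real column names, but the two data rows are illustrative values only, not a verified excerpt from the UCI files.

```python
import io

import pandas as pd

# Semicolon-separated text in the same layout as the UCI files.
# Column names match the real files; the data rows are illustrative only.
SAMPLE = (
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.70;0.00;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0.00;2.6;0.098;25;67;0.9968;3.20;0.68;9.8;5\n"
)

# Correct separator: each field becomes its own column.
df_ok = pd.read_csv(io.StringIO(SAMPLE), delimiter=";")
print(df_ok.shape)  # (2, 12)

# pandas' default separator is a comma, so each line stays one long string.
df_bad = pd.read_csv(io.StringIO(SAMPLE))
print(df_bad.shape)  # (2, 1)
```

With the default separator, the header and every data row collapse into a single column, so a shape of (n, 1) right after loading is a quick hint that the separator is wrong.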

As the output of df_red.columns shows, the dataset contains the following columns:

  • Fixed acidity: It indicates the amount of tartaric acid in wine and is measured in g/dm3.
  • Volatile acidity: It indicates the amount of acetic acid in the wine. It is measured in g/dm3.
  • Citric acid: It indicates the amount of citric acid in the wine. It is also measured in g/dm3.
  • Residual sugar: It indicates the amount of sugar left in the wine after the fermentation process is done. It is also measured in g/dm3.
  • Chlorides: It indicates the amount of salt (sodium chloride) in the wine. It is also measured in g/dm3.
  • Free sulfur dioxide: It measures the amount of sulfur dioxide (SO2) in free form. It is measured in mg/dm3.
  • Total sulfur dioxide: It measures the total amount of SO2 in the wine, in both free and bound forms, and is also measured in mg/dm3. This chemical works as an antioxidant and antimicrobial agent.
  • Density: It indicates the density of the wine and is measured in g/cm3.
  • pH: It indicates the pH value of the wine. The pH scale ranges from 0 to 14, where 0 indicates very high acidity, 7 is neutral, and 14 indicates very high basicity (alkalinity).
  • Sulphates: It indicates the amount of potassium sulphate in the wine. It is also measured in g/dm3.
  • Alcohol: It indicates the alcohol content of the wine, measured as a percentage by volume (% vol.).
  • Quality: It indicates the quality of the wine as a score between 0 and 10. The higher the score, the better the wine.
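The documented value ranges make a convenient sanity check after loading. The sketch below is one possible way to do it, assuming the column names printed earlier and the ranges from the UCI documentation (pH on the 0 to 14 scale, quality as a score between 0 and 10). The function name check_documented_ranges, the DOCUMENTED_RANGES dictionary, and the small demo frame are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Inclusive value ranges taken from the column descriptions.
DOCUMENTED_RANGES = {
    "pH": (0.0, 14.0),
    "quality": (0, 10),
}


def check_documented_ranges(df, ranges=DOCUMENTED_RANGES):
    """Hypothetical helper: report columns whose values fall outside their range.

    Returns {column: count of out-of-range values}; a missing column maps to None.
    NaN values are counted as out of range, because between() returns False for them.
    """
    problems = {}
    for column, (low, high) in ranges.items():
        if column not in df.columns:
            problems[column] = None  # column absent from this dataframe
            continue
        # Series.between is inclusive on both ends by default.
        n_bad = int((~df[column].between(low, high)).sum())
        if n_bad:
            problems[column] = n_bad
    return problems


# Tiny made-up frame: one pH value and one quality score are deliberately invalid.
demo = pd.DataFrame({"pH": [3.2, 3.5, 15.0], "quality": [5, 6, 11]})
print(check_documented_ranges(demo))  # {'pH': 1, 'quality': 1}
```

Applied to the real data, for example check_documented_ranges(df_red), an empty dictionary would indicate that every pH and quality value lies inside the documented range.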

Having discussed the different columns in the dataset, let's look at some basic statistics of the data in the next section.
