Using scikit-learn for k-fold cross-validation

In scikit-learn, cross-validation can be performed in three steps:

  1. Load the dataset. Since we already did this earlier, we don't have to do it again (a reminder sketch of this step appears right after this list).
  2. Instantiate the classifier:
      In [8]: from sklearn.neighbors import KNeighborsClassifier
      ...     model = KNeighborsClassifier(n_neighbors=1)
  3. Perform cross-validation with the cross_val_score function. This function takes as input a model, the full dataset (X), the target labels (y), and an integer value for the number of folds (cv). It is not necessary to split the data by hand; the function does that automatically based on the number of folds. After the cross-validation is completed, the function returns the test scores:
      In [9]: from sklearn.model_selection import cross_val_score
      ...     scores = cross_val_score(model, X, y, cv=5)
      ...     scores
      Out[9]: array([ 0.96666667,  0.96666667,  0.93333333,  0.93333333,  1.        ])
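
As a reminder, here is a minimal sketch of the loading step. The five scores above are consistent with the Iris dataset, so the sketch assumes Iris; if the earlier section used a different dataset, substitute it here:

      from sklearn.datasets import load_iris

      # Assumption: the dataset loaded earlier is Iris; adjust if not.
      iris = load_iris()
      X = iris.data    # feature matrix, shape (150, 4)
      y = iris.target  # class labels, shape (150,)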

To get a sense of how the model did on average, we can look at the mean and standard deviation of the five scores:

In [10]: scores.mean(), scores.std()
Out[10]: (0.95999999999999996, 0.024944382578492935)

With five folds, we get a much better sense of how robust the classifier is. We see that kNN with k=1 achieves 96% accuracy on average, and this value fluctuates from fold to fold with a standard deviation of roughly 2.5%.
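
To see what cross_val_score is doing under the hood, here is a rough, hand-written equivalent. It assumes X and y are the NumPy arrays loaded earlier; note that when a classifier is passed an integer cv, cross_val_score uses stratified folds, so the sketch mirrors that:

      from sklearn.model_selection import StratifiedKFold
      from sklearn.neighbors import KNeighborsClassifier
      import numpy as np

      # cross_val_score with cv=5 and a classifier splits the data into
      # five stratified folds; each fold serves as the test set once.
      manual_scores = []
      for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
          model = KNeighborsClassifier(n_neighbors=1)
          model.fit(X[train_idx], y[train_idx])
          manual_scores.append(model.score(X[test_idx], y[test_idx]))

      print(np.mean(manual_scores), np.std(manual_scores))

Up to floating-point noise, the mean and standard deviation printed here should match the values returned by cross_val_score above.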
