Implementing leave-one-out cross-validation

Another popular way to implement cross-validation is to choose the number of folds equal to the number of data points in the dataset. In other words, if there are N data points, we set k=N. This means that we end up having to run N iterations of cross-validation, but in every iteration, the test set consists of only a single data point. The advantage of this procedure is that we get to use all but one of the data points for training. Hence, this procedure is also known as leave-one-out cross-validation.

In scikit-learn, this functionality is provided by the LeaveOneOut class from the model_selection module:

In [11]: from sklearn.model_selection import LeaveOneOut
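Before plugging this object into cross_val_score, it can help to see what the splitter actually produces. The following is a minimal sketch on a tiny, made-up dataset (the four data points here are purely for illustration): for N data points, LeaveOneOut yields N train/test splits, each holding out exactly one point.

    import numpy as np
    # Hypothetical toy data: 4 points with 2 features each, just to show the splits
    X_toy = np.arange(8).reshape(4, 2)
    for train_idx, test_idx in LeaveOneOut().split(X_toy):
        print('train:', train_idx, 'test:', test_idx)
    # train: [1 2 3] test: [0]
    # train: [0 2 3] test: [1]
    # train: [0 1 3] test: [2]
    # train: [0 1 2] test: [3]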

This object can be passed directly to the cross_val_score function in the following way:

In [12]: scores = cross_val_score(model, X, y, cv=LeaveOneOut())
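Here, model, X, and y are the ones defined earlier in the chapter. For readers jumping in at this point, the following self-contained sketch reproduces the same call; it assumes the Iris dataset (150 points) and a k-nearest-neighbor classifier purely as stand-ins, so the exact scores depend on the classifier used:

    # Self-contained stand-in: the chapter's model, X, and y are assumed from
    # earlier; we substitute the Iris dataset and a k-NN classifier here purely
    # for illustration.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    model = KNeighborsClassifier(n_neighbors=1)
    scores = cross_val_score(model, X, y, cv=LeaveOneOut())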

Because every test set now contains a single data point, we expect the scorer to return 150 values, one for each data point in the dataset. Each individual prediction can be either right or wrong, so we expect scores to be a list of ones (1) and zeros (0), corresponding to correct and incorrect classifications, respectively:

In [13]: scores
Out[13]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                 1., 1., 1., 1., 1., 1., 1.])

If we want to know the average performance of the classifier, we can still compute the mean and standard deviation of the scores, as before:

In [14]: scores.mean(), scores.std()
Out[14]: (0.95999999999999996, 0.19595917942265423)

We can see that this scoring scheme returns results very similar to those of five-fold cross-validation.
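To check the comparison directly, the five-fold scores can be recomputed with the same one-liner (a quick sketch; the exact numbers depend on the model and data defined earlier):

    # Re-run cross-validation with five folds for comparison
    scores_5fold = cross_val_score(model, X, y, cv=5)
    print(scores_5fold.mean(), scores_5fold.std())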

You can learn more about other useful cross-validation procedures at http://scikit-learn.org/stable/modules/cross_validation.html.