The easiest way to perform cross-validation in OpenCV is to do the data splits by hand.
For example, to implement two-fold cross-validation, we would perform the following procedure:
- Load the dataset:
In [1]: from sklearn.datasets import load_iris
... import numpy as np
... iris = load_iris()
... X = iris.data.astype(np.float32)
... y = iris.target
- Split the data into two equally sized parts:
In [2]: from sklearn.model_selection import train_test_split
... X_fold1, X_fold2, y_fold1, y_fold2 = train_test_split(
... X, y, random_state=37, train_size=0.5
... )
- Instantiate the classifier:
In [3]: import cv2
... knn = cv2.ml.KNearest_create()
... knn.setDefaultK(1)
- Train the classifier on the first fold, then predict the labels of the second fold:
In [4]: knn.train(X_fold1, cv2.ml.ROW_SAMPLE, y_fold1)
... _, y_hat_fold2 = knn.predict(X_fold2)
- Train the classifier on the second fold, then predict the labels of the first fold:
In [5]: knn.train(X_fold2, cv2.ml.ROW_SAMPLE, y_fold2)
... _, y_hat_fold1 = knn.predict(X_fold1)
- Compute accuracy scores for both folds:
In [6]: from sklearn.metrics import accuracy_score
... accuracy_score(y_fold1, y_hat_fold1)
Out[6]: 0.92000000000000004
In [7]: accuracy_score(y_fold2, y_hat_fold2)
Out[7]: 0.88
This procedure yields two accuracy scores, one for the first fold (92% accuracy) and one for the second fold (88% accuracy). Averaging the two, our classifier achieved 90% accuracy on unseen data.
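The two-fold procedure above extends to any number of folds if we generate the index splits ourselves. The following is a minimal sketch using only NumPy, not part of OpenCV or scikit-learn. The helper names `k_fold_indices` and `mean_accuracy` are hypothetical and introduced purely for illustration, and the shuffling here will not reproduce the exact split that `train_test_split` produced earlier, even with the same seed value.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds, seed=37):
    """Return a list of (train_idx, test_idx) pairs, one per fold.

    Hypothetical helper: shuffles the sample indices once, cuts them
    into n_folds nearly equal parts, and uses each part as the test
    set exactly once.
    """
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_folds)
    splits = []
    for i in range(n_folds):
        test_idx = folds[i]
        # All remaining folds together form the training set
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i]
        )
        splits.append((train_idx, test_idx))
    return splits


def mean_accuracy(fold_scores):
    """Average the per-fold accuracy scores (hypothetical helper)."""
    return float(np.mean(fold_scores))


# The Iris dataset has 150 samples, so two folds hold 75 samples each
splits = k_fold_indices(n_samples=150, n_folds=2)
```

For each `(train_idx, test_idx)` pair, we would then call `knn.train(X[train_idx], cv2.ml.ROW_SAMPLE, y[train_idx])`, predict with `knn.predict(X[test_idx])`, and score the result against `y[test_idx]`, exactly as in the steps above. Passing the resulting scores to `mean_accuracy` gives the cross-validated estimate.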