Just knowing how supervised learning works is not going to be of any use if we can't put it into practice. Thankfully, OpenCV provides a pretty straightforward interface for all its statistical learning models, which includes all supervised learning models.
In OpenCV, every machine learning model derives from the cv::ml::StatModel base class. This is fancy talk for saying that every machine learning model in OpenCV has to provide all of the functionality that StatModel prescribes. This includes a method to train the model (called train), a method to predict labels for new data (called predict), and a method to measure the performance of the model (called calcError).
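To make the idea of a shared base class concrete, here is a minimal pure-Python sketch of the same pattern. It is not OpenCV code: ToyStatModel, MajorityClassModel, and calc_error are hypothetical names invented for this illustration, and the error measure (percentage of wrong predictions) is an assumption chosen for simplicity.

```python
from abc import ABC, abstractmethod
from collections import Counter


class ToyStatModel(ABC):
    """Hypothetical stand-in for a StatModel-like base class (not OpenCV's)."""

    @abstractmethod
    def train(self, samples, responses):
        """Fit the model to the given samples and their labels."""

    @abstractmethod
    def predict(self, samples):
        """Return one predicted label per sample."""

    def calc_error(self, samples, responses):
        # Shared scoring logic: percentage of samples predicted incorrectly.
        predictions = self.predict(samples)
        wrong = sum(p != r for p, r in zip(predictions, responses))
        return 100.0 * wrong / len(responses)


class MajorityClassModel(ToyStatModel):
    """Toy model that always predicts the most frequent training label."""

    def __init__(self):
        self.majority_ = None

    def train(self, samples, responses):
        self.majority_ = Counter(responses).most_common(1)[0][0]
        return True

    def predict(self, samples):
        return [self.majority_ for _ in samples]


samples = [[0], [1], [2], [3]]
responses = [0, 0, 0, 1]

model = MajorityClassModel()
model.train(samples, responses)
prediction = model.predict([[5]])            # [0]
error = model.calc_error(samples, responses)  # 25.0
```

Because the base class fixes the method names, any code written against train, predict, and calc_error works with every model that derives from it, which is the benefit the StatModel design aims for.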
Thanks to this organization of the software, setting up a machine learning model in OpenCV always follows the same steps, which we will see in action later:
- Initialization: We create an empty instance of the model by calling its creation function by name (for example, cv2.ml.KNearest_create in Python).
- Set parameters: If the model needs some parameters, we can set them via setter methods, which can be different for every model. For example, for a k-NN algorithm to work, we need to specify its open parameter, k (as we will find out later).
- Train the model: Every model must provide a method called train, used to fit the model to some data.
- Predict new labels: Every model must provide a method called predict, used to predict the labels of new data.
- Score the model: Every model must provide a method called calcError, used to measure performance. This calculation might be different for every model. For classifiers, for example, it is the percentage of misclassified samples.
As we will make occasional use of scikit-learn to implement some machine learning algorithms that OpenCV does not provide, it is worth pointing out that learning algorithms in scikit-learn follow an almost identical logic. The most notable difference is that scikit-learn sets all of the required model parameters in the initialization step. Also, it calls the training function fit instead of train, and the scoring function score instead of calcError.
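For comparison, the same workflow in scikit-learn might look as follows. This is again a sketch under assumptions: the toy data, the variable names, and k = 3 are chosen for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed for illustration): two well-separated 2D clusters.
rng = np.random.default_rng(42)
n_per_class = 50
class0 = rng.normal(loc=(25.0, 25.0), scale=5.0, size=(n_per_class, 2))
class1 = rng.normal(loc=(75.0, 75.0), scale=5.0, size=(n_per_class, 2))
X = np.vstack([class0, class1])
y = np.repeat([0, 1], n_per_class)

# Initialization and parameter setting happen in a single step.
model = KNeighborsClassifier(n_neighbors=3)

# Train the model: fit instead of train.
model.fit(X, y)

# Predict new labels for two unseen points, one near each cluster.
new_points = np.array([[20.0, 30.0], [80.0, 70.0]])
predicted = model.predict(new_points)

# Score the model: score instead of calcError.
accuracy = model.score(X, y)

print("predicted labels:", predicted)
print("training accuracy:", accuracy)
```

One difference to keep in mind: for classifiers, score returns the mean accuracy as a fraction between 0 and 1 (higher is better), whereas calcError reports an error (lower is better).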