Model selection is an essential step in the machine learning workflow. However, the term carries different meanings in different contexts:
- Context 1: In the machine learning workflow context, model selection is the process of selecting the best machine learning algorithms, such as logistic regression, SVM, decision tree, Random Forest classifier, and so on.
- Context 2: Model selection also refers to the process of choosing between different hyperparameter settings for a machine learning algorithm that has already been selected.
In general, model selection is the process of choosing the single best model from a list of candidate models for a given training dataset. There are several model selection techniques. In the usual scenario, we split the available data into a training set, a validation set, and a test set. We then fit each candidate model on the training set, compare the candidates on the validation set, and report the performance of the selected model on the test set. This approach works only when we have a sufficiently large dataset.
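The train/validation/test procedure described above can be sketched in a few lines. This is only an illustrative sketch that uses the Python standard library: the synthetic data, the 60/20/20 split, the three toy candidates (`fit_mean`, `fit_linear`, `fit_1nn`), and the mean squared error metric are all assumptions made for the example and do not come from this chapter.

```python
import random

def make_data(n, seed=0):
    # Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in xs]

def fit_mean(train):
    # Candidate 1: always predict the mean target seen in training.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear(train):
    # Candidate 2: one-variable least-squares line (closed form).
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_1nn(train):
    # Candidate 3: predict the target of the nearest training point.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    # Mean squared error of a fitted model on a dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

data = make_data(300)
random.Random(1).shuffle(data)

# Assumed 60/20/20 split into training, validation, and test sets.
train, val, test = data[:180], data[180:240], data[240:]

candidates = {"mean": fit_mean, "linear": fit_linear, "1-nn": fit_1nn}

# Fit every candidate on the training set only.
fitted = {name: fit(train) for name, fit in candidates.items()}

# Compare the candidates on the validation set.
val_scores = {name: mse(model, val) for name, model in fitted.items()}
best_name = min(val_scores, key=val_scores.get)

# Touch the test set once, for the selected model only.
test_mse = mse(fitted[best_name], test)
print(f"selected: {best_name}, test MSE: {test_mse:.3f}")
```

The test set plays no part in choosing the model here. If it did, the reported test error would no longer be an unbiased estimate of performance on unseen data.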
In many cases, however, the amount of data available for training and testing is limited, which makes model selection difficult. In such cases, we can use two families of techniques: probabilistic measures, which score a model by both its fit and its complexity, and resampling methods, which estimate performance on unseen data by repeatedly splitting the dataset. We suggest that you go through the Further reading section of this chapter if you wish to understand these model selection techniques.
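One widely used resampling method is k-fold cross-validation, in which every sample is used for both fitting and evaluation across different folds. The sketch below applies it to the hyperparameter sense of model selection (Context 2) by choosing the number of neighbors for a toy nearest-neighbor regressor. The synthetic data, the helper names (`fit_knn`, `cross_val_mse`), the candidate values, and the choice of five folds are assumptions made for illustration only.

```python
import math
import random
from functools import partial

def fit_knn(train, k):
    # Toy k-nearest-neighbor regressor on one feature.
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def mse(model, data):
    # Mean squared error of a fitted model on a dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def cross_val_mse(fit, data, n_folds=5):
    # Split the data into n_folds disjoint folds (strided slices).
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Fit on every fold except the held-out one.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(fit(train), held_out))
    # Average the held-out error over all folds.
    return sum(scores) / n_folds

# Hypothetical small dataset: y = sin(x) plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(40):
    x = rng.uniform(0, 10)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))

# Candidate hyperparameter values (assumed for this example).
cv_scores = {k: cross_val_mse(partial(fit_knn, k=k), data)
             for k in (1, 3, 7, 15)}
best_k = min(cv_scores, key=cv_scores.get)
print(f"selected k: {best_k}")
```

With only 40 samples, a fixed validation split would leave very few points for either fitting or evaluation, which is why averaging over folds is attractive when data is scarce.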