Tuning hyperparameters using grid searches and cross-validation

Cross-validation, together with grid search, is commonly used to tune the hyperparameters of the model in order to achieve better performance. Below, we outline the differences between hyperparameters and parameters.

Hyperparameters:

  • External characteristic of the model
  • Not estimated based on data
  • Can be considered the model's settings
  • Set before the training phase
  • Tuning them can result in better performance

Parameters:

  • Internal characteristic of the model
  • Estimated based on data, for example, the coefficients of linear regression
  • Learned during the training phase

One of the challenges of machine learning is training models that are able to generalize well to unseen data (overfitting versus underfitting; the bias-variance trade-off). While tuning the model's hyperparameters, we would like to evaluate its performance on data that was not used for training. In the Splitting data into training and test sets recipe, we mentioned that we can create an additional validation set. The validation set is used explicitly to tune the model's hyperparameters before the ultimate evaluation using the test set. However, creating a validation set comes at a price: data that could be used for training (and possibly testing) is sacrificed, which can be especially harmful when dealing with small datasets. That is why a technique called cross-validation became so popular.

Cross-validation allows us to obtain reliable estimates of the model's generalization error. It is easiest to understand with an example. When doing k-fold cross-validation, we randomly split the training data into k folds. We then train the model using k-1 folds and evaluate its performance on the remaining fold. We repeat this process k times, so that each fold serves as the validation set exactly once, and average the resulting scores. A potential drawback of cross-validation is the computational cost, especially when paired with a grid search for hyperparameter tuning.
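To make the procedure more concrete, below is a minimal sketch of 5-fold cross-validation using scikit-learn's cross_val_score. The built-in breast cancer dataset, the logistic regression estimator, and the accuracy metric are illustrative assumptions, not part of this recipe.

    # A minimal sketch of 5-fold cross-validation with scikit-learn;
    # the dataset, estimator, and metric are illustrative assumptions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Train on 4 folds and evaluate on the remaining one, 5 times in total
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"Accuracy per fold: {scores}")
    print(f"Average accuracy: {scores.mean():.4f}")

The average of the per-fold scores is the cross-validated estimate of the model's performance on unseen data.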

We have already mentioned grid search as a technique used for tuning hyperparameters. The idea is to create a grid of all possible hyperparameter combinations and train the model using each one of them. Thanks to its exhaustive search, grid search is guaranteed to find the optimal combination within the grid. The drawback is that the size of the grid grows exponentially as we add more hyperparameters or more candidate values, and the number of required model fits and predictions increases significantly if we also use cross-validation.
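The following sketch shows such an exhaustive search combined with 5-fold cross-validation using scikit-learn's GridSearchCV. The decision tree estimator and the particular parameter grid are illustrative assumptions.

    # A minimal sketch of an exhaustive grid search with 5-fold cross-validation;
    # the estimator and parameter grid are illustrative assumptions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    param_grid = {
        "max_depth": [3, 5, 10, None],
        "min_samples_leaf": [1, 5, 10],
    }

    # 4 * 3 = 12 combinations, each fitted 5 times -> 60 model fits
    grid_search = GridSearchCV(
        DecisionTreeClassifier(random_state=42),
        param_grid,
        cv=5,
        scoring="accuracy",
        n_jobs=-1,
    )
    grid_search.fit(X, y)
    print(f"Best hyperparameters: {grid_search.best_params_}")
    print(f"Best CV score: {grid_search.best_score_:.4f}")

Even this small grid already requires 60 model fits, which illustrates how quickly the cost grows as more hyperparameters or candidate values are added.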

As a solution to the problems encountered with grid search, we can use random search (also called randomized grid search). In this approach, we choose a random set of hyperparameters, train the model (also using cross-validation), record the score, and repeat the entire process until we reach a predefined number of iterations or a computational time limit. Random search is preferred over grid search when dealing with a very large grid, because it can explore a wider hyperparameter space and often finds a hyperparameter set that performs very similarly to the optimal one (obtained from a full grid search) in a much shorter time. The open question is how many iterations are sufficient to find a good solution; there is no simple answer, and in practice the number is usually dictated by the available resources.
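Below is a minimal sketch of random search using scikit-learn's RandomizedSearchCV. The estimator, the sampling distributions, and the budget of 20 iterations are illustrative assumptions.

    # A minimal sketch of a randomized search with 5-fold cross-validation;
    # the estimator, distributions, and n_iter budget are illustrative assumptions.
    from scipy.stats import randint
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Sample hyperparameters from distributions instead of a fixed grid
    param_distributions = {
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 20),
    }

    random_search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=42),
        param_distributions,
        n_iter=20,          # number of random combinations to evaluate
        cv=5,
        scoring="accuracy",
        random_state=42,
        n_jobs=-1,
    )
    random_search.fit(X, y)
    print(f"Best hyperparameters: {random_search.best_params_}")
    print(f"Best CV score: {random_search.best_score_:.4f}")

Increasing n_iter explores more of the hyperparameter space at a proportionally higher computational cost, which is the trade-off mentioned above.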
