Kernel ridge regression (KRR) combines linear ridge regression (linear regression with L2-norm regularization) with the kernel trick. It is primarily a regression method, although it can be applied to classification by regressing on encoded class labels. It learns a linear function in the feature space induced by the chosen kernel and the training data. For non-linear kernels, this corresponds to a non-linear function in the original space.
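To make the idea concrete, here is a minimal NumPy sketch of KRR with a Gaussian (RBF) kernel. The helper names (`rbf_kernel`, `krr_fit`, `krr_predict`), the synthetic sine data, and the values of `lam` and `gamma` are assumptions chosen for illustration only. The fit solves the linear system (K + lam * I) alpha = y, so the model is linear in the kernel-induced space but non-linear in the original input.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict as a kernel-weighted sum over all training points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Synthetic, illustrative data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 60)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)

alpha = krr_fit(X, y)
X_test = np.linspace(0.0, 5.0, 200)[:, None]
y_pred = krr_predict(X, alpha, X_test)
```

Note that every training point receives a coefficient in `alpha`, so each prediction touches the whole training set.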
The form of the model learned by KRR is identical to that of a support vector machine (SVM), but the two approaches differ in the following ways:
- KRR uses squared error loss, while SVM uses epsilon-insensitive loss for regression or hinge loss for classification.
- Unlike SVM, KRR can be fitted in closed form, which is typically faster for medium-sized datasets.
- The learned KRR model is non-sparse, so prediction can be slower than with an SVM, whose model depends only on its support vectors.
Despite these differences, both approaches usually use L2 regularization.
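The sparsity difference can be checked empirically. The sketch below assumes scikit-learn is available and uses its `KernelRidge` and `SVR` estimators on synthetic data. The hyperparameter values are arbitrary illustrative choices, not tuned recommendations.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

# Synthetic, illustrative data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(5.0 * rng.random((200, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# Both models use an RBF kernel and an L2 penalty (alpha for KRR, 1/C for SVR).
krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0).fit(X, y)
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=1.0).fit(X, y)

# KRR keeps one dual coefficient per training sample (non-sparse),
# whereas SVR only keeps the samples that end up as support vectors.
n_krr_terms = int(np.count_nonzero(krr.dual_coef_))
n_svr_terms = len(svr.support_)
print(f"KRR terms: {n_krr_terms}, SVR support vectors: {n_svr_terms}")
```

Points that fall inside the epsilon tube do not become support vectors, which is why the SVR model is expected to use fewer terms at prediction time.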