In this chapter, we have explored techniques that can be used to diagnose and improve a given machine learning model. The following are some of the points that we have covered:
We have revisited the problems of underfitting and overfitting the sample data, and discussed how to evaluate a formulated model to diagnose whether it is underfit or overfit.
We have explored cross-validation and how it can be used to estimate how well a formulated model will generalize to previously unseen data. We have also seen that cross-validation can be used to select the features and the regularization parameter of a model, and we studied several types of cross-validation that can be implemented for a given model.
We briefly explored learning curves and how they can be used to diagnose whether a given model is underfit or overfit.
We've explored the tools provided by the clj-ml library to cross-validate a given classifier.
Lastly, we've built an operational spam classifier that incorporates cross-validation to check that the classifier correctly classifies e-mails as spam.
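The k-fold cross-validation procedure recapped above is language-agnostic, even though this chapter's examples use Clojure and the clj-ml library. The following is a minimal sketch in plain Python, purely to illustrate the mechanics; the helper names (`k_fold_indices`, `cross_validate`, `train_fn`, `predict_fn`) and the toy data are this sketch's own, not the chapter's:

```python
import random

def k_fold_indices(n, k, seed=42):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, train_fn, predict_fn, k=5):
    """Hold out each fold in turn, train on the rest, and record the accuracy."""
    folds = k_fold_indices(len(xs), k)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        hits = sum(predict_fn(model, xs[j]) == ys[j] for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies

# Toy one-dimensional data: class 1 sits to the right of class 0.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# A trivial classifier: a threshold at the midpoint of the two class means.
def train_fn(train_xs, train_ys):
    mean0 = sum(x for x, y in zip(train_xs, train_ys) if y == 0) / train_ys.count(0)
    mean1 = sum(x for x, y in zip(train_xs, train_ys) if y == 1) / train_ys.count(1)
    return (mean0 + mean1) / 2

def predict_fn(threshold, x):
    return 1 if x > threshold else 0

scores = cross_validate(xs, ys, train_fn, predict_fn, k=5)
```

Averaging the per-fold accuracies gives a less biased estimate of performance on unseen data than a single train/test split, which is exactly why the spam classifier relies on cross-validation rather than its training-set accuracy.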
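The learning-curve diagnostic mentioned above can also be sketched briefly. Again this is a hedged Python illustration rather than the chapter's clj-ml code; the names (`fit_line`, `mse`, `learning_curve`) and the synthetic data are assumptions of this sketch. We fit a simple model on progressively larger training subsets and track both the training error and the error on a held-out validation set:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve(xs, ys, val_xs, val_ys, sizes):
    """For each training-set size m, record (m, training error, validation error)."""
    curve = []
    for m in sizes:
        model = fit_line(xs[:m], ys[:m])
        curve.append((m, mse(model, xs[:m], ys[:m]), mse(model, val_xs, val_ys)))
    return curve

# Noisy samples of y = 2x + 1; the validation set is unseen during fitting.
train_xs = list(range(20))
train_ys = [2 * x + 1 + 0.1 * (-1) ** x for x in train_xs]
val_xs = list(range(20, 30))
val_ys = [2 * x + 1 for x in val_xs]

curve = learning_curve(train_xs, train_ys, val_xs, val_ys, [2, 5, 10, 20])
```

The shape of the two error curves is the diagnostic: with very few samples the training error is near zero while the validation error is large (the model memorizes the data), and as the training set grows the two errors converge. If they converge to a high plateau, the model is underfit; a persistent gap between them signals overfitting.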
In the following chapters, we will continue to explore machine learning models, and we'll also study Support Vector Machines (SVMs) in detail.