Summary

In this chapter, we presented the maximal margin hyperplane as a decision boundary that separates two classes while maximizing its distance from the closest observations of each class. When the two classes are linearly separable, this results in the space between them being split evenly.

We've seen that this is not always desirable, for example when the classes lie close to each other because of a few observations. An improvement to this approach is the support vector classifier, which allows us to tolerate a few margin violations, or even misclassifications, in order to obtain a more stable result. This also allows us to handle classes that aren't linearly separable. The form of the support vector classifier can be written in terms of inner products between the observation being classified and the support vectors. This transforms our feature space from p features into as many features as we have support vectors. Replacing these inner products with more general kernel functions allows us to introduce nonlinearity into our model, giving us a support vector machine.
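
As a concrete illustration of this inner product form, the following minimal sketch reconstructs the decision function of a fitted linear support vector classifier from its support vectors, dual coefficients, and intercept. It assumes Python with scikit-learn, which is not necessarily the library used elsewhere in this book, and the data is purely synthetic:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two roughly separable classes (illustrative data only)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The decision function is a weighted sum of inner products between
# a new observation and the support vectors, plus an intercept.
x_new = X[:5]
inner = x_new @ clf.support_vectors_.T               # one column per support vector
manual = inner @ clf.dual_coef_.ravel() + clf.intercept_

print(np.allclose(manual, clf.decision_function(x_new)))  # True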

In practice, we saw that training a support vector classifier, which is a support vector machine with a linear kernel, involves adjusting the cost parameter. The performance we obtain on our training data can be close to what we get on our test data. By contrast, we saw that with a radial kernel we have the potential to fit our training data much more closely, but we are far more likely to fall into the trap of overfitting.
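
As a rough sketch of this comparison, again assuming scikit-learn (where the cost parameter is called C), we can fit a linear and a radial kernel on the same synthetic data and compare training and test accuracy:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data for illustration only
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel,
          "train:", round(clf.score(X_train, y_train), 3),
          "test:", round(clf.score(X_test, y_test), 3))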

To deal with this, it is useful to try different combinations of the cost and gamma parameters. To do this efficiently, and without requiring us to sacrifice data for use as a validation set, we introduced the idea of cross-validation. Effectively, we try out many different splits of the training data into training and validation folds and average the validation accuracy across all the splits. We then use this accuracy to choose values for the parameters that interest us, and use our original test data set to estimate performance on unseen data with a model trained using the chosen parameter values.
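
A minimal sketch of this workflow, assuming scikit-learn's GridSearchCV (its C and gamma arguments correspond to the cost and gamma parameters discussed here):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic data for illustration only
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 5-fold cross-validation over a small grid of cost (C) and gamma values
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100],
                                "gamma": [0.001, 0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_train, y_train)

print("chosen parameters:", grid.best_params_)
# Estimate generalization performance on the held-out test set
print("test accuracy:", grid.score(X_test, y_test))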

In the next chapter, we are going to explore another cornerstone of machine learning: tree-based models. Also known as decision trees, they can handle regression and classification problems with many classes, are highly interpretable, and have a built-in way of handling missing data.
