Dealing with nonlinear decision boundaries

What if the data cannot be optimally partitioned using a linear decision boundary? In such a case, we say the data is not linearly separable.

The basic idea to deal with data that is not linearly separable is to create nonlinear combinations of the original features. This is the same as saying we want to project our data to a higher-dimensional space (for example, from 2D to 3D), in which the data suddenly becomes linearly separable.

This concept is illustrated in the following diagram:

The preceding diagram shows how to find linear hyperplanes in higher-dimensional spaces. If the data in its original input space (left) cannot be linearly separated, we can apply a mapping function ϕ(.) that projects the data from 2D into 3D (or, more generally, into some higher-dimensional space). In this higher-dimensional space, we may find that there is now a linear decision boundary (which, in 3D, is a plane) that can separate the data.
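As a minimal sketch of this idea (the data, the mapping ϕ, and the threshold are all illustrative choices, not taken from the text), consider two classes arranged as concentric circles: no straight line in 2D separates them, but the mapping ϕ(x₁, x₂) = (x₁, x₂, x₁² + x₂²) adds a third coordinate equal to the squared distance from the origin, and a horizontal plane in 3D separates the classes perfectly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes that are NOT linearly separable in 2D:
# an inner circle (class 0) and a surrounding ring (class 1).
theta = rng.uniform(0, 2 * np.pi, 100)
inner = np.column_stack([0.5 * np.cos(theta[:50]), 0.5 * np.sin(theta[:50])])
outer = np.column_stack([2.0 * np.cos(theta[50:]), 2.0 * np.sin(theta[50:])])

def phi(X):
    """Map 2D points to 3D: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

# In the 3D space, the plane z = 1.0 cleanly separates the classes:
z_inner = phi(inner)[:, 2]   # all equal to 0.5^2 = 0.25
z_outer = phi(outer)[:, 2]   # all equal to 2.0^2 = 4.0
print(z_inner.max() < 1.0 < z_outer.min())  # → True
```

The third coordinate does all the work here: points near the origin get a small value, points far from it a large one, so a single threshold on that coordinate (a plane in 3D) separates what no line in 2D could.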

A linear decision boundary in an n-dimensional space is called a hyperplane. For example, a decision boundary in 6D feature space is a 5D hyperplane; in 3D feature space, it's a regular 2D plane; and in 2D space, it's a straight line.

However, one problem with this mapping approach is that it becomes impractical in high dimensions: explicitly computing the mapping is expensive because the number of new feature terms grows rapidly with the dimensionality of the input space. This is where the so-called kernel trick comes into play.
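The kernel trick can be sketched with a small worked example (the specific kernel and feature map below are a standard textbook illustration, not taken from this text). For 2D inputs, the degree-2 polynomial kernel k(x, y) = (x·y)² gives exactly the same value as first mapping both points with ϕ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²) and then taking the inner product in 3D, so we can work with inner products in the higher-dimensional space without ever constructing it:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2D input (produces 3 features)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed entirely in the ORIGINAL 2D space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(y))  # inner product after explicit mapping: 16.0
implicit = poly_kernel(x, y)       # same value, no mapping ever computed: 16.0

print(np.isclose(explicit, implicit))  # → True
```

This is the whole point of the trick: the kernel evaluates the inner product the mapped data would have, at the cost of a computation in the original low-dimensional space, so the explicit (and potentially enormous) projection is never needed.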
