This algorithm, the Support Vector Machine (SVM), tries to geometrically separate the dataset into two subsets labeled with $y_i = +1$ and $y_i = -1$. The next figure shows data perfectly separated into two classes (empty circles and black circles), that is, the case in which the decision boundary (or hyperplane), given by the black line, fully separates the two classes (in other words, there are no misclassified data points):
The hyperplane is mathematically described by the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$, where $|b|/\|\mathbf{w}\|$ is the distance of the hyperplane from the origin and $\mathbf{w}$ is the normal to the hyperplane. The goal of the algorithm is to maximize the distance of the decision boundary from the data points. In practice, we consider the closest points $i$ to the hyperplane, called support vectors, that lie in two planes H1, H2 at distances d1, d2 from the decision boundary such that:
$$\mathbf{w} \cdot \mathbf{x}_i + b = +1 \quad \text{for } H_1 \text{ such that } y_i = +1 \qquad (1)$$
$$\mathbf{w} \cdot \mathbf{x}_i + b = -1 \quad \text{for } H_2 \text{ such that } y_i = -1 \qquad (2)$$
Assuming $d_1 = d_2$, the common distance is called the margin, and the support vector machine method finds the values of $\mathbf{w}$ and b that maximize it.
Since the distance between H1 and H2 is given by $2/\|\mathbf{w}\|$, the margin is equal to $1/\|\mathbf{w}\|$ and the support vector machine algorithm is equivalent to:
$$\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{such that} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \ \forall i$$
Here, the square operation and the 1/2 factor have been added to allow the use of a quadratic programming method to solve the mathematical problem. Now, the problem can be rewritten in a Lagrangian form using the Lagrange multipliers $a_i \geq 0$:
$$L = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_i a_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$$
Setting the derivatives with respect to $\mathbf{w}$ and b to 0, we obtain:
$$\mathbf{w} = \sum_i a_i y_i \mathbf{x}_i \qquad (3)$$
$$\sum_i a_i y_i = 0 \qquad (4)$$
So the optimized Lagrangian becomes:
$$\tilde{L} = \sum_i a_i - \frac{1}{2}\sum_{i,j} a_i H_{ij} a_j$$
Here, $H_{ij} = y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$.
This is known as the dual form of the original problem, which depends only on the multipliers $a_i$ and has to be maximized:
$$\max_{\mathbf{a}} \left[ \sum_i a_i - \frac{1}{2}\sum_{i,j} a_i H_{ij} a_j \right] \quad \text{such that} \quad a_i \geq 0, \ \sum_i a_i y_i = 0$$
The solutions $a_s > 0$ (the cases $a_i = 0$ contribute null terms) are found using a technique called quadratic programming; they identify the support vectors and determine $\mathbf{w}$ through formula (3):
$$\mathbf{w} = \sum_{s \in S} a_s y_s \mathbf{x}_s \qquad (5)$$
The parameter b can be calculated because the support vectors $\mathbf{x}_s$ satisfy the following equation (a combination of equations (1) and (2)):
$$y_s(\mathbf{w} \cdot \mathbf{x}_s + b) = 1$$
Substituting equation (3) and multiplying both sides by $y_s$ (which is +1 or -1), we obtain:
$$b = y_s - \sum_{m \in S} a_m y_m \, \mathbf{x}_m \cdot \mathbf{x}_s$$
Averaging over all the $N_s$ support vectors, we can have a better estimate of the parameter b:
$$b = \frac{1}{N_s}\sum_{s \in S}\left( y_s - \sum_{m \in S} a_m y_m \, \mathbf{x}_m \cdot \mathbf{x}_s \right) \qquad (6)$$
Equations (5) and (6) return the values of the parameters that define the support vector machine algorithm, from which it is possible to predict the class of any test point $\mathbf{x}_t$:
$$y_t = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x}_t + b)$$
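To make the derivation concrete, the following is a minimal from-scratch sketch (not from the original text) that solves the dual problem with a quadratic programming solver, assuming the cvxopt package is available, and then recovers $\mathbf{w}$ and b through formulas (5) and (6); all variable names are illustrative.

```python
import numpy as np
from cvxopt import matrix, solvers  # assumed QP solver


def train_hard_margin_svm(X, y):
    """X: (n, d) array of points, y: (n,) array of labels in {-1, +1}."""
    n = X.shape[0]
    Xy = X * y[:, None]
    H = Xy @ Xy.T                               # H_ij = y_i y_j x_i . x_j
    P = matrix(H.astype(float))
    q = matrix(-np.ones(n))                     # maximize sum(a) -> minimize -sum(a)
    G = matrix(-np.eye(n))                      # -a_i <= 0, that is a_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))  # equality constraint sum_i a_i y_i = 0
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    a = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    sv = a > 1e-6                               # support vectors have a_s > 0
    w = ((a[sv] * y[sv])[:, None] * X[sv]).sum(axis=0)  # formula (5)
    b_param = np.mean(y[sv] - X[sv] @ w)                # formula (6)
    return w, b_param


def predict(w, b_param, X_test):
    return np.sign(X_test @ w + b_param)        # class of a test point x_t
```

In practice, a library implementation such as scikit-learn's SVC handles the quadratic programming step internally.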
If a line is not able to completely separate the data points into two classes, we need to allow each data point to be misclassified by an error $e_i \geq 0$ such that:
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - e_i \quad \forall i$$
And we need to maximize the margin while trying to minimize the misclassification errors. This condition is translated into the following problem:
$$\min_{\mathbf{w}, b, e_i} \left[ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i e_i \right] \quad \text{such that} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 - e_i, \ e_i \geq 0 \ \forall i$$
Here, the parameter C is set to balance the size of the margin against the misclassification errors: a small C tolerates many misclassified points and yields a wide margin, while a large C (C >> 1) penalizes the errors heavily, leading to few misclassified points but a narrow margin. Applying the same method as before, the dual problem is subject to the Lagrange multiplier conditions with an upper bound C:
$$\max_{\mathbf{a}} \left[ \sum_i a_i - \frac{1}{2}\sum_{i,j} a_i H_{ij} a_j \right] \quad \text{such that} \quad 0 \leq a_i \leq C, \ \sum_i a_i y_i = 0$$
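As a quick illustration of the role of C, here is a brief sketch assuming scikit-learn (the dataset and the C values are arbitrary); increasing C reduces the number of margin violations at the price of a narrower margin:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two partially overlapping clusters, so a perfect linear separation is impossible.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Larger C: fewer margin violations, fewer support vectors, narrower margin.
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"training accuracy={clf.score(X, y):.2f}")
```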
Until now, we have treated problems in which only two classes are considered. Real problems may have multiple classes, and two procedures are commonly used to apply this method (as seen for logistic regression): one versus all or one versus one. Given a problem with M classes, the first method trains M SVM models, each one labeling the data of the considered class j with +1 and all the rest with -1. The second method instead trains a model for each pair of classes i, j, leading to $M(M-1)/2$ trained models. Clearly, the second method is computationally more expensive, but the results are generally more precise.
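The following sketch (assuming scikit-learn, with the Iris dataset used purely as an example) contrasts the two strategies; with M = 3 classes both happen to train three binary models, but in general one versus one trains $M(M-1)/2$ of them:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                            # M = 3 classes

ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)    # M binary models
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)     # M*(M-1)/2 pairwise models
print(len(ovr.estimators_), len(ovo.estimators_))            # 3 and 3 for M = 3
```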
In a similar way, SVM can be used in regression problems, that is, whenever the target values are continuous rather than discrete class labels. In this case, the goal is to find the parameters $\mathbf{w}$ and b such that the prediction for each point is:
$$y_i = \mathbf{w} \cdot \mathbf{x}_i + b$$
We assume that the true value $t_i$ can differ from the predicted value $y_i$ by at most $\epsilon$, and that the prediction can further deviate by an error $\xi_i \geq 0$ or $\hat{\xi}_i \geq 0$, depending on whether $y_i$ is larger or smaller than $t_i$. The following figure shows, for an example point i, the various predictions $y_i$ lying around the true value $t_i$ and the associated errors:
The minimization problem becomes:
$$\min_{\mathbf{w}, b, \xi_i, \hat{\xi}_i} \left[ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i (\xi_i + \hat{\xi}_i) \right]$$
such that:
$$t_i \leq y_i + \epsilon + \xi_i, \quad t_i \geq y_i - \epsilon - \hat{\xi}_i, \quad \xi_i, \hat{\xi}_i \geq 0 \quad \forall i$$
It is possible to show that the associated dual problem is now equal to:
$$\max_{\mathbf{a}, \hat{\mathbf{a}}} \left[ \sum_i (a_i - \hat{a}_i)\, t_i - \epsilon \sum_i (a_i + \hat{a}_i) - \frac{1}{2}\sum_{i,j} (a_i - \hat{a}_i)(a_j - \hat{a}_j)\, \mathbf{x}_i \cdot \mathbf{x}_j \right]$$
subject to $0 \leq a_i \leq C$, $0 \leq \hat{a}_i \leq C$, and $\sum_i (a_i - \hat{a}_i) = 0$.
Here, $a_i$ and $\hat{a}_i$ are the Lagrange multipliers.
The new prediction, $y_p$, can be found by applying the formula $y_p = \sum_i (a_i - \hat{a}_i)\, \mathbf{x}_i \cdot \mathbf{x}_p + b$, where the parameter b can be obtained as before, averaging over the subset S of support vectors that satisfy $0 < a_s < C$ and $\xi_s = 0$:
$$b = \frac{1}{N_s}\sum_{s \in S}\left( t_s - \epsilon - \sum_m (a_m - \hat{a}_m)\, \mathbf{x}_m \cdot \mathbf{x}_s \right)$$
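A short support vector regression sketch follows, assuming scikit-learn and synthetic data chosen only for illustration; epsilon is the width of the tolerance tube around the predictions and C penalizes the points falling outside it:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
t = 0.5 * X.ravel() + 0.1 * rng.randn(80)        # noisy continuous targets t_i

# epsilon-insensitive tube of width 0.1; errors beyond it are penalized through C.
reg = SVR(kernel='linear', C=1.0, epsilon=0.1).fit(X, t)
print(reg.predict([[0.5]]))                      # prediction y_p for a new point
```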
There are datasets that are not linearly separable in a certain space, but if the data is transformed into the right space, then a hyperplane can separate it into the desired two or more classes. Consider the example shown in the following figure:
We can clearly see that the two classes are not linearly separable in the two-dimensional space (the left figure). Suppose we then apply a kernel function K to the data such that:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$$
Here, $\phi$ is a transformation that maps each point into a higher-dimensional space.
The data is now separable by a two-dimensional plane (the right figure). In the SVM algorithm, the kernel function is applied to the matrix $H_{ij}$, replacing the dot product between the points i, j:
$$H_{ij} = y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)$$
Popular kernel functions used in the SVM algorithm are the polynomial kernel $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + c)^d$, the Gaussian radial basis function (RBF) kernel $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2)$, and the sigmoid kernel $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\kappa\, \mathbf{x}_i \cdot \mathbf{x}_j + c)$.
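As a brief sketch (scikit-learn assumed), these kernels can be compared on data similar to the figure above, where two concentric classes are not linearly separable in the original space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: not separable by a straight line in the original 2D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    acc = SVC(kernel=kernel, gamma='scale').fit(X, y).score(X, y)
    print(f"{kernel}: training accuracy={acc:.2f}")   # the rbf kernel separates the circles
```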