Kernel-based approaches have recently received considerable interest for solving linearly nonseparable classification problems. Their main idea is to introduce a nonlinear kernel that maps the original data into a high-dimensional feature space in which the extracted features can be used to resolve the linear nonseparability encountered in the original data space. Mercer's theorem in Scholkopf and Smola (2002) then enables a technique called the kernel trick, which implicitly calculates dot products in the feature space F without actually mapping the data sample vectors into F; in this case the mapping itself need not even be identified.
Prior to introducing kernel-based classifiers, a kernelization procedure and the kernel trick need to be discussed. The idea of kernel-based techniques is to obtain a nonlinear version of a linear algorithm by implicitly mapping the original data to a higher-dimensional feature space. Suppose that X = {x_1, x_2, …, x_N} forms a data space and ϕ is a nonlinear function that maps the data space X into a feature space F with dimensionality yet to be determined by a kernel, that is,

ϕ: X → F, x ↦ ϕ(x) ∈ F  (2.84)
It is shown in Scholkopf and Smola (2002) that an effective kernel-based learning technique known as the "kernel trick" can be implemented. This technique implicitly computes dot products in F without mapping the original data sets into the feature space, which makes it feasible to implement classifiers in a higher-dimensional space. The kernel representation of dot products in F is expressed as

K(x_i, x_j) = ϕ(x_i)ᵀϕ(x_j)  (2.85)
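As a concrete check of the kernel trick, the degree-2 polynomial map in two dimensions satisfies ϕ(x)ᵀϕ(y) = (xᵀy)²; the explicit map below is one standard choice, used here only for illustration.

```python
import math

# Explicit degree-2 polynomial feature map for 2-D inputs (an illustrative choice):
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so that phi(x).phi(y) = (x.y)^2.
def phi(x):
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 0.5)

explicit = dot(phi(x), phi(y))   # dot product computed explicitly in F
implicit = dot(x, y) ** 2        # same value via the kernel K(x, y) = (x.y)^2

print(explicit, implicit)        # both equal 16.0
```

The two values agree, even though `implicit` never forms the feature vectors, which is exactly what makes the kernel trick attractive when F is very high (or infinite) dimensional.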
where K is a kernel function of the original data. Three types of kernel are commonly used: the polynomial kernel, K(x, y) = (xᵀy + 1)^d; the Gaussian radial basis function (RBF) kernel, K(x, y) = exp(−‖x − y‖²/2σ²); and the sigmoid kernel, K(x, y) = tanh(a xᵀy + b).
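The commonly used kernels can be sketched directly; the parameter names below (degree d, width σ, slope a and offset b) are generic choices, not the chapter's notation.

```python
import math

# Sketches of the three kernels most often quoted in the literature.
def poly_kernel(x, y, d=2):
    # polynomial kernel (x.y + 1)^d
    return (sum(xi * yi for xi, yi in zip(x, y)) + 1) ** d

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2))
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def sigmoid_kernel(x, y, a=1.0, b=0.0):
    # sigmoid kernel tanh(a * x.y + b)
    return math.tanh(a * sum(xi * yi for xi, yi in zip(x, y)) + b)

x, y = (1.0, 0.0), (1.0, 0.0)
print(poly_kernel(x, y), rbf_kernel(x, y), sigmoid_kernel(x, y))
```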
The idea of kernel-based FLDA (KFLDA) is to first map the data matrix X via (2.84) using a nonlinear kernel ϕ into a higher-dimensional feature space F in which FLDA is performed, where the between-class and within-class scatter matrices in (2.40) and (2.41) can be reexpressed by

S_B^ϕ = Σ_{j=1}^{p} n_j (μ_j^ϕ − μ^ϕ)(μ_j^ϕ − μ^ϕ)ᵀ  (2.86)

S_W^ϕ = Σ_{j=1}^{p} Σ_{i=1}^{n_j} (ϕ(x_i^{(j)}) − μ_j^ϕ)(ϕ(x_i^{(j)}) − μ_j^ϕ)ᵀ  (2.87)
where μ_j^ϕ = (1/n_j) Σ_{i=1}^{n_j} ϕ(x_i^{(j)}) is the mean of the jth class in F and μ^ϕ = (1/N) Σ_{i=1}^{N} ϕ(x_i) is the global mean in F. Using (2.86) and (2.87), Fisher's ratio in the feature space F can be derived as follows:

J(w) = (wᵀ S_B^ϕ w) / (wᵀ S_W^ϕ w)  (2.88)
where w is a nonzero vector in F. According to the theory of reproducing kernels, any vector w in the feature space F can also be represented as a linear combination of {ϕ(x_i)}_{i=1}^{N} by

w = Σ_{i=1}^{N} α_i ϕ(x_i)  (2.89)
where α = (α_1, α_2, …, α_N)ᵀ is the coefficient vector. Substituting (2.89) in (2.86) and (2.87) yields

wᵀ S_B^ϕ w = αᵀ K_B α  (2.90)

wᵀ S_W^ϕ w = αᵀ K_W α  (2.91)
where the kernel matrices K_B and K_W are obtained by the kernel trick. Using (2.90) and (2.91), (2.88) can be reexpressed as

J(α) = (αᵀ K_B α) / (αᵀ K_W α)  (2.92)
The optimal solution to maximizing (2.92) is given by a set of p − 1 eigenvectors that solve the following generalized eigenvalue problem:

K_B α = λ K_W α  (2.93)
For a new data sample vector x in the feature space F, its projection onto the optimal jth discriminant vector w_j* is given by

(w_j*)ᵀ ϕ(x) = Σ_{i=1}^{N} α_i^{j*} K(x_i, x)  (2.94)

where α^{j*} = (α_1^{j*}, …, α_N^{j*})ᵀ is the jth eigenvector solving (2.93).
Let

W* = [w_1* w_2* ⋯ w_{p−1}*]  (2.95)

be the optimal discriminant feature matrix formed by the p − 1 discriminant vectors {w_j*}_{j=1}^{p−1}. The transformed data in the feature space F can be obtained via the optimal discriminant feature matrix W* specified by (2.95) as follows:

X̃ = (W*)ᵀ ϕ(X)  (2.96)
where X is the original data space. KFLDA then performs FLDA on X̃ and is expected to perform better than FLDA operating directly on the original space X.
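The KFLDA machinery above can be sketched numerically for the two-class case. The code below is a minimal illustration rather than the chapter's algorithm verbatim: the ring-shaped toy data, the Gaussian RBF kernel with σ = 1, and the small ridge term μ added for numerical stability are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data that is not linearly separable: class A clusters at the
# origin, class B lies on a surrounding ring.
a = rng.normal(0.0, 0.3, size=(20, 2))
theta = rng.uniform(0, 2 * np.pi, size=20)
b = np.c_[3 * np.cos(theta), 3 * np.sin(theta)] + rng.normal(0, 0.1, size=(20, 2))
X = np.vstack([a, b])
n_a, n_b = len(a), len(b)

def rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

K = np.array([[rbf(x, y) for y in X] for x in X])  # Gram matrix on the training pool

# Two-class kernel Fisher discriminant: the expansion coefficients alpha that
# maximize the kernelized Fisher ratio can be obtained from N^{-1}(M_a - M_b),
# with a small ridge mu for numerical stability (an assumption of this sketch).
M_a = K[:, :n_a].mean(axis=1)
M_b = K[:, n_a:].mean(axis=1)
K_a, K_b = K[:, :n_a], K[:, n_a:]
N = (K_a @ (np.eye(n_a) - np.full((n_a, n_a), 1 / n_a)) @ K_a.T
     + K_b @ (np.eye(n_b) - np.full((n_b, n_b), 1 / n_b)) @ K_b.T)
mu = 1e-3
alpha = np.linalg.solve(N + mu * np.eye(len(X)), M_a - M_b)

# Projection of each sample onto the discriminant: sum_i alpha_i K(x_i, x).
proj = K @ alpha
print(proj[:n_a].min(), proj[n_a:].max())
```

With the RBF kernel the two classes, inseparable by any line in the original space, project to disjoint intervals on the discriminant axis, which is the point of performing FLDA in F instead of X.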
The SVM discussed in Section 2.3.1.2 is a linear machine that performs a binary decision for linearly separable or nonseparable problems. As we have seen, in order for a linear SVM to solve linearly nonseparable problems, a set of slack variables is introduced to resolve the dilemma caused by confusing data sample vectors. In this section, we consider a rather different approach that generalizes the linear discriminant function to allow a much larger class of possible decision boundaries by transforming an original data sample vector r via a set of M predetermined nonlinear functions ϕ_j(r), referred to as basis functions or kernels. The output resulting from such a nonlinear transform can be represented by a linear combination of these functions as follows:

y_k(r) = Σ_{j=1}^{M} w_{kj} ϕ_j(r) + b_k  (2.97)
In order to absorb the bias b_k into the linear sum in (2.97), we can introduce an auxiliary function ϕ_0(r) = 1 so that Equation (2.97) can be reexpressed in a linear form by

y_k(r) = w_kᵀ φ(r)  (2.98)
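The bias-absorption step can be checked with toy numbers; the values of φ(r), w_k, and b_k below are hypothetical.

```python
# Absorbing the bias b_k into the weight vector: prepend phi_0(r) = 1 and
# w_k0 = b_k, so the affine form becomes a single dot product (toy numbers).
phi_r = [0.5, -1.0, 2.0]     # phi_1(r)..phi_M(r), hypothetical values
w = [0.3, 0.7, -0.2]         # w_k1..w_kM
b = 1.5                      # bias b_k

affine = sum(wi * pi for wi, pi in zip(w, phi_r)) + b

phi_aug = [1.0] + phi_r      # phi_0(r) = 1 prepended
w_aug = [b] + w              # w_k0 = b_k prepended
linear = sum(wi * pi for wi, pi in zip(w_aug, phi_aug))

print(affine, linear)        # identical values
```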
where φ(r) = (ϕ_0(r), ϕ_1(r), …, ϕ_M(r))ᵀ and the weight vector w_k is represented by w_k = (w_{k0}, w_{k1}, …, w_{kM})ᵀ with w_{k0} = b_k. By adapting (2.98) we can also obtain a kernel-based SVM specified by the weight vector w_SVM given by

w_SVM = Σ_{i=1}^{N} α_i d_i φ(r_i)  (2.99)

y(r) = w_SVMᵀ φ(r) = Σ_{i=1}^{N} α_i d_i φ(r_i)ᵀ φ(r)  (2.100)
If we further define the kernel K(r, r_i) as the inner product of φ(r) and φ(r_i) by

K(r, r_i) = φ(r)ᵀ φ(r_i)  (2.101)
we can form a kernel-based SVM (KSVM) as follows.
Given a training pool {(r_i, d_i)}_{i=1}^{N}, find a Lagrange multiplier vector α = (α_1, α_2, …, α_N)ᵀ that maximizes Q(α), which is modified from (2.60) and given by

Q(α) = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j d_i d_j K(r_i, r_j)  (2.102)
subject to the following constraints:

Σ_{i=1}^{N} α_i d_i = 0 and 0 ≤ α_i ≤ C for i = 1, 2, …, N
An optimal solution to the above kernel-based SVM optimization problem is given by

w_SVM = Σ_{i=1}^{n_s} α_i* d_i φ(r_i)  (2.103)
where n_s is the number of support vectors.
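The dual maximization of Q(α) can be sketched numerically on a toy two-sample problem where the optimum is known in closed form (α_1 = α_2 = 1/2). The linear kernel, step size, and projected-gradient scheme below are illustrative assumptions, not a prescribed solver.

```python
# Minimal sketch: projected gradient ascent on the KSVM dual for two samples
# r1 = (1, 0) with d1 = +1 and r2 = (-1, 0) with d2 = -1 under a linear kernel.
# The analytic optimum is alpha_1 = alpha_2 = 0.5.
R = [(1.0, 0.0), (-1.0, 0.0)]     # training samples r_i
d = [1.0, -1.0]                   # class labels d_i

def K(u, v):                      # linear kernel K(r, r_i) = r . r_i
    return sum(a * b for a, b in zip(u, v))

n = len(R)
Q = [[d[i] * d[j] * K(R[i], R[j]) for j in range(n)] for i in range(n)]

alpha = [0.0] * n
eta = 0.1                         # step size (an assumption of this sketch)
for _ in range(2000):
    # gradient of Q(alpha) = sum_i alpha_i - 0.5 sum_ij alpha_i alpha_j d_i d_j K_ij
    grad = [1.0 - sum(Q[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    alpha = [alpha[i] + eta * grad[i] for i in range(n)]
    # project back onto the constraints sum_i alpha_i d_i = 0 and alpha_i >= 0
    shift = sum(a * di for a, di in zip(alpha, d)) / n
    alpha = [max(0.0, alpha[i] - shift * d[i]) for i in range(n)]

print(alpha)   # both multipliers approach 0.5
```

Both samples end up as support vectors (nonzero multipliers), and the resulting w_SVM = Σ α_i d_i r_i = (1, 0) points across the obvious separating boundary.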
The key issue in solving the KSVM is to determine the kernel used in (2.103). Interestingly, according to Mercer's theorem, if a kernel-based inner product K(x, y) is continuous and symmetric in a closed interval a ≤ x, y ≤ b, then K(x, y) can be expanded in a series

K(x, y) = Σ_{i=1}^{∞} λ_i φ_i(x) φ_i(y)  (2.104)

where {λ_i} are nonnegative eigenvalues and {φ_i} are their associated eigenfunctions.
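In practice, Mercer's condition shows up as positive semidefiniteness of the Gram matrix formed on any finite sample, whose eigenvalues play the role of the λ_i restricted to that sample. A quick numerical check with a Gaussian RBF kernel (the sample data and kernel width are arbitrary choices for this sketch):

```python
import numpy as np

# A Mercer kernel must yield a positive semidefinite Gram matrix on any finite
# sample: all eigenvalues nonnegative (up to floating-point error).
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))

def rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

G = np.array([[rbf(x, y) for y in X] for x in X])
eigvals = np.linalg.eigvalsh(G)   # eigenvalues of the symmetric Gram matrix
print(eigvals.min())              # nonnegative up to floating-point tolerance
```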
The KFLDA and KSVM discussed above are derived from two popular hard-decision classifiers, FLDA and SVM. Correspondingly, there should also be kernel versions of the soft-decision classifiers discussed in Section 2.3.2.1, such as OSP. However, their derivations are much more involved. Therefore, their kernel versions, referred to as kernel-based linear spectral mixture analysis (KLSMA), will be developed and discussed separately in Chapter 15, which includes kernel versions of various LSMA-based classifiers.