2.4 Kernel-Based Classification

Kernel-based approaches have recently received considerable interest for solving linearly nonseparable classification problems. Their main idea is to introduce a nonlinear kernel that maps the original data into a high-dimensional feature space, in which the extracted features can be used to resolve the linear nonseparability encountered in the original data space. By virtue of Mercer's theorem (Scholkopf and Smola, 2002), the so-called kernel trick can then be used to implicitly calculate dot products in the feature space F without actually mapping the data sample vectors into F, so that the nonlinear mapping itself never needs to be identified explicitly.

2.4.1 Kernel Trick Used in Kernel-Based Methods

Prior to introducing kernel-based classifiers, the kernelization procedure and the kernel trick need to be discussed. The idea of kernel-based techniques is to obtain a nonlinear version of a linear algorithm by implicitly mapping the original data into a higher-dimensional feature space. Suppose that $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ forms a data space of $N$ data sample vectors and $\phi$ is a nonlinear function that maps the data space $X$ into a feature space $F$ with dimensionality yet to be determined by a kernel, that is,

(2.84) $\phi: X \rightarrow F, \quad \mathbf{x} \mapsto \phi(\mathbf{x})$

It is shown in Scholkopf and Smola (2002) that an effective kernel-based learning technique known as the "kernel trick" can be implemented. This technique implicitly computes dot products in $F$ without mapping the original data sets into the feature space, which makes it feasible to implement classifiers in a higher-dimensional space. The kernel representation of dot products in $F$ is expressed as

(2.85) $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$

where $K$ is a kernel function defined on the original data. Three types of kernels are commonly used (a short sketch implementing them follows the list):

1. Polynomial kernel, $K(\mathbf{x}, \mathbf{y}) = \left(\mathbf{x}^T\mathbf{y} + 1\right)^d$,
2. Radial basis function (RBF) kernel, $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2\right)$, and
3. Sigmoid function kernel, $K(\mathbf{x}, \mathbf{y}) = \tanh\left(\kappa\,\mathbf{x}^T\mathbf{y} + c\right)$.
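
As a concrete illustration of the kernel trick in (2.85), the following minimal Python sketch implements the three kernels above and numerically verifies that the degree-2 polynomial kernel equals the dot product of an explicit feature map. The parameter defaults and the particular explicit map are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x^T y + 1)^d."""
    return (x @ y + 1.0) ** d

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma**2))

def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa x^T y + c)."""
    return np.tanh(kappa * (x @ y) + c)

# Kernel trick check for the degree-2 polynomial kernel on 2-D data: the
# explicit map phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, sqrt(2)x1x2, x2^2)
# satisfies phi(x)^T phi(y) = (x^T y + 1)^2, so the kernel computes the dot
# product in the feature space F without ever forming phi.
def phi_poly2(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, np.sqrt(2) * x1 * x2, x2**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(phi_poly2(x) @ phi_poly2(y), poly_kernel(x, y, d=2))
```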

2.4.2 Kernel-Based Fisher's Linear Discriminant Analysis (KFLDA)

The idea of kernel-based FLDA (KFLDA) is to first map the data matrix X via (2.84) using a nonlinear mapping $\phi$ into a higher-dimensional feature space $F$ in which FLDA is performed. In $F$, the between-class scatter and within-class scatter matrices in (2.40) and (2.41) can be reexpressed as

(2.86) $\mathbf{S}_B^{\phi} = \sum_{j=1}^{p} n_j \left(\boldsymbol{\mu}_j^{\phi} - \boldsymbol{\mu}^{\phi}\right)\left(\boldsymbol{\mu}_j^{\phi} - \boldsymbol{\mu}^{\phi}\right)^T$

(2.87) $\mathbf{S}_W^{\phi} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(\phi(\mathbf{x}_i^{(j)}) - \boldsymbol{\mu}_j^{\phi}\right)\left(\phi(\mathbf{x}_i^{(j)}) - \boldsymbol{\mu}_j^{\phi}\right)^T$

where $\boldsymbol{\mu}_j^{\phi} = (1/n_j)\sum_{i=1}^{n_j} \phi(\mathbf{x}_i^{(j)})$ is the mean of the $n_j$ samples of the $j$th class in $F$, $\boldsymbol{\mu}^{\phi} = (1/N)\sum_{i=1}^{N} \phi(\mathbf{x}_i)$ is the global mean in $F$, and $p$ is the number of classes. Using (2.86) and (2.87), the Fisher's ratio in the feature space $F$ can be derived as follows:

(2.88) $J(\mathbf{w}) = \dfrac{\mathbf{w}^T \mathbf{S}_B^{\phi}\, \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W^{\phi}\, \mathbf{w}}$

where $\mathbf{w}$ is a nonzero vector in $F$. According to the theory of reproducing kernels, any vector in the feature space $F$ can also be represented as a linear combination of the mapped data samples $\{\phi(\mathbf{x}_i)\}_{i=1}^{N}$ by

(2.89) $\mathbf{w} = \sum_{i=1}^{N} \alpha_i\, \phi(\mathbf{x}_i)$

where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$ and $\alpha_i \in \mathbb{R}$ for $1 \leq i \leq N$. Substituting (2.89) into the quadratic forms defined by (2.86) and (2.87) yields

(2.90) $\mathbf{w}^T \mathbf{S}_B^{\phi}\, \mathbf{w} = \boldsymbol{\alpha}^T \mathbf{M} \boldsymbol{\alpha}$

(2.91) $\mathbf{w}^T \mathbf{S}_W^{\phi}\, \mathbf{w} = \boldsymbol{\alpha}^T \mathbf{N} \boldsymbol{\alpha}$

where $\mathbf{M}$ and $\mathbf{N}$ are $N \times N$ matrices whose entries are obtained from kernel evaluations $K(\mathbf{x}_i, \mathbf{x}_j)$ via the kernel trick. Using (2.90) and (2.91), (2.88) can be reexpressed as

(2.92) $J(\boldsymbol{\alpha}) = \dfrac{\boldsymbol{\alpha}^T \mathbf{M} \boldsymbol{\alpha}}{\boldsymbol{\alpha}^T \mathbf{N} \boldsymbol{\alpha}}$

The optimal solution to maximizing (2.92) is given by a set of $p - 1$ eigenvectors $\{\boldsymbol{\alpha}^j\}_{j=1}^{p-1}$ that solve the following generalized eigenvalue problem

(2.93) $\mathbf{M} \boldsymbol{\alpha} = \lambda \mathbf{N} \boldsymbol{\alpha}$
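
In practice, once $\mathbf{M}$ and $\mathbf{N}$ have been formed from kernel evaluations, (2.93) can be handed to a generalized symmetric eigensolver. A minimal sketch follows; the regularization term reg, added because $\mathbf{N}$ built from a finite sample is often singular, is an illustrative choice.

```python
import numpy as np
from scipy.linalg import eigh

def solve_kflda_eig(M, N_mat, n_components, reg=1e-6):
    """Solve (2.93), M alpha = lambda N alpha, with a generalized symmetric
    eigensolver and return the n_components leading eigenvectors."""
    N_reg = N_mat + reg * np.eye(N_mat.shape[0])  # keep N positive definite
    eigvals, eigvecs = eigh(M, N_reg)             # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_components]     # columns = alpha^1, ..., alpha^{p-1}
```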

For a new data sample vector $\phi(\mathbf{x})$ in the feature space $F$, its projection onto the optimal $j$th discriminant vector $\mathbf{w}_j$ is given by

(2.94) $\mathbf{w}_j^T \phi(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i^j\, K(\mathbf{x}_i, \mathbf{x})$

where

(2.95) $\mathbf{w}_j = \sum_{i=1}^{N} \alpha_i^j\, \phi(\mathbf{x}_i)$

Let $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{p-1}]$ be the optimal feature matrix formed by the $p - 1$ discriminant vectors $\{\mathbf{w}_j\}_{j=1}^{p-1}$. The projected data space $\tilde{X}$ in the feature space $F$ can be obtained via the optimal discriminant feature matrix $\mathbf{W}$ specified by (2.95) as follows

(2.96) $\tilde{X} = \mathbf{W}^T \phi(X)$

where $X$ is the original data space. KFLDA then performs FLDA on $\tilde{X}$ and is expected to perform better than FLDA operating directly on the original data space $X$.
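
To make the procedure concrete, the following Python sketch implements a minimal two-class version of KFLDA. It assumes an RBF kernel; the function names, the regularization term reg used to keep the within-class matrix invertible, and the closed-form two-class solution $\mathbf{N}^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$ for the leading eigenvector of (2.93) are illustrative choices rather than the text's prescription.

```python
import numpy as np

def rbf_kernel_matrix(A, B, sigma=1.0):
    """Kernel (Gram) matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def kflda_fit(X, labels, sigma=1.0, reg=1e-6):
    """Two-class KFLDA: find the expansion coefficients alpha in (2.89)
    that maximize the kernelized Fisher's ratio (2.92)."""
    n = X.shape[0]
    K = rbf_kernel_matrix(X, X, sigma)
    class_means = []                      # kernelized class means m_1, m_2
    N_mat = np.zeros((n, n))              # within-class matrix N of (2.91)
    for c in np.unique(labels):
        Kc = K[:, labels == c]            # kernel columns of class-c samples
        nc = Kc.shape[1]
        class_means.append(Kc.mean(axis=1))
        # Kc (I - 1/nc) Kc^T accumulates the within-class scatter in F
        N_mat += Kc @ (np.eye(nc) - np.full((nc, nc), 1.0 / nc)) @ Kc.T
    # For two classes the leading generalized eigenvector of (2.93) is
    # proportional to N^{-1}(m_1 - m_2); reg keeps N invertible.
    diff = class_means[0] - class_means[1]
    return np.linalg.solve(N_mat + reg * np.eye(n), diff)

def kflda_project(X_train, alpha, X_new, sigma=1.0):
    """Projection (2.94): w^T phi(x) = sum_i alpha_i K(x_i, x)."""
    return rbf_kernel_matrix(X_new, X_train, sigma) @ alpha
```

Note that kflda_project scores a new sample by evaluating (2.94) through kernel values only, without ever forming $\phi$ explicitly.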

2.4.3 Kernel Support Vector Machine (K-SVM)

The SVM discussed in Section 2.3.1.2 is a linear machine that performs a binary decision for linearly separable or nonseparable problems. As we have seen, in order for a linear SVM to solve linearly nonseparable problems, a set of slack variables is introduced to resolve the dilemma caused by confusing data sample vectors. In this section, we consider a rather different approach that generalizes the linear discriminant function $g_k(\mathbf{r})$ to allow a much larger class of possible decision boundaries by transforming an original data sample vector $\mathbf{r}$ via a set of $M$ predetermined nonlinear functions $\phi_j(\mathbf{r})$, $1 \leq j \leq M$, referred to as basis functions or kernels. The output resulting from such a nonlinear transform can be represented by a linear combination of these functions $\{\phi_j(\mathbf{r})\}_{j=1}^{M}$ as follows

(2.97) $g_k(\mathbf{r}) = \sum_{j=1}^{M} w_{kj}\, \phi_j(\mathbf{r}) + b_k$

In order to absorb the bias $b_k$ in the linear sum in (2.97), we can introduce an auxiliary function $\phi_0(\mathbf{r}) = 1$ so that (2.97) can be reexpressed in a linear form as

(2.98) $g_k(\mathbf{r}) = \sum_{j=0}^{M} w_{kj}\, \phi_j(\mathbf{r}) = \mathbf{w}_k^T \boldsymbol{\phi}(\mathbf{r})$

where $\boldsymbol{\phi}(\mathbf{r}) = (\phi_0(\mathbf{r}), \phi_1(\mathbf{r}), \ldots, \phi_M(\mathbf{r}))^T$ and the weight vector $\mathbf{w}_k$ is represented by $\mathbf{w}_k = (w_{k0}, w_{k1}, \ldots, w_{kM})^T$ with $w_{k0} = b_k$. By adapting (2.98) we can also obtain a kernel-based SVM specified by the weight vector $\mathbf{w}_{\mathrm{SVM}}$ given by

(2.99) $\mathbf{w}_{\mathrm{SVM}} = \sum_{i=1}^{N} \alpha_i d_i\, \boldsymbol{\phi}(\mathbf{r}_i)$

(2.100) $g(\mathbf{r}) = \mathbf{w}_{\mathrm{SVM}}^T \boldsymbol{\phi}(\mathbf{r}) = \sum_{i=1}^{N} \alpha_i d_i\, \boldsymbol{\phi}(\mathbf{r}_i)^T \boldsymbol{\phi}(\mathbf{r})$

If we further define the kernel $K(\mathbf{r}, \mathbf{r}_i)$ as the inner product of $\boldsymbol{\phi}(\mathbf{r})$ and $\boldsymbol{\phi}(\mathbf{r}_i)$ by

(2.101) $K(\mathbf{r}, \mathbf{r}_i) = \boldsymbol{\phi}(\mathbf{r})^T \boldsymbol{\phi}(\mathbf{r}_i) = \sum_{j=0}^{M} \phi_j(\mathbf{r})\, \phi_j(\mathbf{r}_i)$

we can form a kernel-based SVM (KSVM) as follows.

Given a training pool $\{(\mathbf{r}_i, d_i)\}_{i=1}^{N}$ with $d_i \in \{-1, +1\}$, find a Lagrange multiplier vector $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^T$ that maximizes $Q(\boldsymbol{\alpha})$, which is modified from (2.60) and given by

(2.102) $Q(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \dfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j\, K(\mathbf{r}_i, \mathbf{r}_j)$

subject to the following constraints:

1. $\sum_{i=1}^{N} \alpha_i d_i = 0$
2. $0 \leq \alpha_i \leq C$ for $1 \leq i \leq N$, with $C$ being a user-specified positive parameter.

An optimal solution to the above kernel-based SVM optimization problem is given by

(2.103) $g(\mathbf{r}) = \sum_{i=1}^{n_s} \alpha_i d_i\, K(\mathbf{r}_i, \mathbf{r})$

where $n_s$ is the number of support vectors.
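
The following Python sketch is a minimal illustration of this optimization, assuming an RBF kernel. Using scipy's general-purpose SLSQP optimizer in place of a dedicated quadratic-programming solver, the support-vector tolerance 1e-6, and the recovery of the bias from a margin support vector are all illustrative choices, not part of the text.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel_matrix(A, B, sigma=1.0):
    """Kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def ksvm_fit(R, d, C=1.0, sigma=1.0):
    """Maximize the dual Q(alpha) of (2.102) subject to constraints 1 and 2
    by minimizing -Q(alpha) with a general-purpose constrained optimizer."""
    n = R.shape[0]
    H = np.outer(d, d) * rbf_kernel_matrix(R, R, sigma)   # H_ij = d_i d_j K(r_i, r_j)
    res = minimize(lambda a: 0.5 * a @ H @ a - a.sum(),   # -Q(alpha)
                   np.zeros(n),
                   jac=lambda a: H @ a - np.ones(n),
                   method="SLSQP",
                   bounds=[(0.0, C)] * n,                 # constraint 2: 0 <= alpha_i <= C
                   constraints=[{"type": "eq",            # constraint 1: sum_i alpha_i d_i = 0
                                 "fun": lambda a: a @ d}])
    alpha = res.x
    sv = alpha > 1e-6                                     # support vectors
    margin = sv & (alpha < C - 1e-6)                      # margin SVs satisfy d_i g(r_i) = 1
    i = int(np.argmax(margin)) if margin.any() else int(np.argmax(alpha))
    b = d[i] - (alpha[sv] * d[sv]) @ rbf_kernel_matrix(R[sv], R[i:i + 1], sigma).ravel()
    return alpha, sv, b

def ksvm_decision(R_train, d, alpha, sv, b, R_new, sigma=1.0):
    """Evaluate (2.103) plus bias: g(r) = sum_{i in SV} alpha_i d_i K(r_i, r) + b;
    the predicted class label is sign(g(r))."""
    return rbf_kernel_matrix(R_new, R_train[sv], sigma) @ (alpha[sv] * d[sv]) + b
```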

The key issue in solving the KSVM is to determine the kernel used in (2.103). Interestingly, according to Mercer's theorem, if a kernel-based inner product $K(\mathbf{x}, \mathbf{y})$ is continuous and symmetric in a closed interval $a \leq \mathbf{x}, \mathbf{y} \leq b$, then $K(\mathbf{x}, \mathbf{y})$ can be expanded in the series

(2.104) $K(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{\infty} \lambda_i\, \varphi_i(\mathbf{x})\, \varphi_i(\mathbf{y})$

where $\{\lambda_i\}_{i=1}^{\infty}$ are nonnegative eigenvalues and $\{\varphi_i\}_{i=1}^{\infty}$ are their associated eigenfunctions.
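
As a simple worked example (not drawn from the text), consider the homogeneous polynomial kernel $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T\mathbf{y})^2$ on $\mathbb{R}^2$. Expanding the square gives an expansion of the form (2.104) with only three nonzero terms:

$(\mathbf{x}^T\mathbf{y})^2 = x_1^2\, y_1^2 + 2\, x_1 x_2\, y_1 y_2 + x_2^2\, y_2^2 = \sum_{i=1}^{3} \varphi_i(\mathbf{x})\, \varphi_i(\mathbf{y})$

with $\varphi_1(\mathbf{x}) = x_1^2$, $\varphi_2(\mathbf{x}) = \sqrt{2}\, x_1 x_2$, and $\varphi_3(\mathbf{x}) = x_2^2$ playing the role of the $\sqrt{\lambda_i}$-scaled eigenfunctions. The implicit feature space $F$ is thus three-dimensional, even though only kernel evaluations in the original two-dimensional space are ever computed.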

The KFLDA and KSVM discussed above are derived from two popular hard-decision classifiers, FLDA and SVM. Correspondingly, there should also be kernel versions of the soft-decision classifiers discussed in Section 2.3.2.1, such as OSP. However, their derivations are much more involved. Therefore, their kernel versions, referred to as kernel-based linear spectral mixture analysis (KLSMA), will be developed and discussed separately in Chapter 15, which includes kernel versions of various LSMA-based classifiers.
