8.2. Facial Feature Extraction Techniques

Ideal facial feature extraction techniques should (1) eliminate the influence of all environmental, non-identity-related variations and (2) maximize the difference between faces that belong to distinct people. Numerous methods have been proposed to extract such features and facilitate the recognition process. These methods can be classified into two categories: feature-invariant approaches and template-based approaches [38].

8.2.1. Feature-Invariant Approaches

This type of algorithm looks for structural features that persist even when the pose, viewpoint, or lighting conditions vary. Some prior knowledge may be used to develop rules that capture the relationships between facial features. Kelly [178] pioneered this work using features such as the width of the head; the distances between the eyes, from the top of the head to the eyes, and from the eyes to the nose; and the distance from the eyes to the mouth. One of the more recent image-invariant approaches relies on the local ordinal structure of the brightness distribution between different parts of a human face [340]. The scheme checks the invariants for positive occurrences at all image locations. Yow and Cipolla [396] used a multistage feature extraction method for face detection. The first stage applies a second-derivative Gaussian filter to the raw image. The filter kernel has an elliptical shape with a three-to-one aspect ratio, designed to respond to oval-shaped facial features such as the eyes and mouth. The second stage examines the edges around the interest points that the Gaussian filter detects and groups those edges into regions. Measurements of a region's characteristics, such as edge length, edge strength, and intensity variance, are computed and stored as a "feature vector" for that facial feature.
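
The first-stage filter described above can be sketched in a few lines of numpy. This is a minimal illustration, not Yow and Cipolla's implementation: the kernel size, the σ values, and all function names are assumptions made for the example.

```python
import numpy as np

def elliptical_gaussian_2nd_deriv(size=21, sigma_long=6.0, aspect=3.0):
    """Second derivative (along the short axis) of an elliptical Gaussian
    with a 3:1 aspect ratio, returned as a convolution kernel."""
    sigma_short = sigma_long / aspect
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x**2 / (2 * sigma_long**2) + y**2 / (2 * sigma_short**2)))
    # d2/dy2 of a Gaussian: (y^2 / sigma^4 - 1 / sigma^2) * g
    kernel = (y**2 / sigma_short**4 - 1.0 / sigma_short**2) * g
    return kernel - kernel.mean()          # zero-mean: flat regions respond 0

def filter_image(image, kernel):
    """FFT-based 'same'-size linear convolution using numpy only."""
    H, W = image.shape
    kh, kw = kernel.shape
    full = (H + kh - 1, W + kw - 1)
    out = np.fft.irfft2(np.fft.rfft2(image, full) * np.fft.rfft2(kernel, full),
                        full)
    return out[kh // 2: kh // 2 + H, kw // 2: kw // 2 + W]
```

Because the kernel is elongated horizontally, its response peaks on horizontally oriented oval blobs such as eyes and mouths; rotated copies of the kernel would cover other orientations.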

One advantage of feature-invariant approaches is that they tend to have better tolerance toward variations of pose and facial expression. Because individual facial features are treated and examined locally, this category of feature extractor often accompanies a wire-frame type of pattern classifier such as active contour or dynamic link graph algorithms. For example, Manjunath et al. [230] used elastic graphs to link local facial features extracted by a Gabor filter set. Because wireframe recognizers are suitable for incorporating the three-dimensional structural information of human heads, they have a better chance of recognizing faces using a nonfrontal view.

On the other hand, the major drawback of feature-based algorithms is their high computational cost. For instance, Wiskott and von der Malsburg's dynamic link system [384] takes 10 to 15 minutes of a SPARC 10's CPU time to recognize one face from a gallery of 111 models. In addition, because feature-invariant approaches focus on facial details, they often require higher-resolution images than template-based approaches.

8.2.2. Template-Based Approaches

This type of algorithm designs one or several standard face "templates" (usually frontal), either manually or by learning from examples in the image database. The face templates are used to scan through incoming test images. If the template matcher returns a high correlation value when it scans a certain region of a test image, the algorithm determines that this region contains a human face. Compared to feature-invariant methods, template-matching approaches have the advantage of being simple to implement. However, studies have shown that plain template matching copes poorly with variation. Supplementary techniques, such as multiresolution, multiscale subtemplates, and deformable templates, have subsequently been proposed to achieve scale, pose, and shape invariance. To obtain other invariances (e.g., to lighting conditions and facial expression), template matching requires either more sophisticated statistical assistance or a powerful pattern classifier.
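
The core template-matching step, sliding a template across a test image and reporting a correlation score, can be sketched as follows. This is a generic normalized cross-correlation sketch, not any particular published detector; the function name and the brute-force scan are illustrative choices.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Slide `template` over `image`; return a map of NCC scores in [-1, 1].
    High values mark regions that closely match the template."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-12)
    H, W = image.shape
    out = np.empty((H - th + 1, W - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = image[i:i + th, j:j + tw]
            w = (win - win.mean()) / (win.std() + 1e-12)
            out[i, j] = (w * t).mean()      # correlation of normalized patches
    return out
```

The best-matching region is `np.unravel_index(np.argmax(ncc), ncc.shape)`; a face detector would threshold the score map instead of taking only the maximum. Normalizing each window makes the score insensitive to local brightness and contrast, but not to scale or pose, which is exactly the limitation the supplementary techniques above address.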

One important issue for statistical template matching is the curse of dimensionality [85]. As the dimension of the extracted feature vector increases, the number of training samples required for correct classification grows exponentially. A straightforward facial image input of 120 × 120 pixels results in a feature dimension of 14,400. An efficient preprocessing mechanism is therefore necessary to select a limited yet salient set of features. With efficient feature extraction, the curse of dimensionality is alleviated, and the accompanying classifier runs faster and consumes less memory.

Principal Component Analysis

The most practical, systematic method of discovering low-dimensional representations of signals from their statistical regularities is principal component analysis (PCA). PCA assumes that the probability density of the input ensemble in feature space is significantly non-zero only in a low-dimensional linear subspace, which is parameterized by a linear expansion in the eigenvectors of the correlation matrix of the ensemble. PCA's power stems from its ease of computation and general applicability; it has been used in many real-world problems, including face recognition. Pentland et al. [272] used PCA to produce a representation of two-dimensional faces called eigenfaces. Let a face image I(x) be a two-dimensional m × m array of intensity values, where x is the two-dimensional grid coordinate; I(x) can also be viewed as a vector of dimension m². Denote a training set (ensemble) of N face images by {I_i(x), i = 1, 2, ..., N}. PCA extracts a hierarchical orthonormal basis of the linear subspace that the input ensemble spans. This is done by diagonalizing the correlation matrix of the ensemble

Equation 8.2.1

\[ C(x, y) = \frac{1}{N} \sum_{i=1}^{N} I_i(x)\, I_i(y) \]

to produce the orthonormal set of eigenfaces Ψ_r(x), r = 1, 2, ..., N, and their respective eigenvalues λ_r, ordered in the natural hierarchy of decreasing magnitude.

The PCA representation

Equation 8.2.2

\[ I(x) = \sum_{r=1}^{N} A_r\, \Psi_r(x) \]

with \( A_r = \int \Psi_r(x)\, I(x)\, dx \) is decorrelated in the sense that

Equation 8.2.3

\[ \langle A_r A_s \rangle = \lambda_r\, \delta_{rs} \]

It also has the best-reconstruction property: truncating the expansion at K < N terms yields the minimum mean-squared reconstruction error, where

Equation 8.2.4

\[ I_{rec}(x) = \sum_{r=1}^{K} A_r\, \Psi_r(x) \]

With proper alignment of all the facial images in the ensemble, most of the energy is compacted into the first K eigenfaces, where K is much smaller than m × m. For example, Pentland [272] compressed original facial images of 128 × 128 = 16,384 pixels onto the subspace spanned by the first 40 significant eigenfaces and reached a 95% recognition rate among 200 faces chosen from a group of 7,562 facial images (3,000 individuals). This 16,384-to-40 reduction in feature dimension greatly alleviates the burden on the corresponding pattern classifier.
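
The eigenface computation of Eqs. 8.2.1 through 8.2.4 can be illustrated in numpy. This is a schematic sketch, not Pentland et al.'s implementation; it uses the standard trick of diagonalizing the small N × N Gram matrix instead of the m² × m² correlation matrix, and all names are choices of this example.

```python
import numpy as np

def eigenfaces(images, K):
    """PCA on a stack of aligned face images.
    images: (N, m, m) array; returns the first K eigenfaces (as columns),
    the K-dimensional coefficients A_r of each image, and the mean face."""
    N = images.shape[0]
    X = images.reshape(N, -1).astype(float)      # each row: one m^2 vector
    mean = X.mean(axis=0)
    Xc = X - mean
    # Diagonalize the N x N Gram matrix rather than the huge m^2 x m^2
    # correlation matrix; both share the same nonzero eigenvalues.
    G = Xc @ Xc.T / N
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1][:K]           # decreasing eigenvalues
    basis = Xc.T @ vecs[:, order]                # map back to image space
    basis /= np.linalg.norm(basis, axis=0)       # orthonormal eigenfaces
    coeffs = Xc @ basis                          # A_r = <Psi_r, I - mean>
    return basis, coeffs, mean
```

Recognition then reduces to a nearest-neighbour search in the K-dimensional coefficient space; reconstructing `coeffs @ basis.T + mean` realizes the truncated expansion of Eq. 8.2.4.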

Fisher's Linear Discriminant

The error that PCA minimizes during dimension reduction is the "reconstruction" error of the whole ensemble, not the recognition error between the object classes within the ensemble. Fisher's linear discriminant (FLD) [20], on the other hand, addresses the classification issue directly. With C different classes in an ensemble, FLD finds the optimal (C − 1)-dimensional subspace for classification. The objective of FLD is to maximize the ratio of the "between-class" scatter to the "within-class" scatter [85]. Let the between-class scatter matrix be defined as

Equation 8.2.5

\[ S_B = \sum_{j=1}^{C} n_j\, (\mu_j - \mu)(\mu_j - \mu)^T \]

and the within-class scatter matrix be defined as

Equation 8.2.6

\[ S_W = \sum_{j=1}^{C} \sum_{I_i \in \mathrm{class}\ j} (I_i - \mu_j)(I_i - \mu_j)^T \]

where μ is the mean image of the ensemble, μ_j is the mean image of the j-th class, n_j is the number of samples in the j-th class, N is the number of images in the ensemble, and C is the number of classes in the ensemble. The optimal subspace, E_optimal, is determined as follows:

Equation 8.2.7

\[ E_{optimal} = \arg\max_{E} \frac{\left| E^T S_B E \right|}{\left| E^T S_W E \right|} = [e_1, e_2, \ldots, e_{C-1}] \]

where [e_1, e_2, ..., e_{C−1}] is the set of generalized eigenvectors of S_B and S_W corresponding to the C − 1 largest generalized eigenvalues λ_i, i = 1, 2, ..., C − 1; that is,

Equation 8.2.8

\[ S_B\, e_i = \lambda_i\, S_W\, e_i, \qquad i = 1, 2, \ldots, C-1 \]

Thus, the feature vector P for any query face image I, in the most discriminant sense, can be calculated as:

Equation 8.2.9

\[ P = E_{optimal}^T\, I \]

PCA and FLD can be combined in cascade: PCA is first used to reduce the feature dimension from m² to K, and FLD is then applied in the K-dimensional PCA subspace to further reduce the dimension to C − 1. The following two conditions need to be satisfied when combining PCA and FLD:

  • From Eq. 8.2.6, the rank of S_W is at most the minimum of K and Σ_j (n_j − 1) = N − C, because each of the C classes contributes at most n_j − 1 independent terms. To prevent S_W from becoming singular, the value of K should be no more than N − C.

  • From Eq. 8.2.5, the rank of S_B is at most the minimum of K and C − 1. Accordingly, there are at most C − 1 non-zero generalized eigenvectors. In other words, FLD transforms the K-dimensional space into a (C − 1)-dimensional space to classify C classes of faces. If K is smaller than C − 1, FLD does not reduce the feature dimension.
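
The PCA-plus-FLD cascade under the two conditions above can be sketched in numpy. This is an illustrative sketch, not a reference implementation: the generalized eigenproblem of Eq. 8.2.8 is solved here by whitening S_W, and the function and variable names are assumptions of the example.

```python
import numpy as np

def fisherfaces(X, labels, K):
    """PCA down to K dims, then FLD down to C - 1 dims (cf. Eqs. 8.2.5-8.2.9).
    X: (N, d) rows are vectorized face images; requires K <= N - C."""
    classes = np.unique(labels)
    C = len(classes)
    mean = X.mean(axis=0)
    # --- PCA stage, via SVD of the centered data ---
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W_pca = Vt[:K].T                              # (d, K) projection
    Y = (X - mean) @ W_pca                        # K-dim PCA coefficients
    # --- FLD stage: scatter matrices inside the PCA subspace ---
    mu = Y.mean(axis=0)
    S_B = np.zeros((K, K))
    S_W = np.zeros((K, K))
    for c in classes:
        Yc = Y[labels == c]
        diff = (Yc.mean(axis=0) - mu)[:, None]
        S_B += len(Yc) * diff @ diff.T                            # Eq. 8.2.5
        S_W += (Yc - Yc.mean(axis=0)).T @ (Yc - Yc.mean(axis=0))  # Eq. 8.2.6
    # Solve S_B e = lambda S_W e by symmetric whitening of S_W.
    w_vals, w_vecs = np.linalg.eigh(S_W)
    S_W_inv_sqrt = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T
    vals, vecs = np.linalg.eigh(S_W_inv_sqrt @ S_B @ S_W_inv_sqrt)
    E = S_W_inv_sqrt @ vecs[:, np.argsort(vals)[::-1][:C - 1]]
    return W_pca @ E, mean                        # features: (I - mean) @ W
```

Keeping K ≤ N − C is what makes the `eigh(S_W)` step safe here: it keeps S_W full-rank in the PCA subspace so the whitening matrix exists.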

Note that FLD is a linear transformation maximizing the ratio of the determinant of the projected samples' between-class scatter matrix to the determinant of the projected samples' within-class scatter matrix. The results are globally optimal only for linearly separable data. The linear subspace assumption is violated for face data with a great deal of overlap. Moreover, the separability criterion is not directly related to classification accuracy in the output space. Several researchers have indicated that the FLD method achieved the best performance on the training data but generalized poorly to new individuals [83].

Local Feature Analysis

Local feature analysis (LFA) [271] is the core technique of FaceIt,[2] one of the most popular commercial face recognition systems. Unlike PCA, which simply expands the two-dimensional image into a one-dimensional vector, LFA tries to take advantage of the facial image's "topographical" property. LFA's first step is to design a preprocessing filter that enhances the local facial features contributing the greatest variation among different facial images. The idea of LFA is to use the values at several "significant" locations of the filtered image to constitute a much lower-dimensional (V ≪ m²) feature vector for classification. LFA can be viewed as a feature-based template-matching approach: it uses a template on which several focus points represent the features.

[2] http://www.identix.com/newsroom/whatisfaceit.html.

Define the output image after the LFA filtering to be

Equation 8.2.10

\[ O(x) = \int K(x, y)\, I(y)\, dy \]

LFA defines the filter kernel K(x, y) to be a topographical one that projects input signals to the subspace spanned by the eigenvectors

Equation 8.2.11

\[ K(x, y) = \sum_{r=1}^{N} \psi_r(x) \sum_{s=1}^{N} Q_{rs}\, \psi_s(y) \]

where ψ_r(x) is the eigenface from PCA (Eq. 8.2.1) and Q_{rs} is an a priori arbitrary matrix. Minimizing the correlation of the output image by minimizing

Equation 8.2.12

\[ \int\!\!\int_{x \neq y} \langle O(x)\, O(y) \rangle^2\, dx\, dy \]

with respect to the matrix Qrs results in Qrs being given by

Equation 8.2.13

\[ Q_{rs} = \frac{U_{rs}}{\sqrt{\lambda_s}} \]

where U_{rs} is any orthogonal matrix satisfying U^T U = I and λ_r is the eigenvalue corresponding to ψ_r(x). The simplest choice for U is U_{rs} = δ_{rs}. With this choice, the LFA outputs become

Equation 8.2.14

\[ O(x) = \sum_{r=1}^{N} \frac{A_r}{\sqrt{\lambda_r}}\, \psi_r(x) \]

where A_r is defined in Eqs. 8.2.2 and 8.2.3, and their correlation can be easily computed using the orthonormality of the vectors ψ_r(x):

Equation 8.2.15

\[ \langle O(x)\, O(y) \rangle = \sum_{r=1}^{N} \psi_r(x)\, \psi_r(y) \equiv P(x, y) \]

The function P(x, y) is an interesting object; it can easily be recognized as the projection operator onto the subspace spanned by the PCA eigenvectors.

To summarize, given the eigenvectors ψr(x) with eigenvalues λr, the following two functions can be constructed:

Equation 8.2.16

\[ K(x, y) = \sum_{r=1}^{N} \psi_r(x)\, \frac{1}{\sqrt{\lambda_r}}\, \psi_r(y) \]

Equation 8.2.17

\[ P(x, y) = \sum_{r=1}^{N} \psi_r(x)\, \psi_r(y) \]

K(x, y) is the LFA filter kernel, and P(x, y) turns out to be the residual correlation of the outputs. To reconstruct the input image directly from the output {O(x)}, Eq. 8.2.14 can be rearranged to obtain A_r and substituted into the reconstruction formula (Eq. 8.2.4)

Equation 8.2.18

\[ I_{rec}(x) = \int K^{-1}(x, y)\, O(y)\, dy \]

where the "inverse" kernel is

Equation 8.2.19

\[ K^{-1}(x, y) = \sum_{r=1}^{N} \psi_r(x)\, \sqrt{\lambda_r}\, \psi_r(y) \]

It can be shown that the reconstruction error ||I − I_rec||² for the LFA representation is exactly equal to that of the PCA representation [271].
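
In matrix form, the kernels of Eqs. 8.2.16 through 8.2.19 are one line of numpy each. A minimal sketch (the function name and the matrix layout, eigenvectors as columns, are assumptions of this example):

```python
import numpy as np

def lfa_kernels(psi, lam):
    """Construct the LFA filter kernel K, its inverse, and the residual
    correlation P from PCA eigenvectors (cf. Eqs. 8.2.16-8.2.19).
    psi: (d, n) orthonormal eigenvectors as columns; lam: (n,) eigenvalues."""
    K = (psi / np.sqrt(lam)) @ psi.T        # sum_r psi_r(x) lam_r^{-1/2} psi_r(y)
    K_inv = (psi * np.sqrt(lam)) @ psi.T    # sum_r psi_r(x) lam_r^{+1/2} psi_r(y)
    P = psi @ psi.T                         # projector onto the PCA subspace
    return K, K_inv, P
```

Applying K whitens the PCA coefficients (dividing each A_r by √λ_r), and K_inv undoes that scaling, so their product is exactly the projector P; on the PCA subspace, reconstruction through K_inv loses nothing beyond the PCA truncation itself.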

Now that the filter and its inverse are designed, the next LFA step is to use only a few significant points on the output image to represent the whole output image, then use this representation to reconstruct the input. Penev and Atick [271] proposed a deterministic algorithm to incrementally produce (in no more than K steps) a "significant point set" M. The algorithm starts with an empty set M(0) and at each step adds a point to M, chosen according to the following criterion. Two things are required at the m-th step. First, given the current set M(m), an attempt is made to reconstruct O(x) by

Equation 8.2.20

\[ O^{rec}(x) = \sum_{m=1}^{|\mathcal{M}|} a_m(x)\, O(x_m) \]

If O(x) can be reconstructed successfully, then by Eq. 8.2.18 the sample image I(x) can be reconstructed as well. The optimal linear prediction coefficients a_m(x), chosen to minimize the average mean-squared reconstruction error on O(x)

Equation 8.2.21

\[ E_{err} = \left\langle \int \left[ O(x) - O^{rec}(x) \right]^2 dx \right\rangle \]

are

Equation 8.2.22

\[ a_m(x) = \sum_{n=1}^{|\mathcal{M}|} P(x, x_n) \left( P_{\mathcal{M}}^{-1} \right)_{nm}, \qquad (P_{\mathcal{M}})_{nm} = P(x_n, x_m) \]

Then, at the (m + 1)-st step, the point with the maximum reconstruction error, O_err(x_{m+1}), is chosen and added to M. Points are added to M until the reconstruction error falls below some acceptable level or until K of them have been chosen. Penev and Atick [271] reached satisfactory results at |M| = 64 with an ensemble of 1,039 facial images, 3,840 pixels per image, and 400 PCA components.
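
The greedy selection loop can be sketched as follows. This is an illustrative reconstruction of the procedure, not Penev and Atick's code; the function name, the stopping tolerance, and the matrix form of the error update are assumptions of the example.

```python
import numpy as np

def choose_points(P, n_points, tol=1e-8):
    """Greedy 'significant point' selection in the spirit of Penev and Atick:
    repeatedly add the pixel whose output O(x) is worst predicted, under the
    residual correlation P, from the points already chosen."""
    chosen = []
    err = np.diag(P).copy()                     # with no points: <O(x)^2> = P(x, x)
    for _ in range(n_points):
        x_next = int(np.argmax(err))
        if err[x_next] <= tol:                  # reconstruction is good enough
            break
        chosen.append(x_next)
        # optimal coefficients a_m(x) = sum_n P(x, x_n) (P_M^{-1})_{nm}
        PM = P[np.ix_(chosen, chosen)]
        A = np.linalg.solve(PM, P[chosen, :])   # (|M|, d)
        # mean-squared prediction error at every pixel given the current M
        err = np.diag(P) - np.einsum('mx,mx->x', P[chosen, :], A)
    return chosen
```

Because a point is only added while its error is strictly positive, each new row of P_M is independent of the previous ones, so the `solve` step stays well posed throughout the loop.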

Three statistical techniques for template matching have been discussed: (1) PCA uses eigenvalue decomposition or singular value decomposition to find a significant low-dimensional linear subspace, (2) FLD reduces the dimension by using class-discrimination information, and (3) LFA takes advantage of the image's localized similarity. One crucial criterion for the effectiveness of all of these methods is that all facial images, whether in the training set or under testing, must be properly aligned to a predefined "grid." A slight translational or rotational displacement can easily spread the energy to higher dimensions in the feature space and render PCA useless. The most commonly used facial features for image alignment are the eyes: aligning on the eye positions makes images invariant to translation and rotation. The nose and mouth can also be used to obtain scale invariance. Section 8.4 discusses automatic algorithms to detect human faces and facial features in still images and video streams.
