9.1. Introduction

A face recognition system automatically identifies faces in images and videos. It has a wide range of applications, such as biometric authentication, surveillance, human-computer interaction, and multimedia management. A face recognition system generally consists of four processing parts, as depicted in Figure 9.1: face detection, face alignment, facial feature extraction, and face classification. Face detection provides information about the location and scale of each detected face; in the case of video, the detected faces may be tracked. In face alignment, facial components, such as the eyes, nose, and mouth, and the facial outline are located, and the input face image is thereby normalized in geometry and photometry. In feature extraction, features useful for distinguishing between different persons are extracted from the normalized face. In face classification, the extracted feature vector of the input face is matched against those of the enrolled faces in the database; the system outputs the identity of the face when a match is found with sufficient confidence, and reports the face as unknown otherwise.

Figure 9.1. Structure of a face recognition system.
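To make the data flow between the four processing parts concrete, here is a minimal Python sketch of the pipeline. Every stage function is a hypothetical stub (there is no real detector or aligner behind them, and the names are invented for illustration); the sketch only shows how detection, alignment, feature extraction, and classification hand results to one another.

```python
import numpy as np

def detect_faces(image):
    # Hypothetical stub detector: report one face covering the whole
    # image as an (x, y, width, height) box.
    h, w = image.shape[:2]
    return [(0, 0, w, h)]

def align_face(image, box):
    # Hypothetical stub aligner: crop the box; a real aligner would also
    # normalize the geometry and photometry of the face.
    x, y, w, h = box
    return image[y:y + h, x:x + w]

def extract_features(face, dim=40):
    # Hypothetical stub: flatten and truncate to a short vector; a real
    # system would project onto a learned face subspace (see Section 9.1).
    return face.reshape(-1)[:dim].astype(float)

def classify(feature, gallery, threshold=1.0):
    # Nearest-neighbor match against enrolled features; report the face
    # as unknown if even the closest match is too far away.
    best_id, best_dist = "unknown", threshold
    for identity, enrolled in gallery.items():
        dist = np.linalg.norm(feature - enrolled)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id

# Usage: enroll one synthetic face, then recognize a noisy copy of it.
rng = np.random.default_rng(0)
enrolled_face = rng.random((112, 92))
gallery = {"person_1": extract_features(enrolled_face)}

probe_image = enrolled_face + 0.01 * rng.standard_normal((112, 92))
box = detect_faces(probe_image)[0]
face = align_face(probe_image, box)
print(classify(extract_features(face), gallery))   # -> person_1
```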


The underlying problems can be treated using pattern recognition and machine-learning techniques. There are two central issues: (1) what features to use to represent a face, and (2) how to classify a new face image based on the chosen representation. A capable face recognition system should be able to deal with variations of face images in viewpoint, illumination, expression, and so on.

The geometric feature-based approach [44, 57, 101, 17] is based on the traditional computer vision framework [81]. Facial features such as the eyes, nose, mouth, and chin are detected, and properties of and relations between them (e.g., areas, distances, angles) are used as descriptors of faces for recognition. Using this approach, Kanade built the first face recognition system in the world [57]. Advantages of the approach include economy and efficiency in data reduction, as well as insensitivity to variations in illumination and viewpoint. Its disadvantages are that the facial feature detection and measurement techniques developed to date are not reliable enough [25], and that geometric information alone is insufficient for face recognition.

Great progress has been made in the past 15 or so years due to advances in the template-matching or appearance-based approach [122]. Such an approach generally operates directly on an image-based representation (i.e., the array of pixel intensities) and extracts features in a face subspace, rather than overly abstract geometric features. A face subspace is constructed to best represent the face object only. Although it is much less general, it is more efficient and effective for face recognition. In the eigenface [122] or principal component analysis (PCA) method, the face space is spanned by a number of eigenfaces [111] derived from a set of training face images using the Karhunen-Loève transform, or PCA [42]. A face image is then efficiently represented by a feature point (i.e., a vector of weights) of low dimensionality (e.g., 40 or lower). Such subspace features are more salient and richer for recognition.
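The following is a minimal sketch of the eigenface computation just described, assuming random stand-in data in place of real training faces: PCA (computed here via SVD) yields the eigenfaces, and a face is represented by its 40-dimensional weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, h, w, k = 50, 112, 92, 40           # 50 faces, top 40 eigenfaces

X = rng.random((n_train, h * w))             # each row: a flattened face
mean_face = X.mean(axis=0)
X_centered = X - mean_face

# Rows of Vt are the principal directions; the top k are the eigenfaces.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
eigenfaces = Vt[:k]                          # shape (40, 10304)

def project(face):
    """Map a face image to its k-dimensional vector of weights."""
    return eigenfaces @ (face.reshape(-1) - mean_face)

def reconstruct(weights):
    """Approximate the face back from its weight vector."""
    return (mean_face + eigenfaces.T @ weights).reshape(h, w)

weights = project(rng.random((h, w)))        # a 40-dimensional feature point
print(weights.shape)                         # (40,)
```

Matching can then be carried out between weight vectors rather than raw pixel arrays, which is the source of the efficiency noted above.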

Face recognition performance has improved greatly compared to that of the first automatic face recognition system of Kanade [57]. Nowadays, face detection, facial feature location, and recognition can be performed on image data captured under reasonable conditions, something the pioneering systems could not achieve.

Although the progress has been encouraging, the task has also turned out to be a very difficult endeavor [121]. Face recognition evaluation reports such as [95, 1] and other independent studies indicate that the performance of many state-of-the-art face recognition methods deteriorates with changes in lighting, pose, and other factors [123, 18, 140]. The key technical challenges are summarized in the following:

  • Immense variability of facial appearance: Whereas shape and reflectance are intrinsic properties of a face object, the appearance (i.e., the texture look) of a face is also subject to several other factors, including the facial pose (or, equivalently, camera viewpoint), the illumination, facial expression, and various imaging parameters such as aperture, exposure time, lens aberrations, and sensor spectral response. All these factors are confounded in the image data, so that "the variations between the images of the same face due to illumination and viewing direction are almost always larger than the image variation due to change in face identity" [89]. The complexity makes it very difficult to extract the intrinsic information of the face objects from their respective images.

  • Highly complex and nonlinear manifolds: As illustrated above, the face manifold, as opposed to the manifold of nonface patterns, is highly nonconvex, and so are the face manifolds of any individual under changes in pose, lighting, facial wear, and so on. Linear subspace methods, such as principal component analysis (PCA) [111, 122], independent component analysis (ICA) [10], and linear discriminant analysis (LDA) [12], linearly project the data from a high-dimensional space, such as the image space, to a low-dimensional subspace along optimal directions; they are therefore unable to preserve the nonconvex variations of face manifolds necessary to differentiate between individuals. In a linear subspace, the Euclidean distance and, more generally, the Mahalanobis distance (M-distance), which are normally used in template matching, do not apply well to classification between face and nonface manifolds or between the manifolds of different individuals (see the distance-matching sketch following this list). This is a crucial fact that limits the power of linear methods to achieve highly accurate face detection and recognition.

  • High dimensionality and small sample size: Another challenge is the ability to generalize. A canonical face image used in face recognition is of size 112 × 92 and resides in a 10,304-dimensional real space. However, the number of examples per person available for learning is usually much smaller than the dimensionality of the image space (e.g., fewer than 10 in most cases), and a system trained on so few examples may not generalize well to unseen face instances. In addition, the computational cost of such high dimensionality is a concern for real-time systems.
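To make the template-matching notion in the second bullet concrete, the sketch below compares Euclidean and Mahalanobis (M-) distance matching in a 40-dimensional linear subspace; random vectors stand in for projected face features, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
enrolled = rng.standard_normal((100, 40))    # gallery of 40-d features
probe = enrolled[3] + 0.1 * rng.standard_normal(40)

# Euclidean distance from the probe to every enrolled feature.
euclid = np.linalg.norm(enrolled - probe, axis=1)

# Mahalanobis distance, whitening by the gallery covariance
# (a small ridge term keeps the covariance invertible).
cov = np.cov(enrolled, rowvar=False) + 1e-6 * np.eye(40)
cov_inv = np.linalg.inv(cov)
diff = enrolled - probe
mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

print(euclid.argmin(), mahal.argmin())       # both report gallery index 3
```

Both distances behave well here because the data are a single convex cloud; it is precisely on the nonconvex, multimodal manifolds described above that such linear-subspace matching breaks down.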

The above problems can be handled in two ways. One is to normalize face images in geometry and photometry; the face manifolds then become simpler (less nonlinear and less nonconvex), so that the complex problems become easier to tackle. The other is to devise powerful engines able to perform difficult nonlinear classification and regression and to generalize better. This relies on advances in pattern recognition and learning, and on their clever application.
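As a concrete illustration of the first way, geometric normalization, the following sketch derives a similarity transform that maps two detected eye centers to canonical positions in a 112 × 92 output image, assuming OpenCV for the warp. The eye coordinates and canonical positions are illustrative assumptions, not values from the text.

```python
import numpy as np
import cv2

def eye_alignment_matrix(left_eye, right_eye,
                         canon_left=(31.0, 40.0), canon_right=(61.0, 40.0)):
    """2x3 similarity (rotation + scale + translation) from two point pairs."""
    src = np.float32([left_eye, right_eye])
    dst = np.float32([canon_left, canon_right])
    # Scale and rotation that carry the source eye vector onto the target.
    v_src, v_dst = src[1] - src[0], dst[1] - dst[0]
    scale = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    angle = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    t = dst[0] - R @ src[0]                  # translation fixing the left eye
    return np.hstack([R, t[:, None]]).astype(np.float32)

image = np.zeros((240, 320), dtype=np.uint8)         # stand-in input image
M = eye_alignment_matrix(left_eye=(120, 100), right_eye=(180, 96))
print(M @ np.array([120, 100, 1]))                   # ~ (31, 40), as required
aligned = cv2.warpAffine(image, M, (92, 112))        # dsize is (width, height)
print(aligned.shape)                                 # (112, 92)
```

Photometric normalization (e.g., histogram equalization of the aligned crop) would follow the same pattern but is omitted here for brevity.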

Both prior and observation constraints are needed for such powerful engines. Most successful approaches to tackling the above difficulties are based on subspace modeling of facial appearance and on statistical learning. Prior constraints about the face include facial shape, texture, head pose, and illumination effects; recent advances allow these to be effectively encoded into the system by learning from training data.

This chapter presents advanced techniques for face detection, face alignment, and face recognition (feature extraction and matching). The presentation is focused on appearance- and learning-based approaches.
