2.4. Adaptive Classifiers

If training patterns are available for each individual, one can build a statistical model for each person (class). A common approach is to model each class by a normal density, so that the system estimates the corresponding mean feature vector and covariance matrix for each person. Using a prior distribution over the individuals in the database, the classification task is completed by computing the Bayesian a posteriori probability of each person, conditioned on the observations of the query. If the log-probability is computed, the classification process can be viewed as a nearest-neighbor search under the Mahalanobis distance metric.
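
As a concrete illustration, the following minimal sketch (hypothetical Python, not taken from the text) scores a query feature vector against the estimated Gaussian model of each enrolled person; in the log-probability formulation, the dominant term is the squared Mahalanobis distance to each class mean.

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate a mean vector and covariance matrix for each enrolled person."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
        models[c] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return models

def classify(x, models, log_priors=None):
    """Pick the class with the highest Gaussian log a posteriori probability."""
    best, best_score = None, -np.inf
    for c, (mu, inv_cov, logdet) in models.items():
        d = x - mu
        mahal = d @ inv_cov @ d                 # squared Mahalanobis distance
        score = -0.5 * (mahal + logdet)
        if log_priors is not None:
            score += log_priors[c]              # prior over individuals in the database
        if score > best_score:
            best, best_score = c, score
    return best
```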

There are other statistical approaches to pattern classification. First, the K-nearest-neighbor algorithm [85] determines the class of a test pattern by comparing it with the K nearest training patterns of known classes. The likelihood of each class is estimated by its relative frequency among the K nearest training patterns. As the number of training patterns grows, these relative frequencies converge to the true posterior class probabilities. Second, the Parzen-window method [270] estimates the class-conditional densities via a linear superposition of window functions, one for each training pattern. The window function is required to be unimodal and to have unit area under its curve. As the number of training patterns grows, the linear superposition of window functions for a given class converges to the true class-conditional density.
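
The following sketch (hypothetical Python; the Gaussian window for the Parzen estimator is an illustrative choice) shows both estimators described above:

```python
import numpy as np

def knn_posteriors(x, train, labels, k=5):
    """K-nearest-neighbor estimate of the posterior class probabilities at x."""
    idx = np.argsort(np.linalg.norm(train - x, axis=1))[:k]
    nearest = labels[idx]
    # Relative frequency of each class among the K nearest training patterns.
    return {c: float(np.mean(nearest == c)) for c in np.unique(labels)}

def parzen_density(x, train, h=1.0):
    """Parzen-window estimate of a class-conditional density at point x.

    One Gaussian window (unimodal, unit area) is centered on each training
    pattern of the class; the estimate is their average. The bandwidth h is
    a free design choice.
    """
    d = train.shape[1]
    norm = (2 * np.pi * h ** 2) ** (-d / 2)
    diffs = train - x
    k = norm * np.exp(-0.5 * np.sum(diffs ** 2, axis=1) / h ** 2)
    return float(k.mean())
```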

2.4.1. Neural Networks

The ability to learn and adapt is a fundamental trait of any intelligent information processing system, and neural networks are no exception. Although the original development of neural networks was primarily motivated by the function of the human brain, modern neural models share many similarities with statistical pattern recognition. A neural network is an abstract simulation of a real nervous system: a collection of neuron units communicating with one another via axon connections.

The first fundamental neural model was proposed in 1943 by McCulloch and Pitts [237] as a computational model of "nervous activity." The McCulloch-Pitts neuron is a binary device with a fixed threshold and thus performs simple threshold logic. The McCulloch-Pitts model led to the work of John von Neumann, Marvin Minsky, Frank Rosenblatt, and many others. Hebb postulated in his classical book The Organization of Behavior [130] that neurons are appropriately interconnected by self-organization and that "an existing pathway strengthens the connections between the neurons." He proposed that the connectivity of the brain is continually changing as an organism learns various functional tasks and that cell assemblies are created by such changes [64]. By embedding a vast number of simple neurons in an interacting nervous system, it is possible to provide computational power for very sophisticated information processing [8].

Several promising neural networks for biometric identification applications are explored in Chapters 5, 6, and 7. Their applications to face recognition, speaker verification, and joint audio-visual biometric authentication are investigated in Chapters 8, 9, and 10. In addition to biometric identification, neural networks have been applied to a variety of problems such as pattern classification, clustering, function approximation, prediction/forecasting, optimization, pattern completion, and control [128,186].

2.4.2. Training Strategies

Instead of following a set of rules specified by human experts, neural networks learn the underlying rules from a given collection of representative examples. The ability to learn automatically from examples makes neural networks attractive and is generally considered one of their major advantages over traditional expert systems.

Based on the nature of the training data sets made available to the designers of biometric authentication systems, there are two common categories of learning schemes: supervised learning and unsupervised learning.

  1. In supervised learning schemes, the neural network is provided with a correct answer (the "teacher value") for each input pattern. The weighting parameters are determined so that the network can produce answers as close as possible to the teacher values. In many classification applications (e.g., OCR and speaker recognition), the training data consist of pairs of input/output patterns. In this case, adopting a supervised network is advantageous.

  2. In contrast, unsupervised learning schemes do not require that a correct answer be associated with each input pattern in the training data set. These schemes explore the underlying correlations between patterns in the data, and patterns are organized into categories according to their correlations. Unsupervised networks have been widely used in applications in which teacher values are not easy to obtain or generate, such as video coding and medical image segmentation. A minimal sketch contrasting the two schemes follows this list.
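
The contrast can be summarized as follows (hypothetical Python; a perceptron-style network and k-means clustering are illustrative stand-ins, not the specific networks of later chapters):

```python
import numpy as np

def train_supervised(X, y, lr=0.1, epochs=100):
    """Supervised scheme: a single-layer network fitted to teacher values y in {0, 1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # network output
        grad = p - y                             # error against the teacher values
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def train_unsupervised(X, k=2, iters=50, seed=0):
    """Unsupervised scheme: k-means groups patterns by similarity, with no teacher values."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                            for j in range(k)])
    return centers, assign
```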

2.4.3. Criteria on Classifiers

For biometric identification applications, key performance measurements of adaptive learning classifiers include the following.

  • Training and generalization performance. Two popular metrics for measuring the accuracy of a learning algorithm are training accuracy and generalization accuracy. For training accuracy, the test patterns are simply samples drawn from the original training patterns, so this metric essentially measures memorization. In contrast, for generalization accuracy, test patterns are drawn from an independent data set rather than the original training set. In short, the distinction between training and generalization accuracy lies in the test patterns adopted. There is a natural tradeoff between the two: high training accuracy does not necessarily yield good generalization accuracy (a hold-out evaluation sketch follows this list).

  • Invariance and noise resilience. From the invariance perspective, any dependence on environmental conditions should be minimized. For example, in face recognition, the classifier should be made invariant to geometric transformations such as translation and rotation. In addition, the system should be able to tolerate noise corruption of images or speech, because noise is practically inevitable in biometric applications.

  • Cost-effective system implementation. It is important to provide a low-cost, high-speed, and flexible implementation for adaptive classifiers; thus, a cost-effective hardware/software design platform should be considered. To fully harness advanced VLSI/ULSI and system-on-chip technologies, a processor architecture with a distributed and modular structure appears to be the most promising. Since compatibility between the feature extractor and the adaptive classifier can heavily influence total system performance, emphasis should also be placed on system integration design issues.
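
To make the first criterion concrete, the following sketch (hypothetical Python; it assumes a classifier object exposing fit and predict methods) reports training accuracy on the training patterns themselves and generalization accuracy on an independent hold-out set:

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def evaluate(classifier, X, y, holdout_fraction=0.3, seed=0):
    """Split the patterns, then report training vs. generalization accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(holdout_fraction * len(X))
    test, train = idx[:n_test], idx[n_test:]

    classifier.fit(X[train], y[train])
    train_acc = accuracy(y[train], classifier.predict(X[train]))   # memorization
    gen_acc = accuracy(y[test], classifier.predict(X[test]))       # independent data set
    return train_acc, gen_acc
```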

2.4.4. Availability of Training Samples

If the chosen biometric identification (BI) algorithms require learning or parameter estimation, then the availability of reliable and representative training data is also of critical concern. Training data size differs greatly from one application to another. If the number of training examples is too small or the number of ϕ functions (the hidden neurons) is too large, the BI algorithm can generate a BI model with spurious decision boundaries or poor interpolation. In other words, the resulting BI model may behave just like a lookup table constructed from the training patterns, and it may possess little generalization capability.

Many learning algorithms have mechanisms to control the number of hidden functions (i.e., M and P in Eq. 2.2.2) to prevent the degradation of generalization accuracy. An example of this mechanism is the regularization term in some neural network models [27]. However, for some applications (e.g., criminal identification), where training patterns are difficult to acquire (perhaps one or two fingerprint records or photos for each person), the performance of learning algorithms could be greatly degraded because the training-by-examples principle has been violated.
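
As an illustration of such a mechanism (a generic weight-decay penalty rather than the specific regularizer of [27]), the training objective below adds a term that penalizes large weights and thereby limits the effective complexity of the model when training examples are few:

```python
import numpy as np

def regularized_loss(y_true, y_pred, weights, lam=1e-3):
    """Mean-squared error plus a weight-decay (regularization) penalty.

    The term lam * ||w||^2 shrinks the network weights, which restricts the
    effective number of hidden functions the model can exploit and helps
    preserve generalization accuracy when training data are scarce.
    """
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty
```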

There are two possible solutions to this training-example deficiency problem. The first is to perform an intense and thorough analytical study of the nature of the input biometric signals and of the noise models embedded in the data acquisition process. Once knowledge of the true data-generating process and noise models becomes available, it is possible to design a powerful yet nonlearning feature extraction algorithm whose resulting feature vectors are easily separable by simple classifiers (e.g., the nearest-neighbor rule). One example of this approach is fingerprint identification. For almost 100 years, it has been known that the relative positions between various minutiae are the discriminative features for identifying people [135]. Therefore, there is no need to use examples to tell the system which features should be extracted from sensory images. Most fingerprint researchers detect the locations and orientations of useful minutiae by designing filters using image processing techniques (e.g., edge enhancement and thinning) and then apply graph matching techniques to determine the similarity between the test pattern (a set of points) and the reference pattern. Both feature extraction and pattern classification algorithms can achieve a satisfactory identification rate without applying learning procedures.

Another solution to training data deficiency is to apply the virtual pattern generation procedure. To ensure sufficient diversity in the training set, the algorithm should take the acquired data from sensors and transform it to create additional training exemplars (i.e., virtual training patterns). The transformations should mimic the variations that might be embedded in the data acquisition process. One example of virtual training pattern generation can be seen in the face recognition experiments (see Chapter 8), where up to 200 virtual training patterns are generated from one original facial image by applying various affine transformations (e.g., rotation, scaling, shifting) and a mirroring process. Robustness of the trained network can be improved via the use of a virtual training set.
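
A minimal virtual-pattern generator is sketched below (hypothetical Python; the transformation ranges are illustrative assumptions, not the settings used in Chapter 8). It produces perturbed copies of one 2-D grayscale facial image by random rotation, scaling, shifting, and mirroring:

```python
import numpy as np
from scipy import ndimage

def generate_virtual_patterns(image, n=20, seed=0):
    """Create virtual training patterns from one 2-D grayscale image.

    Each virtual pattern is a randomly rotated, scaled, shifted, and possibly
    mirrored copy of the original; the ranges below are illustrative only.
    """
    rng = np.random.default_rng(seed)
    patterns = []
    for _ in range(n):
        out = ndimage.rotate(image, angle=rng.uniform(-10, 10), reshape=False, mode="nearest")
        scale = rng.uniform(0.9, 1.1)                    # scaling about the image center
        center = 0.5 * (np.array(out.shape) - 1)
        out = ndimage.affine_transform(out, np.eye(2) / scale,
                                       offset=center - center / scale, mode="nearest")
        out = ndimage.shift(out, shift=rng.uniform(-3, 3, size=2), mode="nearest")
        if rng.random() < 0.5:
            out = np.fliplr(out)                         # the mirroring process
        patterns.append(out)
    return patterns
```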
