Note that most biometric data can be adequately represented by a mixture of Gaussians, and also that RBF neural networks adopt basically the same type of Gaussian kernels. Therefore, RBF networks are more appealing for biometric applications than LBF networks. In fact, RBF and EBF networks have been used extensively in biometric applications. For instance, Brunelli and Poggio [38] used a special type of RBF network, called the "HyperBF" network, for face recognition and reported a 100% recognition rate on a 47-person database. In a speaker verification task, Mak and Kung [225] demonstrated that EBF networks perform substantially better than their RBF counterparts; and in a speaker identification task, Mak et al. [221, 222] found that RBF networks are superior to LBF networks in terms of both identification accuracy and training time.
In Er et al. [87], RBF neural networks are trained by a hybrid learning algorithm that drastically reduces the dimension of the search space in the gradient paradigm. On the widely used Olivetti Research Laboratory (ORL) face database, the proposed system reportedly achieved excellent performance in terms of both learning efficiency and classification error rates. In Zhang and Guo [401], a modular network structure based on an autoassociative RBF module is applied to face recognition on both the XM2VTS and ORL face databases. The scheme achieves highly accurate recognition on both databases. RBF neural networks have also been implemented on embedded systems to provide real-time processing speed for face tracking and identity verification [391].
In Cardinaux et al. [46], MLPs and GMMs were compared in the context of face verification. The comparison is carried out in terms of performance, effectiveness, and practicability. Apart from structural differences, the two approaches use different training criteria; the MLP approach uses a discriminative criterion, while the GMM approach uses a combination of maximum likelihood (ML) and maximum a posteriori (MAP) criteria. Experiments on the XM2VTS database have shown that for low-resolution faces the MLP approach has slightly lower error rates than the GMM approach; however, the GMM approach easily outperforms the MLP approach for high-resolution faces and is significantly more effective for imperfectly located faces. The experiments have also shown that computational requirements of the GMM approach can be significantly smaller than those of the MLP approach, with only a slight decrease in performance.
RBF networks have also been used for speaker verification. For example, in Jiang and Deng [165], a Bayesian approach is applied to speaker verification based on the Gaussian mixture model (GMM). In the Bayesian approach, the verification decision is made by comparing Bayes factors against a critical threshold. An efficient algorithm, using the Viterbi approximation, was proposed to calculate the Bayes factors for the GMM. According to the reported simulation based on the NIST98 speaker verification evaluation data, the proposed Bayesian approach achieves moderate improvements over a well-trained baseline system using the conventional likelihood ratio test.
A close relative of the RBF clustering scheme is the family of vector quantization (VQ) clustering, since they are both based on the distance metric. In He et al. [129], an evaluation experiment was conducted to compare the codebooks trained by the Linde-Buzo-Gray (LBG), the learning vector quantization (LVQ), and the group vector quantization (GVQ) algorithms. (In GVQ training, speaker codebooks are trained and optimized for vector groups rather than for individual vectors.)
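As an illustration of the common mechanics behind such codebook training, the Lloyd-style loop at the heart of LBG can be sketched in a few lines of pure Python (a one-dimensional toy with made-up data, not the implementation evaluated in He et al.):

```python
# Minimal 1-D LBG/k-means-style codebook training (illustrative sketch):
# alternate between assigning data to the nearest codeword and
# re-centring each codeword on the mean of its cell.
data = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0, 9.1, 8.9]
codebook = [0.0, 5.0, 10.0]          # initial codewords

for _ in range(10):                   # a few Lloyd iterations
    cells = {i: [] for i in range(len(codebook))}
    for x in data:
        nearest = min(range(len(codebook)),
                      key=lambda i: (x - codebook[i]) ** 2)
        cells[nearest].append(x)
    # keep a codeword unchanged if its cell is empty
    codebook = [sum(c) / len(c) if c else codebook[i]
                for i, c in cells.items()]

print([round(c, 2) for c in codebook])  # → [1.0, 5.0, 9.0]
```

GVQ differs from this per-vector loop in that the distortion is optimized over groups of vectors at a time, but the assign/re-centre alternation is the same.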
Many examples of applying GAs to face detection and recognition have been reported (e.g., references [15, 155, 156, 180, 275]). A face recognition method using support vector machines with the feature set extracted by genetic algorithms was proposed in Lee et al. [199]. By selecting the feature set with superior performance in face recognition, the use of unnecessary facial information can be avoided and memory requirements can be significantly reduced.
Consider the classical XOR problem defined in Eq. 5.3.4. Show that
the XOR data can be separated by the following elliptic decision boundary:
they are separable by an even simpler hyperbolic decision boundary:
Verify your answers by using the Matlab function ezplot.m.
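Since Eq. 5.3.4 is not reproduced here, the following Python sketch checks two candidate boundaries of the required shapes (both hypothetical choices, centred on (0.5, 0.5)) against the four XOR points; ezplot in Matlab would draw the same curves:

```python
# XOR data: one class at (0,0) and (1,1), the other at (0,1) and (1,0).
class_a = [(0, 0), (1, 1)]
class_b = [(0, 1), (1, 0)]

def ellipse(x, y):
    # candidate elliptic boundary (x+y-1)^2 + 0.25*(x-y)^2 = 0.5;
    # negative inside the ellipse, positive outside
    return (x + y - 1) ** 2 + 0.25 * (x - y) ** 2 - 0.5

def hyperbola(x, y):
    # candidate hyperbolic boundary (x-0.5)(y-0.5) = 0; the sign of this
    # product splits the plane into two opposite quadrant pairs
    return (x - 0.5) * (y - 0.5)

for f in (ellipse, hyperbola):
    # each boundary must put the two classes on strictly opposite sides
    assert all(f(x, y) > 0 for (x, y) in class_a)
    assert all(f(x, y) < 0 for (x, y) in class_b)
print("both candidate boundaries separate the XOR data")
```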
This problem again concerns the XOR problem:
Explain why multi-layer networks with nonlinear RBF kernels can be used for separating XOR data.
Repeat the same problem for multi-layer networks with nonlinear LBF kernels.
Compare multi-layer perceptrons and radial basis function networks in the context of
the number of possible hidden-layers
the representation of the hidden nodes' activation
the training time required
Given N input patterns, the output of a radial basis function network with D inputs, M hidden nodes, one bias term, and one output can be expressed as
where y = [y1, y2, …, yN]T, w = [ω0, ω1, …, ωM]T, and Φ is an N × (M + 1) matrix with each row containing the hidden nodes' outputs. Assume that N = 3 and M = 2. Assume also that the desired output vector is
Find the least-squares solution of w if the singular value decomposition of Φ is given by Φ = USVT, where
and S = diag{2.25, 0.80, 0.56}. Hence, find the least-squares error
When only a few training patterns are available, using SVD to determine the output weight matrix W of an RBF network is recommended (see Section 5.5). Explain why, if the amount of training data is large, it may not be necessary to use SVD. Instead, the normal equation
of the linear least-squares problem
will be sufficient to find the output weights. In other words, show that the larger the training set, the more numerically stable the normal equation is. Equivalently, show that σi(N + 1) ≥ σi(N), where σi(N) (resp. σi(N + 1)) is the i-th singular value for a data collection with N (resp. N + 1) samples. Hint: If
where all matrices are real matrices, show that
Since BTB is positive semidefinite, the eigenvalues of CTC are no smaller than those of ATA [116]. Equivalently, the singular values of C are no smaller than those of A. (For a numerical example, see Problem 6.)
Matlab Exercise. It is well known that the SVD approach to solving least-squares problems requires far more computation than the normal equation. Therefore, SVD should be used only in critical situations (i.e., when it can make a difference in numerical behavior). The following Matlab code shows that the more training patterns are available, the more numerically stable the normal equation becomes, and hence the less need there is for the computationally expensive SVD algorithm. Verify this statement by using Matlab and varying the vectors a, b, c, and d. (See Figure 5.13.)
clear all; close all;
a = [5 6]'; b = [-3 4]'; c = [4 4]'; d = [-1 -4]';
A1 = a*a'
svd(A1)
plot(svd(A1), 'k'), axis([1 2 0 120]); hold;
A2 = a*a' + b*b'
svd(A2)
plot(svd(A2), 'k--'),
A3 = a*a' + b*b' + c*c'
svd(A3)
plot(svd(A3), 'k-.'),
A4 = a*a' + b*b' + c*c' + d*d'
svd(A4)
plot(svd(A4), 'k:'),
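For readers without Matlab, the same claim can be checked in pure Python; the 2×2 case below uses the same vectors a, b, c, and d, computes eigenvalues in closed form, and relies on the fact that for these positive semidefinite matrices the eigenvalues coincide with the singular values:

```python
import math

# As outer products of training vectors accumulate in A'A, its smallest
# singular value can only grow, never shrink.
vectors = [(5, 6), (-3, 4), (4, 4), (-1, -4)]

def add_outer(M, v):
    # M + v*v' for a 2-vector v; symmetric 2x2 M stored as (a, b, c)
    # meaning [[a, b], [b, c]]
    a, b, c = M
    return (a + v[0] * v[0], b + v[0] * v[1], c + v[1] * v[1])

def eigvals_2x2(M):
    # closed-form eigenvalues of a symmetric 2x2 matrix
    a, b, c = M
    tr, det = a + c, a * c - b * b
    root = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return (tr - root) / 2, (tr + root) / 2

M = (0.0, 0.0, 0.0)
min_eigs = []
for v in vectors:
    M = add_outer(M, v)
    min_eigs.append(eigvals_2x2(M)[0])

# the smallest eigenvalue is non-decreasing as patterns are added
assert all(e1 <= e2 + 1e-9 for e1, e2 in zip(min_eigs, min_eigs[1:]))
print([round(e, 2) for e in min_eigs])  # → [0.0, 22.88, 23.83, 26.07]
```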
Matlab Exercise. The following Matlab code compares the numerical performance and stability of SVD and the normal equation for finding a least-squares solution.
clear all; close all; echo off;
disp('Without dependent columns of A, press any key to start'), pause;
A = [1 2; 3 4; 5 6]; d = [-3 2 3]';
w = inv(A'*A)*A'*d;
y = A*w
E = (d-y)'*(d-y)
disp('Poor precision in w, press any key to start'), pause;
w1 = w + [0.01 -0.01]'*w(1);
y1 = A*w1
E1 = (d-y1)'*(d-y1)
disp('With dependent columns of A, press any key to start'), pause;
A = [1 2; 2 4; 3.0000000001 6]; d = [-3 2 3]';
w = inv(A'*A)*A'*d;
y = A*w
E = (d-y)'*(d-y)
disp('Poor precision in w, press any key to start'), pause;
w1 = w + [0.01 -0.01]'*w(1);
y1 = A*w1
E1 = (d-y1)'*(d-y1)
disp('Solve the normal eq. by SVD, press any key to start'), pause;
[U,S,V] = svd(A);
invS = S;
for i = 1:2,
  if (abs(S(i,i)) < 0.0001)
    invS(i,i) = 0.0;
  else
    invS(i,i) = 1/S(i,i);
  end
end
w = V*invS'*U'*d;
y = A*w
E = (d-y)'*(d-y)
disp('Poor precision in w, press any key to start'), pause;
w1 = w + [0.01 -0.01]'*w(1);
y1 = A*w1
E1 = (d-y1)'*(d-y1)
disp('Ignore the second dimension of input data, press any key to start'), pause;
A = [1; 2; 3.0000000001]; d = [-3 2 3]';
w = inv(A'*A)*A'*d;
y = A*w
E = (d-y)'*(d-y)
Explain why the error E is large when there are dependent columns in matrix A and the precision of w is poor.
Discuss the geometrical interpretation of having a singular value of 0 in matrix S. Hence, explain how the singular value decomposition approach helps avoid the numerical difficulty arising from the singularity of ATA.
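The stabilizing trick, replacing 1/λj by 0 when λj ≈ 0, can be isolated in a toy example (hypothetical numbers; the matrix is taken to be diagonal so that its SVD is trivial, with U = V = I):

```python
# A = diag(3, 1e-12): one healthy singular value and one near zero.
sigma = [3.0, 1e-12]
d = [6.0, 1.0]

# normal-equation solution: w_i = d_i / sigma_i, so the tiny singular
# value amplifies the second component enormously
w_normal = [di / si for di, si in zip(d, sigma)]

# truncated-SVD pseudo-inverse: treat sigma_i ~ 0 as exactly 0 and set
# 1/sigma_i = 0 instead of dividing
w_svd = [di / si if si > 1e-4 else 0.0 for di, si in zip(d, sigma)]

assert abs(w_normal[1]) > 1e11       # wildly amplified component
assert w_svd == [2.0, 0.0]           # stable; singular direction ignored
assert sigma[0] * w_svd[0] == d[0]   # well-conditioned part still fits
print(w_svd)
```

Geometrically, a zero singular value means A maps all of input space into a lower-dimensional subspace; the truncated pseudo-inverse simply refuses to invert along the collapsed direction.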
An elliptical basis function (EBF) network with M centers, K outputs, and no bias term has the form
where ωkj (k = 1, …, K and j = 1, …, M) are the output weights and φj(x) is the j-th basis function output. An M-center GMM has the form
where πj, μj, and Σj are the GMM's parameters and pj(·) is the j-th hidden node output. Compare the EBF network and the GMM in the context of their
hidden unit representation
output representation
training algorithm
A fully connected feed-forward multi-layer perceptron network has eight inputs, four outputs, and two hidden layers, one with six hidden nodes and the other with three nodes.
Draw a flow graph of this network.
For the retrieving phase, what is the computational complexity in terms of MACs (multiply-accumulate operations)?
A radial basis function (RBF) network with 10 function centers is applied to approximate the function
shown in Figure 5.14. Suggest appropriate positions for the function centers and explain your suggestions. Verify your answer by using Matlab.
What are the advantage(s) and disadvantage(s) of using elliptical basis function networks instead of radial basis function networks for pattern classification?
The k-th output of an RBF network with K outputs has the form
For the case of Gaussian basis functions, the basis function takes either the form
or
where μj and Σj are the mean vectors and the covariance matrices, respectively, and σj is the function width associated with the j-th cluster.
The outputs of an EBF network with I inputs, J hidden nodes, and K outputs are given by
where
Let the total squared error be
where dk(xp) is the desired output for the input vector xp. Derive expressions for
Hence suggest a gradient-based learning rule for the EBF network.
Identify two disadvantages of the gradient-based learning rule derived in Problems 13b and 13c.
Assume that an RBF network with I inputs, J hidden nodes, and one output has a Gaussian-basis function of the form
where cj and σj are the j-th function center and function width, respectively.
Express the output y(xp) in terms of ωj, xpi, cji, and σj, where 1 ≤ i ≤ I, 0 ≤ j ≤ J, 1 ≤ p ≤ N, and N is the number of training pairs {xp, d(xp)}. Note that xpi is the i-th input of the p-th pattern and cji is the position of the j-th center with respect to the i-th coordinate of the input domain.
Show that the total squared error is
where
Hence, show that
State the condition under which (ZTZ)−1ZTd is susceptible to numerical error.
An RBF network has n inputs, three hidden nodes, and one output. The output of the network is given by
where φi(xp) are the activations of the hidden nodes and ωi are the weights connecting the hidden nodes and the output node. Table 5.8 shows the relationship among φi(xp), y(xp), and the desired output d(xp) for the input vectors x1, x2, and x3.
Determine the matrix A such that Aw = y, where
Express the vector w* that minimizes the error
in terms of A, w, and d. Explain whether you can determine a numerical value of w* by using your expression.
If the singular decomposition of A is equal to USVT, where
suggest a least-squares solution for w*.
Input Vector | φ1(x) | φ2(x) | φ3(x) | d(xp) | Actual Output
---|---|---|---|---|---
x1 | 0 | 0 | 0 | 1 | y(x1)
x2 | 0 | 0 | 0 | 1 | y(x2)
x3 | 1 | 1 | 1 | 0 | y(x3)
The probability density function of the k-th class is defined as p(x|Ck), where x denotes the vector in the feature space and Ck represents the k-th class.
Use Bayes's theorem to express the posterior probability p(Ck|x) in terms of p(x|Ck) and p(x), where p(x) = ∑ip(x|Ci)P(Ci) is the probability density function of x.
Given that it is possible to use a common pool of M basis functions, labeled by an index j, to represent all of the class-conditional densities; that is,
show that p(Ck|x) = ∑j wkj φj(x), where wkj = P(Ck|j) and φj(x) = P(j|x).
Hence, show that the basis function outputs and the output weights of an RBF network can be considered as posterior probabilities.
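The chain of identities involved can be sketched as follows (an outline of the standard mixture-decomposition argument, with wkj and φj(x) introduced here merely as shorthand for the indicated posteriors):

```latex
p(C_k|\mathbf{x}) \;=\; \frac{p(\mathbf{x}|C_k)\,P(C_k)}{p(\mathbf{x})}
  \;=\; \sum_j \frac{p(\mathbf{x}|j)\,P(j|C_k)\,P(C_k)}{p(\mathbf{x})}
  \;=\; \sum_j \underbrace{P(C_k|j)}_{w_{kj}}\;
        \underbrace{\frac{p(\mathbf{x}|j)\,P(j)}{p(\mathbf{x})}}_{\phi_j(\mathbf{x})\,=\,P(j|\mathbf{x})}
```

where the last step uses Bayes's theorem in the form P(j|Ck)P(Ck) = P(Ck|j)P(j). Thus each basis output φj(x) is the posterior probability of mixture component j given the input, and each output weight wkj is the posterior probability of class Ck given that component.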
The posterior probability of class Ck in a K-class problem is given by
where p(x|Ck) is a GMM with mixture coefficients πkj, P(Ck) is the prior probability of class Ck, and p(x) is the density function of x.
Express p(x) in terms of p(x|Ck) and P(Ck).
Discuss the difference between P(Ck) and πkj.
If P(Ck) ≠ P(Cj) ∀k ≠ j, explain how you would use K GMMs to classify a set of unknown patterns X = {x1, x2, …, xT} into K classes.
The BP algorithm involves the computation of the gradient ∂E/∂wij(l), where E is the mean-squared error between the desired outputs and the actual outputs and wij(l) is a weight at layer l.
Repeat (a) for a multi-layer RBF network using Eq. 5.2.2 and the back-propagation rule for multi-layer RBF networks (Eq. 5.4.17-Eq. 5.4.19).
Derive the gradient of a Gaussian EBF in Eq. 5.4.10.
Assume that you are given a set of data with an XOR-type distribution: For one class there are two centroids located at (0,0) and (1,1), while for the other class there are two centroids located at (1,0) and (0,1). Each cluster can be statistically modeled by a uniform distribution within a disk of radius r = 1.
Show that the two classes are not linearly separable.
Derive an analytical solution (equations for the decision boundaries) for a multi-layer LBF network (no computer).
Verify your solutions using Matlab and Netlab.
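For part (a), the impossibility argument can be checked numerically without Netlab; the Python sketch below exploits the fact that both centroid pairs share the midpoint (0.5, 0.5), so any affine discriminant takes the same average value on the two classes:

```python
import random

# For any affine discriminant f(x, y) = w1*x + w2*y + b, the identity
# f(0,0) + f(1,1) = f(1,0) + f(0,1) holds, because both pairs of
# centroids average to (0.5, 0.5). A separating line would need one
# class-pair sum strictly larger than the other -- impossible, so the
# XOR-type centroids are not linearly separable.
random.seed(0)
for _ in range(1000):
    w1, w2, b = (random.uniform(-5, 5) for _ in range(3))
    f = lambda x, y: w1 * x + w2 * y + b
    assert abs((f(0, 0) + f(1, 1)) - (f(1, 0) + f(0, 1))) < 1e-12
print("no affine discriminant can separate the two centroid pairs")
```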
The implementation cost of multi-layer neural models depends very much on the computational complexity incurred in each layer of feed-forward processing. Compute the number of multiplications and additions involved when a 3 x 4 matrix is postmultiplied by a 4 x 12 matrix.
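One way to sanity-check such operation counts is to tally them with a naive triple loop (a generic counter sketch, independent of the actual matrix entries):

```python
def matmul_op_count(m, n, p):
    # Multiplying an (m x n) matrix by an (n x p) matrix with the naive
    # algorithm: each of the m*p output entries needs n multiplications
    # and n - 1 additions.
    mults = adds = 0
    for _ in range(m):
        for _ in range(p):
            mults += n        # n products per output entry
            adds += n - 1     # summing n products takes n - 1 additions
    return mults, adds

# the 3 x 4 times 4 x 12 product from the exercise
mults, adds = matmul_op_count(3, 4, 12)
print(mults, adds)  # → 144 108
```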
Is it true that the basis function of kernel-based networks must be in the form of a Gaussian function? Provide an explanation. Note: RBF networks are considered to be one type of kernel-based networks.
Multiple Choice Questions:
The outputs of an RBF network are linear with respect to
the inputs
the weights connecting the hidden nodes and the outputs
parameters of the hidden nodes
the outputs of the hidden nodes
both (ii) and (iv)
none of the above
The purpose of the function width σ in an RBF network is to
improve the numerical stability of the K-means algorithm
allow the basis functions to cover a wider area of the input space so that better generalization can be achieved
ensure that the regions of influence of the individual basis functions do not overlap with each other
none of the above
To train an RBF network, one should apply the K-means, least-squares, and K-nearest-neighbor algorithms in the following order:
K-means, least squares, K-nearest neighbor
K-means, K-nearest-neighbor, least squares
Least squares, K-means, K-nearest neighbor
Least squares, K-means, and K-nearest-neighbor in any order
To apply singular value decomposition to determine the output weights of an RBF network,
decompose matrix A into A = UΛVT, where A = {api} = {φi(||xp − ci||)} contains the basis function outputs and Λ = diag{λ1, λ2, …, λd} contains the singular values
set 1/λj = 0 if λj → 0
set λj = 0 if λj → ∞
both (i) and (ii)
both (i) and (iii)
Given the decision boundaries and the contours of basis function outputs of Figure 5.15, it is correct to say that
The figure is produced by an RBF network with 2 centers.
The figure is produced by an RBF network with 4 centers.
The figure is produced by an EBF network with 4 centers.
The figure is produced by an EBF network with 2 centers.
Based on the figure in the preceding question, it is correct to say that
The covariance matrix corresponding to Class 1 (4th quadrant) is diagonal.
The covariance matrix corresponding to Class 2 (3rd quadrant) is not diagonal.
The covariance matrix corresponding to Class 1 (2nd quadrant) exhibits a larger variance along the horizontal axis.
All of the above.
In the context of pattern classification, RBF outputs can be trained to give
the posterior probabilities of class membership, P(Ck|x)
the likelihood of observing the input data given the class membership, p(x|Ck)
the likelihood of observing the input data, p(x)
both (i) and (ii)
(i), (ii), and (iii)
Training of RBF networks is usually faster than training multi-layer perceptrons because
Training in RBF networks is divided into two stages, where each stage focuses on a part of the optimization problem.
RBF networks have fewer parameters to be estimated.
The RBF networks' outputs are linear with respect to all of the networks' parameters.
None of the above.
Training of RBF networks requires a large amount of memory because
Answers: a (v), b (ii), c (ii), d (iv), e (iii), f (iv), g (i), h (i), i (ii)
Matlab Exercise. Given two classes of patterns with distributions p1(x, y) and p2(x, y), where for p1 the mean and variance are, respectively,
and for p2 the mean and variance are
write Matlab code to generate 1,000 patterns for each distribution. Create an RBF network and a GMM-based classifier to classify the data in these two classes. For each network type, find the classification accuracy and plot the decision boundary.
Matlab Exercise. Create 100 two-dimensional data patterns for two classes. Use an RBF BP Network to separate the two classes. Try to do clustering by using
an RBF BP network with four hidden units
an EBF BP network with four hidden units