Problem 3.9

Dynamics of principal and minor component flows

U. Helmke and S. Yoshizawa

Department of Mathematics

University of Würzburg

Am Hubland, D-97074 Würzburg

Germany

[email protected]

R. Evans, J.H. Manton and I.M.Y. Mareels

Department of Electrical and Electronic Engineering

The University of Melbourne

Victoria 3010

Australia

[email protected]

Stochastic subspace tracking algorithms in signal processing and neural networks are often analyzed by studying the associated matrix differential equations. Such gradient-like nonlinear differential equations have an intricate convergence behavior that is reminiscent of matrix Riccati equations. In fact, these types of systems are closely related. We describe a number of open research problems concerning the dynamics of such flows for principal and minor component analysis.

1 DESCRIPTION OF THE PROBLEM

Principal component analysis is a widely used method in neural networks, signal processing, and statistics for extracting the dominant eigenvalues of the covariance matrix of a sequence of random vectors. In the literature, various algorithms for principal component and principal subspace analysis have been proposed, along with some, but in many respects incomplete, theoretical analyses of them. The analysis is usually based on stochastic approximation techniques and commonly proceeds via the so-called Ordinary Differential Equation (ODE) method, i.e., by associating an ODE whose convergence properties reflect those of the stochastic algorithm; see, e.g., [7]. In the sequel, we consider some of the relevant ODEs in more detail and pose some open problems concerning the dynamics of the flows.

In order to state our problems in precise mathematical terms, we give a formal definition of a principal and minor component flow.

Definition [PSA/MSA Flow]: A normalized subspace flow for a covariance matrix C is a matrix differential equation $\dot{X} = f(X)$ on $\mathbb{R}^{n\times p}$ with the following properties:

1. Solutions X(t) exist for all $t \geq 0$ and have constant rank.

2. If X0 is orthonormal, then X(t) is orthonormal for all t.

3. $\lim_{t\to\infty} X(t)$ exists for all full rank initial conditions X0.

4. $\lim_{t\to\infty} X(t)$ is an orthonormal basis matrix of a p-dimensional eigenspace of C.

The subspace flow is called a PSA (principal subspace) or MSA (minor subspace) flow if, for generic initial conditions, the solutions X(t) converge for $t \to \infty$ to an orthonormal basis of the eigenspace that is spanned by the first p dominant or minor eigenvectors of C, respectively.

In the neural network and signal processing literature, a number of such principal subspace flows have been considered. The best-known example of a PSA flow is Oja’s flow [9, 10]

$$\dot{X} = (I - XX')CX \qquad (1)$$

Here C = C' > 0 is the n × n covariance matrix and X is an n × p matrix. Actually, it is nontrivial to prove that this cubic matrix differential equation is indeed a PSA flow in the above sense and thus, generically, converges to a dominant eigenspace basis. Another, more general example of a PSA flow is that introduced by [12, 13] and [17]:

$$\dot{X} = CXN - XNX'CX \qquad (2)$$

Here N = N' > 0 denotes an arbitrary diagonal p × p matrix with distinct eigenvalues. This system is actually a joint generalization of Oja's flow (1) and of Brockett's [1] gradient flow on orthogonal matrices

$$\dot{\Theta} = \Theta\,[\Theta'C\Theta, N] \qquad (3)$$

For symmetric matrix diagonalisation, see also [6]. In [19], Oja’s flow was re-derived by first proposing the gradient flow

$$\dot{X} = C(I - XX')X + (I - XX')CX \qquad (4)$$

and then omitting the first term C(I - XX')X, because C(I - XX')X = CX(I - X'X) $\to 0$, a consequence of both terms in (4) forcing X to the invariant manifold {X : X'X = I}; see the computation below. Interestingly, it has recently been realized [8] that (4) has certain computational advantages compared with (1); however, a rigorous convergence theory is missing. Of course, these three systems are just prominent examples from a bigger list of potential PSA flows. An open problem running through most of the current research is the lack of a full convergence theory establishing pointwise convergence to the equilibria. In particular, a solution to the following three problems would be highly desirable. The first problem addresses the qualitative analysis of the flows.
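The invariance claim can be checked by a direct computation. Differentiating X'X along (1) gives

$$\frac{d}{dt}(X'X) = X'C(I - XX')X + X'(I - XX')CX = 2X'CX - X'CXX'X - X'XX'CX,$$

which vanishes whenever X'X = I, so the orthonormal matrices form an invariant manifold for (1); the identity C(I - XX')X = CX - CXX'X = CX(I - X'X) used above follows by expanding both sides.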

Problem 1. Develop a complete phase portrait analysis of (1), (2) and (4). In particular, prove that the flows are PSA, determine the equilibrium points and their local stability properties, and characterize the stable and unstable manifolds of the equilibria.
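As a first numerical step toward such a phase portrait, the following sketch integrates Oja's flow (1) from a random orthonormal initial condition and measures the distance of the limiting subspace from the dominant eigenspace of C. All parameter choices (dimensions, the random test matrix, the integration horizon) are illustrative assumptions, not part of the problem statement.

```python
import numpy as np
from scipy.integrate import solve_ivp

n, p = 8, 3
rng = np.random.default_rng(0)

# A generic symmetric positive definite test covariance C = C' > 0.
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)

# Random orthonormal initial condition X0 (n x p).
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))

def oja(t, x):
    # Right-hand side of Oja's flow (1): dX/dt = (I - XX')CX.
    X = x.reshape(n, p)
    return ((np.eye(n) - X @ X.T) @ C @ X).ravel()

sol = solve_ivp(oja, (0.0, 50.0), X0.ravel(), rtol=1e-10, atol=1e-10)
X = sol.y[:, -1].reshape(n, p)

# Dominant p-dimensional eigenspace of C (eigh sorts eigenvalues ascending).
U = np.linalg.eigh(C)[1][:, -p:]

print("orthonormality drift:", np.linalg.norm(X.T @ X - np.eye(p)))
print("distance to dominant eigenspace:", np.linalg.norm(X @ X.T - U @ U.T))
```

For generic initial conditions both printed numbers should be close to zero, consistent with the PSA property; restarting from initial conditions near the non-dominant equilibria would begin to trace out the stable and unstable manifolds asked for above.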

The previous systems are useful for principal component analysis, but they cannot be used immediately for minor component analysis. Of course, one possible approach might be to apply any of the above flows with C replaced by $C^{-1}$. Often this is not reasonable though, as in most applications the covariance matrix C is implemented by recursive estimates, and one does not want to invert these recursive estimates online. Another alternative could be to put a negative sign in front of the equations. But this does not work either, as the minor component equilibrium point remains unstable; see the sketch below. In the literature, therefore, other approaches to minor component analysis have been proposed [2, 3, 5], but without a complete convergence theory.¹ Moreover, a guiding geometric principle that allows for the systematic construction of minor component flows is missing. The key idea here seems to be an appropriate concept of duality between principal and minor component analysis.
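The instability of the sign-flipped flow can already be seen for p = 1, where the norm dynamics along $\dot{x} = -(I - xx')Cx$ satisfy $\frac{d}{dt}\|x\|^2 = -2\,x'Cx\,(1 - \|x\|^2)$, so the unit sphere repels in the radial direction: trajectories starting inside it collapse to zero, and trajectories starting outside it diverge. A minimal numerical sketch of this (the test matrix and horizons are illustrative assumptions):

```python
import numpy as np
from scipy.integrate import solve_ivp

n = 5
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
C = A @ A.T + np.eye(n)                    # C = C' > 0

def neg_oja(t, x):
    # Sign-flipped Oja flow for p = 1: dx/dt = -(I - xx')Cx.
    return -(C @ x - x * (x @ C @ x))

def escaped(t, x):
    # Stop once the norm has clearly left the unit sphere.
    return np.linalg.norm(x) - 2.0
escaped.terminal = True

for scale in (0.99, 1.01):
    x0 = rng.standard_normal(n)
    x0 = scale * x0 / np.linalg.norm(x0)   # start near, but off, the sphere
    sol = solve_ivp(neg_oja, (0.0, 20.0), x0, events=escaped,
                    rtol=1e-9, atol=1e-12)
    print(f"|x0| = {scale:.2f}  ->  final |x| = "
          f"{np.linalg.norm(sol.y[:, -1]):.3e}")
```

Instead of converging to a minor eigenvector, the norm drifts to zero or infinity; any useful MSA flow must therefore also stabilize the orthonormality constraint, which is what the self-stabilizing corrections in [2, 3, 5] attempt.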

Conjecture 1. Principal component flows are dual to minor component flows, via an involution in matrix space $\mathbb{R}^{n\times p}$ that establishes a bijective correspondence between solutions of PSA flows and MSA flows, respectively. If a PSA flow is actually a gradient flow for a cost function f, as is the case for (1), (2) and (4), then the corresponding dual MSA flow is a gradient flow for the Legendre dual cost function f* of f.

When implementing these differential equations on a computer, suitable discretizations are to be found. Since we are working in the unconstrained Euclidean matrix space $\mathbb{R}^{n\times p}$, we consider Euler step discretizations. Thus, e.g., for system (1) consider

$$X_{k+1} = X_k + s_k\,(I - X_kX_k')CX_k \qquad (5)$$

with suitably small step sizes $s_k > 0$. Such Euler discretization schemes are widely used in the literature, but usually without explicit step-size selections that guarantee, for generic initial conditions, convergence to the p dominant orthonormal eigenvectors of C. A further challenge is to obtain step-size selections that achieve quadratic convergence rates (e.g., via a Newton-type approach).
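A minimal sketch of such a scheme for (1) follows; the constant step size $s = 1/(2\|C\|)$ is an ad hoc heuristic assumed here for illustration, and Problem 2 below asks precisely for principled replacements of it.

```python
import numpy as np

def oja_euler(C, p, steps=2000, seed=0):
    """Euler discretization (5) of Oja's flow with a heuristic constant step."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    X = np.linalg.qr(rng.standard_normal((n, p)))[0]  # orthonormal start
    s = 0.5 / np.linalg.norm(C, 2)                    # heuristic step size
    for _ in range(steps):
        X = X + s * (np.eye(n) - X @ X.T) @ C @ X
    return X

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
C = A @ A.T + np.eye(6)
X = oja_euler(C, p=2)

# At an equilibrium of (1) one has CX = X(X'CX); report the residual.
print("equilibrium residual:", np.linalg.norm(C @ X - X @ (X.T @ C @ X)))
```

With this fixed step the iteration typically converges only linearly; the quadratic rates mentioned above would require step-size selections of Newton type.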

Problem 2. Develop a systematic convergence theory for discretisations of the flows, by specifying step-size selections that imply global as well as local quadratic convergence to the equilibria.

2 MOTIVATION AND HISTORY

Eigenvalue computations are ubiquitous in mathematics and the engineering sciences. In applications, the matrices whose eigenvectors are to be found are often defined in a recursive way, thus demanding recursive computational methods for eigendecomposition. Subspace tracking algorithms are widely used in neural networks, regression analysis, and signal processing applications for this purpose. Such algorithms can be studied by replacing the stochastic, recursive algorithm, via an averaging procedure, by a nonlinear ordinary differential equation. Similarly, new subspace tracking algorithms can be developed by starting with a suitable ordinary differential equation and then converting it to a stochastic approximation algorithm [7]. Therefore, understanding the dynamics of such flows is paramount to the continuing development of recursive eigendecomposition techniques.

The starting point for most of the current work in principal component analysis and subspace tracking has been Oja’s system from neural network theory. Using a simple Hebbian law for a single perceptron with a linear activation function, Oja [9, 10] proposed to update the weights according to

$$X_{t+1} = X_t + \gamma_t\,(I - X_tX_t')\,u_tu_t'X_t.$$

Here $X_t$ denotes the n × p weight matrix and $u_t$ the input vector of the perceptron. By applying the ODE method to this system, Oja arrives at the differential equation (1). Here $C = E[u_tu_t']$ is the covariance matrix. Similarly, the other flows, (2) and (4), have analogous interpretations.
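A sketch of this stochastic update on synthetic data may make the ODE connection concrete; the Gaussian data model and the gain sequence $\gamma_t = 1/(t+10)$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, T = 6, 2, 100_000

# Synthetic inputs u_t with covariance C = LL'.
L = rng.standard_normal((n, n)) / np.sqrt(n)
C = L @ L.T

X = np.linalg.qr(rng.standard_normal((n, p)))[0]
for t in range(T):
    u = L @ rng.standard_normal(n)       # sample with E[uu'] = C
    gamma = 1.0 / (t + 10.0)             # decreasing gain (assumed)
    y = X.T @ u
    # Oja's rule: X <- X + gamma * (I - XX') u u' X, written economically.
    X = X + gamma * (np.outer(u, y) - X @ np.outer(y, y))

# Compare span(X) with the dominant eigenspace of C.
U = np.linalg.eigh(C)[1][:, -p:]
print("projection gap:", np.linalg.norm(X @ X.T - U @ U.T))
```

As the gains decrease, the iterates shadow the solutions of (1), which is exactly the content of the ODE method.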

In [9, 11] it is shown for p = 1 that (1) is a PSA flow, i.e., it converges for generic initial conditions to a normalised dominant eigenvector of C. In [11] the system (1) was studied for arbitrary values of p and it was conjectured that (1) is a PSA flow. This conjecture was first proven in [18], assuming positive definiteness of C. Moreover, in [18, 4], explicit initial conditions were given, in terms of the intersection dimensions of the dominant eigenspace with the initial subspace, such that the flow converges to a basis matrix of the p-dimensional dominant eigenspace. This is reminiscent of Schubert-type conditions on Grassmann manifolds.

Although the Oja flow serves as a principal subspace method, it is not useful for principal component analysis because it does not converge in general to a basis of eigenvectors. Flows for principal component analysis such as (2) were first studied in [14, 12, 13, 17]. However, pointwise convergence to the equilibrium points was not established. In [16] a Lyapunov function for the Oja flow (1) was given, but without recognizing that (1) is actually a gradient flow. There have been confusing remarks in the literature claiming that (1) cannot be a gradient system because its linearization is not a symmetric matrix. However, this is due to a misunderstanding of the concept of a gradient, which presupposes a choice of Riemannian metric. In [20] it is shown that (2), and in particular (1), is actually a gradient flow for a suitable cost function and Riemannian metric on $\mathbb{R}^{n\times p}$. Moreover, starting from any initial condition in $\mathbb{R}^{n\times p}$, pointwise convergence of the solutions to a basis of p independent eigenvectors of C is shown, together with a complete phase portrait analysis of the flow. First steps toward a phase portrait analysis of (4) are made in [8].

3 AVAILABLE RESULTS

In [12, 13, 17] the equilibrium points of (2) were computed together with a local stability analysis. Pointwise convergence of the system to the equilibria is established in [20], using an early result by Łojasiewicz on real analytic gradient flows. Thus these results together imply that (2), and hence (1), is a PSA flow. An analogous result for (4) is forthcoming; see [8] for first steps in this direction. Sufficient conditions for initial matrices in the Oja flow (1) to converge to a dominant subspace basis are given in [18, 4], but not for the other, unstable equilibria, nor for system (2). A complete characterization of the stable and unstable manifolds is currently unknown.

BIBLIOGRAPHY

[1] R. W. Brockett, "Dynamical systems that sort lists, diagonalise matrices, and solve linear programming problems," Linear Algebra Appl., 146:79–91, 1991.

[2] T. Chen, "Modified Oja's algorithms for principal subspace and minor subspace extraction," Neural Processing Letters, 5:105–110, April 1997.

[3] T. Chen, S. Amari, and Q. Lin, "A unified algorithm for principal and minor component extraction," Neural Networks, 11:385–390, 1998.

[4] T. Chen, Y. Hua, and W.-Y. Yan, "Global convergence of Oja's subspace algorithm for principal component extraction," IEEE Transactions on Neural Networks, 9(1):58–67, 1998.

[5] S. C. Douglas, S.-Y. Kung, and S. Amari, "A self-stabilized minor subspace rule," IEEE Signal Processing Letters, 5(12):328–330, December 1998.

[6] U. Helmke and J. B. Moore, Optimization and Dynamical Systems, Springer-Verlag, 1994.

[7] H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications, Springer, 1997.

[8] J. H. Manton, I. M. Y. Mareels, and S. Attallah, "An analysis of the fast subspace tracking algorithm NOja," In: IEEE Conference on Acoustics, Speech and Signal Processing, Orlando, Florida, May 2002.

[9] E. Oja, "A simplified neuron model as a principal component analyzer," Journal of Mathematical Biology, 15:267–273, 1982.

[10] E. Oja, "Neural networks, principal components, and subspaces," International Journal of Neural Systems, 1:61–68, 1989.

[11] E. Oja and J. Karhunen, "On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix," Journal of Mathematical Analysis and Applications, 106:69–84, 1985.

[12] E. Oja, H. Ogawa, and J. Wangviwattana, "Principal component analysis by homogeneous neural networks, Part I: The weighted subspace criterion," IEICE Transactions on Information and Systems, 3:366–375, 1992.

[13] E. Oja, H. Ogawa, and J. Wangviwattana, "Principal component analysis by homogeneous neural networks, Part II: Analysis and extensions of the learning algorithms," IEICE Transactions on Information and Systems, 3:376–382, 1992.

[14] T. D. Sanger, "Optimal unsupervised learning in a single-layer linear feedforward network," Neural Networks, 2:459–473, 1989.

[15] J. L. Willems, Stability Theory of Dynamical Systems, Studies in Dynamical Systems, Nelson, London, 1970.

[16] J. L. Wyatt and I. M. Elfadel, "Time-domain solutions of Oja's equations," Neural Computation, 7:915–922, 1995.

[17] L. Xu, "Least mean square error reconstruction principle for self-organizing neural nets," Neural Networks, 6:627–648, 1993.

[18] W.-Y. Yan, U. Helmke, and J. B. Moore, "Global analysis of Oja's flow for neural networks," IEEE Transactions on Neural Networks, 5(5):674–683, September 1994.

[19] B. Yang, "Projection approximation subspace tracking," IEEE Transactions on Signal Processing, 43(1):95–107, January 1995.

[20] S. Yoshizawa, U. Helmke, and K. Starkov, "Convergence analysis for principal component flows," International Journal of Applied Mathematics and Computer Science, 11:223–236, 2001.

¹It is remarked that the convergence proof in [5] appears flawed; they argue that because the error dynamics have the form $\dot{Q}(t) = G(t)Q(t)$ with $G(t)$ negative definite, $Q(t)$ must converge to zero. However, counterexamples are known [15] where $G(t)$ is strictly negative definite (with constant eigenvalues) yet $Q$ diverges.
