Generalized inverse of matrix and solution of linear system equation

Abstract

This chapter presents a brief introduction to the generalized inverse of matrix, which is needed in the following expositions. This introduction includes the left inverse and right inverse, the Moore-Penrose inverse, the minimization approach to solve an algebraic matrix equation, the full rank decomposition theorem, the least square solution to an algebraic matrix equation, and the singular value decomposition.

Keywords

Generalized inverse of matrix; Left inverse; Right inverse; Moore-Penrose inverse; Full rank decomposition theorem; Least square solution to an algebraic matrix equation; Singular value decomposition

In this chapter, we briefly present some background on the generalized inverse of a matrix and its relation to the solution of systems of linear equations, which is needed in the sequel. This chapter can be skimmed quickly and used mainly as a reference. There are a large number of references on this topic; see, for example, [43, 44, 51–54].

3.1 The generalized inverse of matrix

Recall how the matrix inverse is defined. The inverse A⁻¹ of a matrix A is a matrix that produces the identity matrix when multiplied with A, that is, AA⁻¹ = I = A⁻¹A. The matrix inverse exists only for square (and nonsingular) matrices.

Matrices built from real-world data are not always square. Furthermore, real-world data are not always consistent and might contain many repetitions. To deal with such data, a generalized inverse for rectangular matrices is needed.

A generalized inverse A⁻ of A is defined as a matrix satisfying AA⁻A = A. Notice that the usual matrix inverse is covered by this definition because AA⁻¹A = A. We use the term “generalized” inverse for a general rectangular matrix, to distinguish it from the inverse matrix, which is defined for a square matrix. The generalized inverse is also called the pseudo-inverse.

Unfortunately, there are many types of generalized inverse, and most generalized inverses are not unique. Some generalized inverses are reflexive, satisfying (A⁻)⁻ = A, and some are not. In this chapter, we discuss only a few of them that are often used in practical applications.

• A reflexive generalized inverse is defined as

$\begin{aligned} A A^{-} A &= A, \\ A^{-} A A^{-} &= A^{-}. \end{aligned}$

• A minimum norm generalized inverse, such that x = A⁻b minimizes ∥x∥, is defined as

$\begin{aligned} A A^{-} A &= A, \\ A^{-} A A^{-} &= A^{-}, \\ (A^{-} A)^{T} &= A^{-} A. \end{aligned}$

• A generalized inverse that produces the least square solution, which minimizes the residual or error $\min_{x} \| b - A x \|$, is defined as

$\begin{aligned} A A^{-} A &= A, \\ (A A^{-})^{T} &= A A^{-}. \end{aligned}$

3.1.1 The left inverse and right inverse

The usual matrix inverse is defined as a two-side inverse, i.e., AA⁻¹ = I = A⁻¹A because we can multiply the inverse matrix from the left or from the right of matrix A and we still get the identity matrix. This property is only true for a square matrix A.

For a rectangular matrix A, we may have a generalized left inverse, or left inverse for short, which gives the identity matrix when multiplied from the left: A_left⁻¹A = I. Similarly, we may have a generalized right inverse, or right inverse for short, which gives the identity matrix when multiplied from the right: AA_right⁻¹ = I.

In general, the left inverse is not equal to the right inverse. The generalized inverse of a rectangular matrix is related to solving the system of linear equations Ax = b. When A has full column rank, the solution of the normal equation A^TAx = A^Tb is x = (A^TA)⁻¹A^Tb, which has the form x = A⁻b. The term

$A^{-} = A_{left}^{-1} = (A^{T} A)^{-1} A^{T}$

is often called the generalized left inverse. Another pseudo-inverse is obtained by multiplying by the transpose from the right, provided A has full row rank; it is called the generalized right inverse

$A^{-} = A_{right}^{-1} = A^{T} (A A^{T})^{-1}.$
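The two formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated real matrices (which have full rank almost surely); the variable names are illustrative and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

# Tall matrix (more rows than columns), assumed to have full column rank.
A_tall = rng.standard_normal((5, 3))
A_left = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T      # (A^T A)^{-1} A^T

# Wide matrix (more columns than rows), assumed to have full row rank.
A_wide = rng.standard_normal((3, 5))
A_right = A_wide.T @ np.linalg.inv(A_wide @ A_wide.T)     # A^T (A A^T)^{-1}

left_ok = np.allclose(A_left @ A_tall, np.eye(3))         # A_left A = I
right_ok = np.allclose(A_wide @ A_right, np.eye(3))       # A A_right = I

# Multiplying on the "wrong" side gives a 5x5 projector, not the identity.
other_side = A_tall @ A_left
other_side_is_identity = np.allclose(other_side, np.eye(5))

print(left_ok, right_ok, other_side_is_identity)
```

The last check illustrates why a left inverse of a tall matrix is only a one-sided inverse: the product in the other order has rank at most 3 and so cannot equal the 5 × 5 identity.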

3.1.2 Moore-Penrose inverse

It is possible to obtain a unique generalized inverse. To distinguish the unique generalized inverse from other nonunique generalized inverses A⁻, we use the symbol A⁺. The unique generalized inverse is called the Moore-Penrose inverse. It is defined by the following four conditions:

(1) AA⁺A = A,

(2) A⁺AA⁺ = A⁺,

(3) (AA⁺)^T = AA⁺,

(4) (A⁺A)^T = A⁺A.

The first condition AA⁺A = A is the definition of a generalized inverse. Together with the first condition, the second condition indicates that the generalized inverse is reflexive, (A⁻)⁻ = A. Together with the first condition, the third condition indicates that the generalized inverse yields the least square solution that minimizes the norm of the error, $\min_{x} \| b - A x \|$. Together with the first condition, the fourth condition indicates that the solution x = A⁺b has minimum norm. A matrix that satisfies all four conditions is unique.
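The four conditions can be tested directly on a candidate matrix. The sketch below assumes NumPy, real matrices (so the transpose is used), and `numpy.linalg.pinv` as the source of A⁺; the helper name `penrose_conditions` and the example matrix are hypothetical.

```python
import numpy as np

def penrose_conditions(A, X, tol=1e-10):
    """Return four booleans, one per Penrose condition (1)-(4), for a candidate X."""
    return (
        np.allclose(A @ X @ A, A, atol=tol),        # (1) A X A = A
        np.allclose(X @ A @ X, X, atol=tol),        # (2) X A X = X
        np.allclose((A @ X).T, A @ X, atol=tol),    # (3) (A X)^T = A X
        np.allclose((X @ A).T, X @ A, atol=tol),    # (4) (X A)^T = X A
    )

# Rank-deficient 4x3 example: the third column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

# rcond is set explicitly so the numerically zero singular value is discarded.
A_plus = np.linalg.pinv(A, rcond=1e-10)
checks = penrose_conditions(A, A_plus)
print(checks)
```

Because the example matrix is rank deficient, neither (A^TA)⁻¹A^T nor A^T(AA^T)⁻¹ exists for it, yet the Moore-Penrose inverse is still defined.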

Properties of generalized inverse of matrix: Some important properties of generalized inverse of matrix are:

1. The transpose of a left inverse of A is a right inverse of A^T, (A^T)_right⁻¹ = (A_left⁻¹)^T. Similarly, the transpose of a right inverse of A is a left inverse of A^T, (A^T)_left⁻¹ = (A_right⁻¹)^T.

2. A matrix A_m×n has a left inverse A_left⁻¹ if and only if its rank equals its number of columns and the number of rows is more than the number of columns ρ(A) = n < m. In this case A⁺A = A_left⁻¹A = I.

3. A matrix A_m×n has a right inverse A_right⁻¹ if and only if its rank equals its number of rows and the number of rows is less than the number of columns ρ(A) = m < n. In this case AA⁺ = AA_right⁻¹ = I.

4. The Moore-Penrose inverse is equal to left inverse A⁺ = A_left⁻¹, when ρ(A) = n < m and equals the right inverse A⁺ = A_right⁻¹, when ρ(A) = m < n. The Moore-Penrose inverse is equal to the matrix inverse A⁺ = A⁻¹, when ρ(A) = m = n.
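Property 4 can be illustrated numerically. This is a small sketch, assuming NumPy and random real test matrices; the diagonal shift on the square matrix is an illustrative choice that keeps it well conditioned.

```python
import numpy as np

rng = np.random.default_rng(1)
tall = rng.standard_normal((6, 3))                       # rank 3 = n < m
wide = rng.standard_normal((3, 6))                       # rank 3 = m < n
square = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # nonsingular, m = n

left = np.linalg.solve(tall.T @ tall, tall.T)            # (A^T A)^{-1} A^T
right = wide.T @ np.linalg.inv(wide @ wide.T)            # A^T (A A^T)^{-1}

same_left = np.allclose(np.linalg.pinv(tall), left)
same_right = np.allclose(np.linalg.pinv(wide), right)
same_inverse = np.allclose(np.linalg.pinv(square), np.linalg.inv(square))

print(same_left, same_right, same_inverse)
```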

3.1.3 The minimization approach to solve an algebraic matrix equation

The generalized inverse of a matrix has been used extensively in the areas of modern control, least square estimation, and aircraft structural analysis. The purpose of this section is to extend these results by presenting a unified framework that provides geometric insight and highlights certain optimal features embedded in the generalized inverse.

Consider the algebraic matrix equation

$y = A x,$

(3.1)

where A is an n × m constant matrix, y is a given n vector, and x is an m vector to be determined. For the trivial case where n = m and A is a nonsingular matrix, i.e., rank(A) = n, a unique solution to Eq. (3.1) exists and is given by

$x = A^{-1} y,$

(3.2)

where A⁻¹ denotes the inverse of A.

For the case n≠m, the expression of x in terms of y involves the generalized inverse of A, denoted A⁺, and thus

$x = A^{+} y.$

(3.3)

In the following cases it will be shown that A⁺, for either n > m or n < m, may be viewed as a solution to a certain minimization problem.

(1) Case A: n > m

With no loss of generality it can be assumed that A is of full rank, i.e.,

$\operatorname{rank}(A) = m.$

(3.4)

If, however, rank(A) < m, it is possible to delete the dependent columns of A, set the respective unknown components of x equal to zero, and reduce the problem to the case in which Eq. (3.4) is satisfied. Since the m columns of A do not span the n-dimensional space Rⁿ, an exact solution to Eq. (3.1) cannot be obtained if y is not contained in the subspace spanned by the columns of A. Thus, one is motivated to seek approximate solutions, the best of which is the one that minimizes the Euclidean norm of the error. Let the error e be given by

$e = y - A x.$

(3.5)

Then let z be given by

$z = \| e \|^{2} = e^{T} e = (y - A x)^{T} (y - A x),$

(3.6)

where the superscript “T” denotes the transpose. Setting the gradient of z with respect to x equal to zero yields

$\frac{\partial z}{\partial x} = -2 A^{T} y + 2 A^{T} A x = 0.$

(3.7)

The Hessian matrix is

$\frac{\partial^{2} z}{\partial x^{2}} = 2 A^{T} A.$

(3.8)

From Eq. (3.7), x is given by

$x = (A^{T} A)^{-1} A^{T} y.$

(3.9)

By virtue of A having full rank, (A^TA) is a positive definite matrix. Thus (A^TA)⁻¹ exists and the Hessian matrix is positive definite, implying that a minimum was obtained. In this case (n > m) the generalized inverse of A is given by

$A^{+} = (A^{T} A)^{-1} A^{T}.$

(3.10)

It is interesting to note that if y is contained in the subspace spanned by the columns of A, Eq. (3.9) yields an exact solution to Eq. (3.1), i.e., ∥e∥ = 0. The optimal feature of Eq. (3.9) has found extensive applications in data processing for least square approximations. In closing it should be noted that Eq. (3.9) can be obtained by invoking the Orthogonal Projection Lemma, thus providing a geometric interpretation to the optimal feature of Eq. (3.10).
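Case A can be reproduced in a few lines. The following sketch assumes NumPy and random real data; it evaluates Eq. (3.9), checks that the gradient of Eq. (3.7) vanishes and that the Hessian of Eq. (3.8) is positive definite, and compares the result with `numpy.linalg.lstsq`, which is a library routine and not the derivation used in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3                                   # n equations, m unknowns (n > m)
A = rng.standard_normal((n, m))               # assumed to have rank m
y = rng.standard_normal(n)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)       # Eq. (3.9)
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # library least squares

e = y - A @ x_normal                                # Eq. (3.5)
grad = -2 * A.T @ y + 2 * A.T @ A @ x_normal        # Eq. (3.7), should vanish
hess_eigs = np.linalg.eigvalsh(2 * A.T @ A)         # Eq. (3.8), all positive

print(np.allclose(x_normal, x_lstsq), np.linalg.norm(grad), hess_eigs.min())
```

The vanishing of A^Te is the orthogonal projection property mentioned above: the error is orthogonal to the subspace spanned by the columns of A.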

(2) Case B: n < m

Again, without loss of generality, it can be assumed that A is of full rank, i.e.,

$\operatorname{rank}(A) = n.$

(3.11)

If, however, rank(A) < n, it implies that some of the equations are merely a linear combination of the others and therefore may be deleted without loss of information, thereby reducing the case rank(A) < n to the case rank(A) = n. Moreover, if A is a square singular matrix, it can be reduced to case B after proper deletion of the dependent rows of A.

Eqs. (3.1), (3.11) with n < m yield an infinite number of solutions, the “optimal” of which is the one having the smallest norm. Therefore one is confronted with a constrained minimization problem, where the minimization of ∥x∥ (or equivalently $\frac{\| x \|^{2}}{2}$) is to be accomplished subject to Eq. (3.1). Adjoining the constraint via a vector of Lagrange multipliers (λ), the objective function to be minimized is

$H = \frac{1}{2} \| x \|^{2} + \lambda^{T} (y - A x) = \frac{1}{2} x^{T} x + \lambda^{T} (y - A x).$

(3.12)

Setting the respective gradients equal to zero gives

$\frac{\partial H}{\partial x} = x - A^{T} \lambda = 0,$

(3.13)

$\frac{\partial H}{\partial \lambda} = y - A x = 0.$

(3.14)

From Eq. (3.13), one has

$x = A^{T} \lambda.$

(3.15)

Substitution of Eq. (3.15) into Eq. (3.14) yields

$y = A A^{T} \lambda.$

(3.16)

That is

$\lambda = (A A^{T})^{-1} y.$

(3.17)

The existence of (AA^T)⁻¹ is guaranteed by virtue of Eq. (3.11). Substitution of Eq. (3.17) into Eq. (3.15) yields

$x = A^{T} (A A^{T})^{-1} y.$

(3.18)

In this case (n < m) the generalized inverse of A is given by

$A^{+} = A^{T} (A A^{T})^{-1}.$

(3.19)
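Case B can be illustrated in the same way. The sketch below assumes NumPy and random real data; it follows Eqs. (3.15) to (3.18) and then compares the result with another exact solution obtained by adding a null-space component. The projector `P_null` is an illustrative construction, not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 7                                   # n equations, m unknowns (n < m)
A = rng.standard_normal((n, m))               # assumed to have rank n
y = rng.standard_normal(n)

lam = np.linalg.solve(A @ A.T, y)             # Eq. (3.17): Lagrange multipliers
x_min = A.T @ lam                             # Eq. (3.15), i.e. Eq. (3.18)

# Any other exact solution differs from x_min by a vector in the null space of A.
P_null = np.eye(m) - np.linalg.pinv(A) @ A
x_other = x_min + P_null @ rng.standard_normal(m)

print(np.allclose(A @ x_min, y), np.allclose(A @ x_other, y))
print(np.linalg.norm(x_min), np.linalg.norm(x_other))
```

Both vectors satisfy Eq. (3.1) exactly, but the one from Eq. (3.18) has the smaller Euclidean norm.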

For the sake of completeness it should be noted that the minimization of ∥e∥ and ∥x∥ in Eqs. (3.6), (3.12), respectively, can be performed with weighted vector norms, where the squared norm of a vector w is defined as w^TQw and Q is a compatible positive definite weighting matrix. By doing so, one can choose the relative emphasis placed on the magnitudes of the vector components that are minimized.

For case A, Eq. (3.6) becomes

$z = e^{T} Q e = (y - A x)^{T} Q (y - A x)$

(3.20)

with the solution

$x = (A^{T} Q A)^{-1} A^{T} Q y.$

(3.21)

For case B, Eq. (3.12) becomes

$H = \frac{1}{2} x^{T} Q x + \lambda^{T} (y - A x)$

(3.22)

with the solution

$x = Q^{-1} A^{T} (A Q^{-1} A^{T})^{-1} y.$

(3.23)
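The weighted solutions of Eqs. (3.21) and (3.23) can be sketched as follows, assuming NumPy, real data, and diagonal positive definite weights chosen only for illustration. The function names are hypothetical.

```python
import numpy as np

def weighted_least_squares(A, y, Q):
    """Eq. (3.21): minimize (y - A x)^T Q (y - A x) for a full column rank A."""
    return np.linalg.solve(A.T @ Q @ A, A.T @ Q @ y)

def weighted_min_norm(A, y, Q):
    """Eq. (3.23): minimize x^T Q x / 2 subject to y = A x for a full row rank A."""
    Qinv_At = np.linalg.solve(Q, A.T)                 # Q^{-1} A^T
    return Qinv_At @ np.linalg.solve(A @ Qinv_At, y)  # Q^{-1} A^T (A Q^{-1} A^T)^{-1} y

rng = np.random.default_rng(4)

# Case A: the last three equations are weighted ten times more heavily.
A_tall = rng.standard_normal((6, 2))
y_tall = rng.standard_normal(6)
Q_tall = np.diag([1.0, 1.0, 1.0, 10.0, 10.0, 10.0])
x_w = weighted_least_squares(A_tall, y_tall, Q_tall)

# Case B: later components of x are penalized more heavily.
A_wide = rng.standard_normal((2, 5))
y_wide = rng.standard_normal(2)
Q_wide = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
x_c = weighted_min_norm(A_wide, y_wide, Q_wide)

print(x_w, x_c)
```

With Q equal to the identity matrix, both functions reduce to the unweighted formulas of Eqs. (3.9) and (3.18).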

In summary, a unified framework has been presented showing that the generalized inverse of a matrix can be viewed as a result of a minimization problem leading to a practical interpretation.

3.2 The full rank decomposition theorem

Depending on whether a matrix has full row rank or full column rank, we use the following terminology.

Theorem 3.2.1

Let A ∈ C^m×n.

(1) If rank A = m, then A is said to be of full row rank.

(2) If rank A = n, then A is said to be of full column rank.

If A is of full row rank, then m ≤ n, and if A is of full column rank, then m ≥ n. A square matrix is of full rank (nonsingular) if and only if it is of both full row rank and full column rank.

Theorem 3.2.2

Let A ∈ C^m×n with rank A = r. If there exist a full column rank matrix F ∈ C^m×r and a full row rank matrix G ∈ C^r×n such that

$A = F G,$

then A is said to have a full rank decomposition.

Theorem 3.2.3

When A ∈ C^m×n, rank A = r > 0, there must be a full rank factorization for it.

Proof

For the matrix A with rank A = r > 0, one can choose r linearly independent columns from the columns of A: $A_{i_{1}}, A_{i_{2}}, \dots, A_{i_{r}}$. The remaining columns can be expressed as linear combinations of these columns. Let

$F = [A_{i_{1}} \; A_{i_{2}} \; \dots \; A_{i_{r}}].$

The matrix F is of full column rank. Therefore, there exist a permutation matrix Q and a matrix S ∈ C^r×(n−r) such that

$A Q = [A_{i_{1}} \; \dots \; A_{i_{r}} \; \tilde{A}_{1} \; \dots \; \tilde{A}_{n - r}] = [F \;\; F S] = F [I_{r} \;\; S].$

Letting G = [I_r S]Q⁻¹, G is a full row rank matrix, because [I_r S] has rank r and Q⁻¹ is nonsingular. So A has a full rank decomposition

$A = F G.$
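The construction in the proof can be turned into a small routine. The sketch below assumes NumPy and a real matrix with rank r > 0; it selects independent columns greedily with `numpy.linalg.matrix_rank` and then solves for the coefficients that express every column of A in terms of F. The function name and the tolerance are illustrative choices, and this is not a numerically optimized method.

```python
import numpy as np

def full_rank_decomposition(A, tol=1e-10):
    """Return (F, G) with A = F @ G, F of full column rank and G of full row rank.

    Assumes a real matrix with rank(A) >= 1, as in Theorem 3.2.3.
    """
    A = np.asarray(A, dtype=float)
    cols = []
    for j in range(A.shape[1]):
        trial = cols + [j]
        # Keep column j only if it is independent of the columns kept so far.
        if np.linalg.matrix_rank(A[:, trial], tol=tol) == len(trial):
            cols = trial
    F = A[:, cols]
    # Coefficients expressing every column of A in terms of the columns of F.
    G = np.linalg.solve(F.T @ F, F.T @ A)
    return F, G

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])    # third column = first + second, rank 2
F, G = full_rank_decomposition(A)
print(F.shape, G.shape, np.allclose(F @ G, A))
```

Here the selected columns are already the leading ones, so the permutation Q of the proof is the identity and G has the form [I_r S].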

Theorem 3.2.4

Some important properties of full column (row) rank matrices: if F ∈ C^m×r is a full column rank matrix, then

• every eigenvalue of F^HF is larger than zero;

• F^HF is a positive definite Hermitian matrix;

• F^HF is an invertible matrix of order r.

Proof

If λ is any eigenvalue of F^HF and x is a corresponding eigenvector (x ≠ 0), one has

$F^{H} F x = \lambda x.$

In this case, we have

$\lambda \| x \|_{2}^{2} = \lambda x^{H} x = x^{H} (F^{H} F x) = (F x)^{H} (F x) = \| F x \|_{2}^{2} \geq 0.$

Since F is a full column rank matrix and x ≠ 0, we have Fx ≠ 0, so $\| F x \|_{2}^{2} > 0$. We therefore conclude that λ > 0. Because F^HF is Hermitian with positive eigenvalues, it is positive definite and hence invertible.

Analogously, if G ∈ C^r×n is a full row rank matrix, then GG^H has the above properties. When A ∈ C^m×n and B ∈ C^n×m satisfy AB = I_m, B is a right inverse of A, denoted by B = A_right⁻¹, and A is a left inverse of B, denoted by A = B_left⁻¹.
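The statements of Theorem 3.2.4 can be observed numerically. This is a minimal sketch, assuming NumPy and a random complex matrix F, which has full column rank almost surely.

```python
import numpy as np

rng = np.random.default_rng(5)
F = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

gram = F.conj().T @ F                        # F^H F, a 3x3 matrix
is_hermitian = np.allclose(gram, gram.conj().T)
lam, X = np.linalg.eigh(gram)                # real eigenvalues in ascending order

# Identity used in the proof, for the eigenpair with the smallest eigenvalue.
x = X[:, 0]
lhs = lam[0] * np.vdot(x, x).real            # lambda * ||x||^2
rhs = np.linalg.norm(F @ x) ** 2             # ||F x||^2

print(is_hermitian, lam, np.isclose(lhs, rhs))
```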

3.3 The least square solution to an algebraic matrix equation

The method of least squares is a standard approach in regression analysis for approximately solving overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. The term “least squares” means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation.

3.3.1 The solution to the compatible linear equations

We have the following general solution of a given linear matrix equation, which is formulated in terms of the Moore-Penrose inverse.

Theorem 3.3.1

If A ∈ C^m×n, b ∈ C^m, x ∈ Cⁿ, and if the system of linear equations Ax = b is compatible, the general solution is given by

$x = A^{+} b + (I_{n} - A^{+} A) t \quad (\forall t \in C^{n}).$

Proof

Because Ax = b is compatible, there is an x₀ such that Ax₀ = b. On the one hand, for any t ∈ Cⁿ,

$\begin{aligned} A x &= A A^{+} b + A (I_{n} - A^{+} A) t \\ &= A A^{+} A x_{0} + (A - A A^{+} A) t \\ &= A x_{0} + O t \\ &= b. \end{aligned}$

On the other hand, if x₀ is any solution of Ax = b, choose t = x₀. Then

$\begin{aligned} A^{+} b + (I_{n} - A^{+} A) t &= A^{+} b + (I_{n} - A^{+} A) x_{0} \\ &= A^{+} b + x_{0} - A^{+} A x_{0} \\ &= A^{+} b + x_{0} - A^{+} b \\ &= x_{0}. \end{aligned}$

Thus we have proved that

$x = A^{+} b + (I_{n} - A^{+} A) t \quad (\forall t \in C^{n})$

is the general solution of the equation Ax = b, which concludes the proof.

In particular, when b = 0, we obtain the general solution of the homogeneous linear equations Ax = 0 given by

$x = (I_{n} - A^{+} A) t \quad (\forall t \in C^{n}).$
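Theorem 3.3.1 can be illustrated with a small compatible system. The sketch below assumes NumPy and a real rank-one matrix chosen for illustration; `general_solution` is a hypothetical helper that evaluates the formula of the theorem for a given t.

```python
import numpy as np

rng = np.random.default_rng(6)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])            # rank 1
x0 = np.array([1.0, -1.0, 2.0])
b = A @ x0                                 # compatible by construction

A_plus = np.linalg.pinv(A, rcond=1e-10)    # explicit cutoff for the zero singular value
N = np.eye(3) - A_plus @ A                 # I_n - A^+ A

def general_solution(t):
    """x = A^+ b + (I_n - A^+ A) t for an arbitrary vector t."""
    return A_plus @ b + N @ t

solutions = [general_solution(rng.standard_normal(3)) for _ in range(5)]
recovered = general_solution(x0)           # choosing t = x0 returns x0 itself

print(all(np.allclose(A @ x, b) for x in solutions), np.allclose(recovered, x0))
```

The matrix I_n − A⁺A maps every t into the solution set of the homogeneous system Ax = 0, which is why adding it does not change Ax.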

3.3.2 The least square solution of incompatible equation

In many applications, such as data processing, multivariate analysis, optimization theory, modern control theory, and networking theory, the mathematical model is a linear matrix equation that is incompatible. We hereafter use the generalized inverse matrix to express the general least square solution of incompatible linear equations.

Theorem 3.3.2

Let A ∈ C^m×n and b ∈ C^m. If there is x* ∈ Cⁿ such that

$\| A x^{*} - b \|_{2} = \min_{x \in C^{n}} \| A x - b \|_{2},$

then x* is called a least square solution to the system of equations Ax = b.

Theorem 3.3.3

When A ∈ C^m×n, all the least square solutions of the system of equations Ax = b are given by

$x^{*} = A^{+} b + (I_{n} - A^{+} A) t \quad (t \in C^{n}).$

Proof

For every x ∈ Cⁿ, we define

$\varphi(x) = \| A x - b \|_{2}^{2} = x^{H} A^{H} A x - x^{H} A^{H} b - b^{H} A x + b^{H} b$

and write x = μ + iν (μ, ν ∈ Rⁿ). We then have

$\frac{\partial \varphi}{\partial x} \triangleq \frac{\partial \varphi}{\partial \mu} + i \frac{\partial \varphi}{\partial \nu} = 2 A^{H} A x - 2 A^{H} b = 0.$

Therefore we have the normal equations A^HAx = A^Hb. Because

$\operatorname{rank} A^{H} A \leq \operatorname{rank} [A^{H} A \; \vdots \; A^{H} b] = \operatorname{rank} A^{H} [A \; \vdots \; b] \leq \operatorname{rank} A = \operatorname{rank} A^{H} A,$

the rank of the augmented matrix is equal to that of the coefficient matrix, so A^HAx = A^Hb is a compatible system of equations. Since φ(x) is a nonnegative real function, according to the extreme value theory of multivariate functions, the solutions of the normal equations are the minimum points of φ(x). Consequently, by Theorem 3.3.1 applied to the normal equations, the general least square solution of the equations Ax = b is

$\begin{aligned} x^{*} &= (A^{H} A)^{+} A^{H} b + (I_{n} - (A^{H} A)^{+} A^{H} A) t \\ &= A^{+} b + (I_{n} - A^{+} A) t \quad (\forall t \in C^{n}), \end{aligned}$

where the last step uses the identity (A^HA)⁺A^H = A⁺.
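A small incompatible system shows what Theorem 3.3.3 states: every member of the family satisfies the normal equations and attains the same minimal residual. The sketch assumes NumPy and a real rank-one example chosen by hand; the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])                 # rank 1
b = np.array([1.0, 3.0, 1.0])              # not in the column space of A

A_plus = np.linalg.pinv(A, rcond=1e-10)
N = np.eye(2) - A_plus @ A                 # I_n - A^+ A

def least_squares_solution(t):
    """x* = A^+ b + (I_n - A^+ A) t for an arbitrary vector t."""
    return A_plus @ b + N @ t

def residual(x):
    return np.linalg.norm(A @ x - b)

family = [least_squares_solution(rng.standard_normal(2)) for _ in range(5)]
residuals = [residual(x) for x in family]
normal_eq_ok = all(np.allclose(A.T @ A @ x, A.T @ b) for x in family)

# A vector outside the family gives a strictly larger residual.
outside = A_plus @ b + np.array([0.1, 0.1])

print(residuals, normal_eq_ok, residual(outside))
```

For this example A⁺b = (1, 1), and the common residual norm is √3.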

3.3.3 The minimum norm least squares solution for the equations

From the above discussions, whether the equations are compatible or not, the least square solutions can be expressed as

$x^{*} = A^{+} b + (I_{n} - A^{+} A) t \quad (t \in C^{n}).$

Because the least square solution is in general not unique, one can look for the least square solution of minimum norm, which satisfies

$\| x^{**} \|_{2} = \min \{ \| x^{*} \|_{2} : x^{*} = A^{+} b + (I_{n} - A^{+} A) t, \; t \in C^{n} \},$

where x** is referred to as the minimal norm least square solution to the equations Ax = b. We have the following theorem:

Theorem 3.3.4

x** = A⁺b is the minimal norm least square solution of the equations Ax = b.

Proof

By the properties of A⁺, we have

$\begin{aligned} \langle A^{+} b, (I_{n} - A^{+} A) t \rangle &= b^{H} (A^{+})^{H} (I_{n} - A^{+} A) t \\ &= b^{H} [(A^{+})^{H} - (A^{+})^{H} A^{+} A] t \\ &= b^{H} [(A^{+})^{H} - (A^{+})^{H} (A^{+} A)^{H}] t \\ &= b^{H} [(A^{+})^{H} - (A^{+} A A^{+})^{H}] t \\ &= b^{H} [(A^{+})^{H} - (A^{+})^{H}] t \\ &= 0. \end{aligned}$

Each least square solution of the equations Ax = b is therefore the sum of two orthogonal vectors, so

$\| x^{*} \|_{2}^{2} = \| A^{+} b \|_{2}^{2} + \| (I_{n} - A^{+} A) t \|_{2}^{2} \geq \| A^{+} b \|_{2}^{2} \quad (t \in C^{n}).$

This shows that x** = A⁺b is the minimal norm least square solution of the equation Ax = b.
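The orthogonality argument of the proof can be checked numerically. The following sketch assumes NumPy and a random real 4 × 5 matrix of rank 2, built as a product of two random factors; it samples many vectors t and compares norms.

```python
import numpy as np

rng = np.random.default_rng(8)

# 4x5 matrix of rank 2, so Ax = b is generally incompatible and underdetermined.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))
b = rng.standard_normal(4)

A_plus = np.linalg.pinv(A, rcond=1e-10)
x_min = A_plus @ b                          # candidate minimal norm solution
N = np.eye(5) - A_plus @ A                  # I_n - A^+ A

ts = rng.standard_normal((100, 5))
inner = np.array([np.vdot(x_min, N @ t) for t in ts])             # should all vanish
norms = np.array([np.linalg.norm(x_min + N @ t) for t in ts])     # norms of other x*

# Pythagorean identity from the proof, checked for the first sample.
pythagoras_ok = np.isclose(
    norms[0] ** 2,
    np.linalg.norm(x_min) ** 2 + np.linalg.norm(N @ ts[0]) ** 2,
)

print(np.abs(inner).max(), np.linalg.norm(x_min), norms.min(), pythagoras_ok)
```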

3.4 The singular value decomposition

The Moore-Penrose inverse can be obtained through the singular value decomposition (SVD): A = UDV^T, such that A⁺ = V D⁻¹U^T when D is nonsingular. We have the following definition.

Definition 3.4.1

Let A ∈ C^m×n and denote the eigenvalues of the Hermitian matrix A^HA, which are all nonnegative, by

$\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}.$

Then

$\sigma_{1} = \sqrt{\lambda_{1}}, \; \sigma_{2} = \sqrt{\lambda_{2}}, \; \dots, \; \sigma_{n} = \sqrt{\lambda_{n}}$

are the singular values of A.

If A ∈ C^m×n and rank A = r, then A^HA has r positive eigenvalues and the remaining eigenvalues are zero. Further, we assume that the singular values are ordered in the following manner

$\sigma_{1} \geq \sigma_{2} \geq \dots \geq \sigma_{r} > \sigma_{r + 1} = \sigma_{r + 2} = \dots = \sigma_{n} = 0.$

Theorem 3.4.1

When A ∈ C^m×n and rank A = r > 0, there exist a unitary matrix U of order m and a unitary matrix V of order n such that

$U^{H} A V = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix},$

where Σ is the diagonal matrix

$\Sigma = \operatorname{diag}(\sigma_{1}, \sigma_{2}, \dots, \sigma_{r}), \quad \sigma_{1} \geq \sigma_{2} \geq \dots \geq \sigma_{r} > 0.$

This factorization is the so-called singular value decomposition (SVD) of A. Here σ₁, …, σ_r are the positive singular values of A.

Proof

Let $\sigma_{1}^{2}, \sigma_{2}^{2}, \dots, \sigma_{r}^{2}, 0, \dots, 0$ be the eigenvalues of A^HA, with σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r > 0. Because A^HA is a nonnegative definite Hermitian matrix, there is a unitary matrix V of order n satisfying

$V^{H} (A^{H} A) V = \operatorname{diag}(\sigma_{1}^{2}, \dots, \sigma_{r}^{2}, 0, \dots, 0) \triangleq \begin{bmatrix} \Sigma^{2} & 0 \\ 0 & 0 \end{bmatrix},$

where $\Sigma = \operatorname{diag}(\sigma_{1}, \sigma_{2}, \dots, \sigma_{r})$. Partitioning the matrix V into the subblocks

$V = [V_{1}, V_{2}], \quad V_{1} \in C^{n \times r}, \quad V_{2} \in C^{n \times (n - r)},$

we have

$\begin{aligned} \begin{bmatrix} V_{1}^{H} \\ V_{2}^{H} \end{bmatrix} (A^{H} A) [V_{1} \;\; V_{2}] &= \begin{bmatrix} V_{1}^{H} A^{H} A V_{1} & V_{1}^{H} A^{H} A V_{2} \\ V_{2}^{H} A^{H} A V_{1} & V_{2}^{H} A^{H} A V_{2} \end{bmatrix} \\ &= \begin{bmatrix} \Sigma^{2} & 0 \\ 0 & 0 \end{bmatrix}. \end{aligned}$

Then

$V_{1}^{H} A^{H} A V_{1} = \Sigma^{2}, \quad V_{1}^{H} A^{H} A V_{2} = O, \quad V_{2}^{H} A^{H} A V_{1} = O, \quad V_{2}^{H} A^{H} A V_{2} = O.$

It follows that

$(A V_{1} \Sigma^{-1})^{H} (A V_{1} \Sigma^{-1}) = I_{r}, \quad (A V_{2})^{H} (A V_{2}) = O, \quad A V_{2} = O.$

By letting

$U_{1} = A V_{1} \Sigma^{-1},$

the columns of U₁ are orthonormal, U₁^HU₁ = I_r. Extending U₁ to a unitary matrix U = [U₁ U₂] of order m, where the columns of U₂ are orthonormal and U₂^HU₁ = O, we thus have

$\begin{aligned} U^{H} A V &= \begin{bmatrix} U_{1}^{H} \\ U_{2}^{H} \end{bmatrix} A [V_{1} \;\; V_{2}] \\ &= \begin{bmatrix} U_{1}^{H} A V_{1} & U_{1}^{H} A V_{2} \\ U_{2}^{H} A V_{1} & U_{2}^{H} A V_{2} \end{bmatrix} \\ &= \begin{bmatrix} U_{1}^{H} (U_{1} \Sigma) & O \\ U_{2}^{H} (U_{1} \Sigma) & O \end{bmatrix} \\ &= \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}, \end{aligned}$

which concludes the proof.
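The steps of the proof can be followed literally in code. The sketch below assumes NumPy; `svd_via_gram` is a hypothetical helper that diagonalizes A^HA, forms Σ from the positive eigenvalues, and sets U₁ = AV₁Σ⁻¹. Forming A^HA squares the condition number, so this route is meant only to mirror the proof; in practice a library routine such as `numpy.linalg.svd` would normally be used.

```python
import numpy as np

def svd_via_gram(A, tol=1e-10):
    """Return (U1, sigma, V1, V2) built as in the proof of Theorem 3.4.1."""
    lam, V = np.linalg.eigh(A.conj().T @ A)     # eigenvalues of A^H A, ascending
    order = np.argsort(lam)[::-1]               # reorder to descending
    lam, V = lam[order], V[:, order]
    r = int(np.sum(lam > tol * lam[0]))         # number of positive eigenvalues
    sigma = np.sqrt(lam[:r])                    # positive singular values
    V1, V2 = V[:, :r], V[:, r:]
    U1 = A @ V1 / sigma                         # U1 = A V1 Sigma^{-1} (column j divided by sigma_j)
    return U1, sigma, V1, V2

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])                 # rank 2

U1, sigma, V1, V2 = svd_via_gram(A)
r = len(sigma)
print(r, sigma)
print(np.allclose(U1.conj().T @ A @ V1, np.diag(sigma)))
```

The matrix U₂ of the proof is not constructed here; any orthonormal basis of the orthogonal complement of the columns of U₁ would serve.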

Singular value decomposition is a factorization of a rectangular matrix A into three matrices U, D, and V. The two matrices U and V are orthogonal matrices (U^T = U⁻¹, V^T = V⁻¹) while D is a diagonal matrix. The factorization means that we can multiply the three matrices to get back the original matrix A = UDV^T. The transpose matrix is obtained through A^T = V D^TU^T. Since both orthogonal matrices and diagonal matrices have many nice properties, SVD is one of the most powerful matrix decompositions and is used in many applications, such as the least square solution (regression), feature selection (PCA, MDS), spectral clustering, image restoration and three-dimensional computer vision (fundamental matrix estimation), equilibrium of Markov chains, and many others.

The matrices U and V are not unique. Their columns are eigenvectors of the symmetric matrices AA^T and A^TA, respectively. Since eigenvectors of a symmetric matrix can be chosen orthogonal (and hence linearly independent), they can be used as basis vectors (coordinate system) to span a multidimensional space. The absolute value of the determinant of an orthogonal matrix is one, thus the matrix always has an inverse. Furthermore, each column (and each row) of an orthogonal matrix has unit norm. The diagonal matrix D contains the square roots of the eigenvalues of the symmetric matrix A^TA. The diagonal elements are nonnegative numbers and they are called singular values. Because they come from a real symmetric matrix, the eigenvalues (and the eigenvectors) are all real.

Numerical computation of the SVD is stable with respect to round-off error. When some of the singular values are nearly zero, we can set them to zero, which improves numerical stability. If the SVD of the matrix is A = UDV^T, then the diagonal matrix can also be obtained from D = U^TAV. The singular vectors associated with zero singular values are solutions of the homogeneous equation Ax = 0. They are not unique and are determined only up to a scalar multiple.

Properties: SVD can reveal many things:

1. The singular values indicate whether a square matrix A is singular. A square matrix A is nonsingular (i.e., has an inverse) if and only if all its singular values are different from zero.

2. If the square matrix A is nonsingular, the inverse matrix can be obtained by

$A^{-1} = V D^{-1} U^{T}.$

3. The number of nonzero singular values is equal to the rank of any rectangular matrix. In fact, SVD is a robust technique to compute matrix rank against ill-conditioned matrices.

4. The ratio between the largest and the smallest singular value is called the condition number, which measures how close the matrix is to being singular and reveals ill-conditioned matrices.

5. SVD can produce one of the matrix norms, called the Frobenius norm, by taking the square root of the sum of squares of the singular values

$\| A \|_{F} = \sqrt{\sum_{i} \sigma_{i}^{2}} = \sqrt{\sum_{i} \sum_{j} a_{i j}^{2}}.$

Equivalently, the Frobenius norm is the square root of the sum of the squared elements of the matrix.

6. SVD can also produce a generalized inverse (pseudo-inverse) for any rectangular matrix. In fact, this generalized inverse is the Moore-Penrose inverse, obtained by setting

$A^{+} = V D_{0}^{-1} U^{T},$

where the matrix $D_{0}^{-1}$ is equal to D⁻¹ except that the reciprocals of all nearly zero singular values are set to zero.

7. SVD also approximates the solution of the nonhomogeneous linear system Ax = b such that the norm of the residual is minimum, $\min_{x} \| A x - b \|$. This is the basis of least squares, orthogonal projection, and regression analysis.

8. SVD also solves the homogeneous linear system Ax = 0 by taking the column of V (row of V^T) that is the eigenvector of the symmetric matrix A^TA corresponding to the zero eigenvalue.
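Several of the properties listed above can be read off from `numpy.linalg.svd`. The following sketch assumes NumPy and a small real singular matrix chosen by hand; the relative cutoff `tol` used to decide which singular values count as zero is an illustrative choice.

```python
import numpy as np

A = np.array([[2.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 3.0]])            # third column = first + second, rank 2

U, s, Vt = np.linalg.svd(A)                # A = U diag(s) V^T, s in descending order
tol = 1e-10 * s[0]                         # singular values below this count as zero

rank = int(np.sum(s > tol))                                   # property 3
is_singular = bool(s[-1] <= tol)                              # property 1
fro_from_svd = np.sqrt(np.sum(s ** 2))                        # property 5

# Property 6: invert only the singular values above the cutoff (D_0^{-1}).
s_inv = np.array([1.0 / v if v > tol else 0.0 for v in s])
A_plus = Vt.T @ np.diag(s_inv) @ U.T

# Property 8: the row of V^T for the zero singular value solves A x = 0.
null_vec = Vt[-1]

# Property 4: condition number of a nonsingular example.
B = np.diag([2.0, 0.5])
sB = np.linalg.svd(B, compute_uv=False)
cond_B = sB[0] / sB[-1]

print(rank, is_singular, fro_from_svd, cond_B)
```

For this matrix the null vector is proportional to (1, 1, −1), reflecting the dependence between the columns.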
