3

Generalized inverse of matrix and solution of linear system equation

Abstract

This chapter presents a brief introduction to the generalized inverse of matrix, which is needed in the following expositions. This introduction includes the left inverse and right inverse, the Moore-Penrose inverse, the minimization approach to solve an algebraic matrix equation, the full rank decomposition theorem, the least square solution to an algebraic matrix equation, and the singular value decomposition.

Keywords

Generalized inverse of matrix; Left inverse; Right inverse; Moore-Penrose inverse; Full rank decomposition theorem; Least square solution to an algebraic matrix equation; Singular value decomposition

In this chapter, we briefly present some background on the generalized inverse of matrix and its relation to the solution of linear system equations, which is needed in the sequel. This chapter can be skimmed quickly and used mainly as a quick reference. There is a huge number of references on this topic; to mention a few, see [43, 44, 51–54].

3.1 The generalized inverse of matrix

Recall how the matrix inverse is defined. The inverse A^{-1} of a matrix A is the matrix that produces the identity matrix when multiplied with A, that is, AA^{-1} = I = A^{-1}A. The matrix inverse exists only for square matrices.

Real-world data matrices are not always square. Furthermore, real-world data are not always consistent and may contain many repetitions. To deal with such data, a generalized inverse for rectangular matrices is needed.

A generalized inverse A^- of a matrix A is defined by the relation AA^-A = A. Notice that the usual matrix inverse is covered by this definition, because AA^{-1}A = A. We use the term "generalized" inverse for a general rectangular matrix and to distinguish it from the inverse matrix, which exists only for a square matrix. The generalized inverse is also called the pseudo-inverse.

Unfortunately, there are many types of generalized inverse, and most of them are not unique. Some generalized inverses are reflexive, satisfying (A^-)^- = A, and some are not. In this chapter, we only discuss a few of them that are often used in practical applications; a short numerical check follows the list below.

• A reflexive generalized inverse is defined as

AA^-A = A,   A^-AA^- = A^-.

• A minimum norm generalized inverse, such that x = A^-b minimizes ∥x∥, is defined as

AA^-A = A,   A^-AA^- = A^-,   (A^-A)^T = A^-A.

• A generalized inverse that produces the least square solution, i.e., that minimizes the residual error min_x ∥b − Ax∥, is defined as

AA^-A = A,   (AA^-)^T = AA^-.
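
As a quick numerical check (not part of the original exposition), the sketch below uses NumPy's np.linalg.pinv, which returns the Moore-Penrose inverse; this particular generalized inverse satisfies all of the conditions listed above. The random test matrix is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))        # a rectangular (non-square) test matrix
A_g = np.linalg.pinv(A)                # one particular generalized inverse

print(np.allclose(A @ A_g @ A, A))          # A A^- A = A (defining relation)
print(np.allclose(A_g @ A @ A_g, A_g))      # reflexive: A^- A A^- = A^-
print(np.allclose((A_g @ A).T, A_g @ A))    # minimum norm condition
print(np.allclose((A @ A_g).T, A @ A_g))    # least square condition
```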

3.1.1 The left inverse and right inverse

The usual matrix inverse is a two-sided inverse, i.e., AA^{-1} = I = A^{-1}A, because we can multiply the inverse from the left or from the right of matrix A and still get the identity matrix. This property holds only for a square matrix A.

For a rectangular matrix A, we may have a generalized left inverse, or left inverse for short, when we multiply the inverse from the left to get the identity matrix: A_{left}^{-1}A = I. Similarly, we may have a generalized right inverse, or right inverse for short, when we multiply the inverse from the right to get the identity matrix: AA_{right}^{-1} = I.

In general, the left inverse is not equal to the right inverse. The generalized inverse of a rectangular matrix is related to solving the system of linear equations Ax = b. The solution to the normal equation is x = (A^TA)^{-1}A^Tb, which is equal to x = A^-b. The term

A^- = A_{left}^{-1} = (A^TA)^{-1}A^T

is often called the generalized left inverse. Yet another pseudo-inverse can be obtained by multiplying by the transpose matrix from the right; this is called the generalized right inverse

A^- = A_{right}^{-1} = A^T(AA^T)^{-1}.
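
A minimal sketch of the two formulas above, assuming randomly generated full-rank matrices (the sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)

A_tall = rng.standard_normal((6, 3))                   # m > n, full column rank
A_left = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T   # (A^T A)^{-1} A^T
print(np.allclose(A_left @ A_tall, np.eye(3)))         # left inverse: A_left A = I

A_wide = rng.standard_normal((3, 6))                   # m < n, full row rank
A_right = A_wide.T @ np.linalg.inv(A_wide @ A_wide.T)  # A^T (A A^T)^{-1}
print(np.allclose(A_wide @ A_right, np.eye(3)))        # right inverse: A A_right = I
```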

3.1.2 Moore-Penrose inverse

It is possible to obtain a unique generalized inverse. To distinguish the unique generalized inverse from other nonunique generalized inverses A^-, we use the symbol A^+. The unique generalized inverse is called the Moore-Penrose inverse. It is defined by the following four conditions:

(1) AA^+A = A,

(2) A^+AA^+ = A^+,

(3) (AA^+)^T = AA^+,

(4) (A^+A)^T = A^+A.

The first condition, AA^+A = A, is the definition of a generalized inverse. Together with the first condition, the second condition indicates that the generalized inverse is reflexive, (A^+)^+ = A. Together with the first condition, the third condition indicates that the generalized inverse gives the least square solution, which minimizes the norm of the error min_x ∥b − Ax∥, while the fourth condition gives the minimum norm property. Taken together, the four conditions determine A^+ uniquely.

Properties of the generalized inverse of matrix: Some important properties of the generalized inverse of matrix are:

1. The transpose of the left inverse of A is the right inverse of A^T: (A^T)_{right}^{-1} = (A_{left}^{-1})^T. Similarly, the transpose of the right inverse of A is the left inverse of A^T: (A^T)_{left}^{-1} = (A_{right}^{-1})^T.

2. A matrix A_{m×n} has a left inverse A_{left}^{-1} if and only if its rank equals its number of columns and the number of rows is greater than the number of columns, ρ(A) = n < m. In this case A^+A = A_{left}^{-1}A = I.

3. A matrix A_{m×n} has a right inverse A_{right}^{-1} if and only if its rank equals its number of rows and the number of rows is less than the number of columns, ρ(A) = m < n. In this case AA^+ = AA_{right}^{-1} = I.

4. The Moore-Penrose inverse is equal to the left inverse, A^+ = A_{left}^{-1}, when ρ(A) = n < m, and equal to the right inverse, A^+ = A_{right}^{-1}, when ρ(A) = m < n. The Moore-Penrose inverse is equal to the matrix inverse, A^+ = A^{-1}, when ρ(A) = m = n.
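
The following sketch illustrates property 4 numerically, under the assumption that the randomly generated matrices have full rank (which holds with probability one); the shapes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((6, 3))                          # rank(A) = n < m
print(np.allclose(np.linalg.pinv(A),
                  np.linalg.inv(A.T @ A) @ A.T))         # A+ equals the left inverse

B = rng.standard_normal((3, 6))                          # rank(B) = m < n
print(np.allclose(np.linalg.pinv(B),
                  B.T @ np.linalg.inv(B @ B.T)))         # B+ equals the right inverse

C = rng.standard_normal((4, 4))                          # square, nonsingular (almost surely)
print(np.allclose(np.linalg.pinv(C), np.linalg.inv(C)))  # C+ equals the ordinary inverse
```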

3.1.3 The minimization approach to solve an algebraic matrix equation

The generalized inverse of the matrix has been used extensively in the areas of modern control, least square estimation, and aircraft structural analysis. The purpose of this section is to present a unified framework that provides geometric insight and highlights certain optimal features embedded in the generalized inverse.

Consider the algebraic matrix equation

y = Ax,  (3.1)

where A is an n × m constant matrix, y is a given n vector, and x is an m vector to be determined. For the trivial case where n = m and A is a nonsingular matrix, i.e., rank(A) = n, a unique solution to Eq. (3.1) exists and is given by

x = A^{-1}y,  (3.2)

where A−1 denotes the inverse of A.

For the case n ≠ m, the expression of x in terms of y involves the generalized inverse of A, denoted A^+, and thus

x = A^+y.  (3.3)

In the following cases it will be shown that A+, for either n > m or n < m, may be viewed as a solution to a certain minimization problem.

(1) Case A: n > m

With no loss of generality it can be assumed that A is of full rank, i.e.,

rank(A) = m.  (3.4)

If, however, rank(A) < m, it is possible to delete the dependent columns of A, set the respective unknown components of x equal to zero, and reduce the problem to the case in which Eq. (3.4) is satisfied. Since the m columns of A do not span the n-dimensional space R^n, an exact solution to Eq. (3.1) cannot be obtained if y is not contained in the subspace spanned by the columns of A. Thus, one is motivated to seek approximate solutions, the best of which is the one that minimizes the Euclidean norm of the error. Let the error e be given by

e = y − Ax.  (3.5)

Then let z be given by

z = ∥e∥^2 = e^Te = (y − Ax)^T(y − Ax),  (3.6)

where the superscript “T” denotes the transpose. Evaluation of the gradient of z with respect to x yields

∂z/∂x = −2A^Ty + 2A^TAx = 0.  (3.7)

The Hessian matrix is

∂²z/∂x² = 2A^TA.  (3.8)

From Eq. (3.7), x is given by

x = (A^TA)^{-1}A^Ty.  (3.9)

By virtue of A having full rank, A^TA is a positive definite matrix. Thus (A^TA)^{-1} exists and the Hessian matrix is positive definite, implying that a minimum was obtained. In this case (n > m) the generalized inverse of A is given by

A^+ = (A^TA)^{-1}A^T.  (3.10)

It is interesting to note that if y is contained in the subspace spanned by the columns of A, Eq. (3.9) yields an exact solution to Eq. (3.1), i.e., ∥e∥ = 0. The optimal feature of Eq. (3.9) has found extensive applications in data processing for least square approximations. In closing it should be noted that Eq. (3.9) can be obtained by invoking the Orthogonal Projection Lemma, thus providing a geometric interpretation to the optimal feature of Eq. (3.10).
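
A small numerical illustration of Eq. (3.9), using randomly generated data (an illustrative assumption, not data from the text): the normal-equation solution agrees with NumPy's least squares solver, and perturbing it can only increase the residual.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))            # n > m in the notation of this section
y = rng.standard_normal(8)

x_normal = np.linalg.inv(A.T @ A) @ A.T @ y        # Eq. (3.9)
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # library least squares solver
print(np.allclose(x_normal, x_lstsq))

# Any perturbation of x increases the residual norm.
e_opt = np.linalg.norm(y - A @ x_normal)
e_other = np.linalg.norm(y - A @ (x_normal + 0.1 * rng.standard_normal(3)))
print(e_opt <= e_other)
```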

(2) Case B: n < m

Again, without loss of generality, it can be assumed that A is of full rank, i.e.,

rank(A) = n.  (3.11)

If, however, rank(A) < n, it implies that some of the equations are merely linear combinations of the others and may therefore be deleted without loss of information, thereby reducing the case rank(A) < n to the case rank(A) = n. Moreover, if A is a square singular matrix, it can be reduced to Case B after proper deletion of the dependent rows of A.

Eqs. (3.1), (3.11) with n < m yield an infinite number of solutions, the "optimal" of which is the one having the smallest norm. Therefore one is confronted with a constrained minimization problem, where the minimization of ∥x∥ (or, equivalently, ∥x∥_2^2) is to be accomplished subject to Eq. (3.1). Adjoining the constraint via a vector of Lagrange multipliers λ, the objective function to be minimized is

H = (1/2)∥x∥^2 + λ^T(y − Ax) = (1/2)x^Tx + λ^T(y − Ax).  (3.12)

By evaluating the respective gradients

∂H/∂x = x − A^Tλ = 0,  (3.13)

∂H/∂λ = y − Ax = 0.  (3.14)

From Eq. (3.13), one has

x = A^Tλ.  (3.15)

Substitution of Eq. (3.15) into Eq. (3.14) yields

y = AA^Tλ.  (3.16)

That is

λ = (AA^T)^{-1}y.  (3.17)

The existence of (AA^T)^{-1} is guaranteed by virtue of Eq. (3.11). Substitution of Eq. (3.17) into Eq. (3.15) yields

x = A^T(AA^T)^{-1}y.  (3.18)

In this case (n < m) the generalized inverse of A is given by

A^+ = A^T(AA^T)^{-1}.  (3.19)
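
A small numerical illustration of Eq. (3.18), again with randomly generated data assumed to give a full row rank A: the formula produces an exact solution of Ax = y with the smallest norm, and it agrees with the Moore-Penrose inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 8))            # n < m in the notation of this section
y = rng.standard_normal(3)

x_min = A.T @ np.linalg.inv(A @ A.T) @ y   # Eq. (3.18)
print(np.allclose(A @ x_min, y))                   # an exact solution
print(np.allclose(x_min, np.linalg.pinv(A) @ y))   # agrees with the Moore-Penrose inverse

# Any other solution x_min + z with A z = 0 has a larger norm.
null_basis = np.linalg.svd(A)[2][A.shape[0]:].T    # columns spanning the null space of A
z = null_basis @ rng.standard_normal(null_basis.shape[1])
print(np.linalg.norm(x_min) <= np.linalg.norm(x_min + z))
```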

For the sake of completeness it should be noted that the norm minimizations of ∥e∥ and ∥x∥ in Eqs. (3.6), (3.12), respectively, can be performed with weighted vector norms, where the norm of a vector w is defined by ∥w∥_Q^2 = w^TQw and Q is a compatible positive definite weighting matrix. By doing so, one can choose the relative emphasis placed on the magnitudes of the vector components being minimized.

For case A, Eq. (3.6) becomes

z = e^TQe = (y − Ax)^TQ(y − Ax)  (3.20)

with the solution

x = (A^TQA)^{-1}A^TQy.  (3.21)

For case B, Eq. (3.12) becomes

H = (1/2)x^TQx + λ^T(y − Ax)  (3.22)

with the solution

x = Q^{-1}A^T(AQ^{-1}A^T)^{-1}y.  (3.23)
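
The weighted formulas (3.21) and (3.23) can be sketched numerically as follows; the data and the positive definite weighting matrices are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Weighted least squares, Eq. (3.21): n > m case.
A = rng.standard_normal((8, 3))
y = rng.standard_normal(8)
M = rng.standard_normal((8, 8))
Q = M @ M.T + 8 * np.eye(8)                      # a positive definite weight
x_wls = np.linalg.inv(A.T @ Q @ A) @ A.T @ Q @ y
print(np.allclose(A.T @ Q @ (y - A @ x_wls), 0)) # weighted normal equations hold

# Weighted minimum norm, Eq. (3.23): n < m case.
B = rng.standard_normal((3, 8))
b = rng.standard_normal(3)
N = rng.standard_normal((8, 8))
R = N @ N.T + 8 * np.eye(8)
x_wmn = np.linalg.inv(R) @ B.T @ np.linalg.inv(B @ np.linalg.inv(R) @ B.T) @ b
print(np.allclose(B @ x_wmn, b))                 # still an exact solution of Bx = b
```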

In summary, a unified framework has been presented showing that the generalized inverse of a matrix can be viewed as a result of a minimization problem leading to a practical interpretation.

3.2 The full rank decomposition theorem

Concerning whether a matrix is of full row rank or full column rank, we have the following obvious statement.

Theorem 3.2.1

Let A ∈ C^{m×n}.

(1) If rank A = m, then A is of full row rank;

(2) if rank A = n, then A is of full column rank.

It is obvious that if A is of full row rank, then m ≤ n, and if A is of full column rank, then m ≥ n. A sufficient and necessary condition for a matrix to be of full rank is that it is both of full row rank and of full column rank.

Theorem 3.2.2

Assume A ∈ C^{m×n}. If there exist a full column rank matrix F and a full row rank matrix G such that

A = FG,

then A = FG is called a full rank decomposition of A.

Theorem 3.2.3

If A ∈ C^{m×n} with rank A = r > 0, then there must exist a full rank decomposition of A.

Proof

For the matrix A, if rank A = r > 0, one is able to choose r linearly independent columns from the columns of A: A_{i_1}, A_{i_2}, …, A_{i_r}. The remaining columns can be expressed as linear combinations of these columns. Define

F = [A_{i_1}  A_{i_2}  ⋯  A_{i_r}].

It is seen that the matrix F is a full column rank matrix. Therefore, there exist a permutation matrix Q and a matrix S such that

AQ = [A_{i_1}  A_{i_2}  ⋯  A_{i_r}  Ã_1  ⋯  Ã_{n−r}] = [F  FS] = F[I_r  S].

Letting G = [I_r  S]Q^{-1}, G is a full row rank matrix. So A has the full rank decomposition

A = FG.
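
The proof above is constructive via column selection. As a practical alternative (my own illustration, not the construction used in the proof), a full rank decomposition can also be obtained from a truncated singular value decomposition, which is discussed in Section 3.4:

```python
import numpy as np

rng = np.random.default_rng(6)
# Build a 6 x 5 matrix of rank 2.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                 # numerical rank
F = U[:, :r] * s[:r]                       # 6 x r, full column rank
G = Vh[:r, :]                              # r x 5, full row rank

print(r)                                   # 2
print(np.allclose(F @ G, A))               # A = F G
print(np.linalg.matrix_rank(F), np.linalg.matrix_rank(G))
```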

Theorem 3.2.4

Some important properties of a full column (row) rank matrix are as follows: if F ∈ C^{m×r} is a full column rank matrix, then one has

• every eigenvalue of F^HF is larger than zero;

• F^HF is a positive definite Hermitian matrix;

• F^HF is an invertible matrix of order r.

Proof

Let λ be any eigenvalue of F^HF and let x be a corresponding nonzero eigenvector; then one has

F^HFx = λx.

In this case, we have

λ∥x∥_2^2 = λx^Hx = x^H(F^HFx) = (Fx)^H(Fx) = ∥Fx∥_2^2 ≥ 0.

Since F is a full column rank matrix and x ≠ 0, we have Fx ≠ 0, so ∥Fx∥_2^2 > 0. We therefore draw the conclusion that λ > 0.

Analogously, if G ∈ C^{r×n} is a full row rank matrix, then GG^H has the above properties. When A ∈ C^{m×n} and B ∈ C^{n×m}, if AB = I_m, then B is a right inverse of A, denoted B = A_{right}^{-1}, and A is a left inverse of B, denoted A = B_{left}^{-1}.

3.3 The least square solution to an algebraic matrix equation

The method of least squares is a standard approach in regression analysis to the approximate solution of overdetermined systems, i.e., systems in which there are more equations than unknowns. The term "least squares" refers to the fact that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation.

3.3.1 The solution to the compatible linear equations

We have the following general solution of a given linear matrix equation, which is formulated in terms of the Moore-Penrose inverse.

Theorem 3.3.1

If A ∈ C^{m×n}, b ∈ C^m, x ∈ C^n, and the system of linear equations Ax = b is compatible, then the general solution is given by

x = A^+b + (I_n − A^+A)t   (t ∈ C^n).

Proof

Because Ax = b is compatible, there is an x_0 such that Ax_0 = b. On the one hand, we have

Ax = AA^+b + A(I_n − A^+A)t = AA^+Ax_0 + (A − AA^+A)t = Ax_0 + Ot = b.

On the other hand, if x_0 is any solution of Ax = b, take t = x_0. Then

A^+b + (I_n − A^+A)t = A^+b + (I_n − A^+A)x_0 = A^+b + x_0 − A^+Ax_0 = A^+b + x_0 − A^+b = x_0.

So far, we have proved that

x = A^+b + (I_n − A^+A)t   (t ∈ C^n)

is the general solution of the equation Ax = b, which concludes the proof.

In particular, when b = 0, we obtain the general solution of the homogeneous linear equations Ax = 0 given by

x = (I_n − A^+A)t   (t ∈ C^n).
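
A minimal numerical sketch of Theorem 3.3.1 (the wide random matrix is an illustrative assumption that makes Ax = b compatible for every b):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 6))            # wide full row rank matrix: Ax = b is compatible
b = rng.standard_normal(3)

A_plus = np.linalg.pinv(A)
P = np.eye(6) - A_plus @ A                 # projector onto the null space of A

for _ in range(3):
    t = rng.standard_normal(6)
    x = A_plus @ b + P @ t                 # general solution of Theorem 3.3.1
    print(np.allclose(A @ x, b))           # True for every choice of t
```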

3.3.2 The least square solution of incompatible equation

In many applications, such as data processing, multivariate analysis, optimization theory, modern control theory, and network theory, the mathematical model is often an incompatible linear matrix equation. We hereafter use the generalized inverse matrix to express the general solution of incompatible linear equations in the least square sense.

Theorem 3.3.2

When A ∈ C^{m×n} and b ∈ C^m, if there is an x* ∈ C^n such that

∥Ax* − b∥_2 = min_{x∈C^n} ∥Ax − b∥_2,

then x* is termed a least square solution to the system of equations Ax = b.

Theorem 3.3.3

When A ∈ C^{m×n}, all the least square solutions of the system of equations Ax = b are given by

x* = A^+b + (I_n − A^+A)t   (t ∈ C^n).

Proof

For any x ∈ C^n, we define

φ(x) = ∥Ax − b∥_2^2 = x^HA^HAx − 2x^HA^Hb + b^Hb

and write x = μ + iν (μ, ν ∈ R^n); we then have

∂φ/∂x = ∂φ/∂μ + i ∂φ/∂ν = 2A^HAx − 2A^Hb = 0.

Therefore we have A^HAx = A^Hb. Because

rank A^HA ≤ rank[A^HA  A^Hb] = rank A^H[A  b] ≤ rank A^H = rank A^HA,

the rank of the augmented matrix equals that of the coefficient matrix, so A^HAx = A^Hb is a compatible system of equations. Since φ(x) is a nonnegative real function, by the extreme value theory of multivariate functions, the solutions of the normal equations are the minimum points of φ(x). Consequently, the general least square solution of the equations Ax = b is

x* = (A^HA)^+A^Hb + (I_n − (A^HA)^+A^HA)t = A^+b + (I_n − A^+A)t   (t ∈ C^n).

3.3.3 The minimum norm least squares solution for the equations

From the above discussion, whether the equations are compatible or not, the least square solutions can be expressed as

x* = A^+b + (I_n − A^+A)t   (t ∈ C^n).

Because the least square solution is not unique, one may seek the least square solution of minimum norm, which is given by

∥x**∥_2 = min{∥x*∥_2 : x* = A^+b + (I_n − A^+A)t, t ∈ C^n},

where x** is referred to as the minimal norm least square solution to the equations Ax = b. We have the following theorem:

Theorem 3.3.4

x** = A^+b is the minimal norm least square solution for the equations Ax = b.

Proof

By the properties of A^+, we have

⟨A^+b, (I_n − A^+A)t⟩ = b^H(A^+)^H(I_n − A^+A)t = b^H[(A^+)^H − (A^+)^HA^+A]t = b^H[(A^+)^H − (A^+)^H(A^+A)^H]t = b^H[(A^+)^H − (A^+AA^+)^H]t = b^H[(A^+)^H − (A^+)^H]t = 0.

Hence every least square solution of the equations Ax = b is composed of two orthogonal vectors, so

∥x*∥_2^2 = ∥A^+b∥_2^2 + ∥(I_n − A^+A)t∥_2^2 ≥ ∥A^+b∥_2^2   (t ∈ C^n).

This shows that x** = A^+b is the minimal norm least square solution for the equation Ax = b.
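
A minimal numerical sketch of Theorems 3.3.3 and 3.3.4, using an arbitrarily constructed rank-deficient matrix: every x* = A^+b + (I_n − A^+A)t has the same residual, and x** = A^+b has the smallest norm.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 6))   # rank-deficient 8 x 6
b = rng.standard_normal(8)                                      # generically incompatible

A_plus = np.linalg.pinv(A)
x_star_star = A_plus @ b                    # minimum norm least square solution
P = np.eye(6) - A_plus @ A                  # projector onto the null space of A

t = rng.standard_normal(6)
x_star = x_star_star + P @ t                # another least square solution

# Same residual, larger (or equal) norm.
print(np.allclose(np.linalg.norm(A @ x_star - b),
                  np.linalg.norm(A @ x_star_star - b)))
print(np.linalg.norm(x_star_star) <= np.linalg.norm(x_star))
```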

3.4 The singular value decomposition

The Moore-Penrose inverse can be obtained through the singular value decomposition (SVD) A = UDV^T as A^+ = VD^{-1}U^T. We have the following definition.

Definition 3.4.1

When A ∈ C^{m×n}, denote the (nonnegative) eigenvalues of the Hermitian matrix A^HA by

λ_1, λ_2, …, λ_n.

Then

σ_1 = √λ_1,  σ_2 = √λ_2,  …,  σ_n = √λ_n

are called the singular values of A.

If A ∈ C^{m×n} and rank A = r, then A^HA has r positive eigenvalues and the remaining eigenvalues are zero. Further, we assume that the singular values are ordered in the following manner:

σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > σ_{r+1} = σ_{r+2} = ⋯ = σ_n = 0.

Theorem 3.4.1

When A ∈ C^{m×n} and rank A = r > 0, the so-called singular value decomposition (SVD) of A is

U^H A V = \begin{bmatrix} Σ & 0 \\ 0 & 0 \end{bmatrix},

where the matrix U is an m order unitary matrix, the matrix V is an n order unitary matrix, and the matrix Σ is the following diagonal matrix

Σ = diag(σ_1, σ_2, …, σ_r),   σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0.

Note that in the above σ_1, …, σ_r are the positive singular values of A.

Proof

Let σ_1^2, σ_2^2, …, σ_r^2, 0, …, 0 be all the eigenvalues of A^HA, with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0. Because A^HA is a nonnegative definite Hermitian matrix, there is an n order unitary matrix V satisfying

V^H(A^HA)V = diag(σ_1^2, …, σ_r^2, 0, …, 0) = \begin{bmatrix} Σ^2 & 0 \\ 0 & 0 \end{bmatrix}.

Here Σ = diag(σ_1, σ_2, …, σ_r). Partition the matrix V into the subblocks

V = [V_1, V_2],   V_1 ∈ C^{n×r},   V_2 ∈ C^{n×(n−r)};

then we will have

\begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix}(A^HA)[V_1  V_2] = \begin{bmatrix} V_1^HA^HAV_1 & V_1^HA^HAV_2 \\ V_2^HA^HAV_1 & V_2^HA^HAV_2 \end{bmatrix} = \begin{bmatrix} Σ^2 & 0 \\ 0 & 0 \end{bmatrix}.

Then

V_1^HA^HAV_1 = Σ^2,   V_1^HA^HAV_2 = O,   V_2^HA^HAV_1 = O,   V_2^HA^HAV_2 = O.

So

(AV_1Σ^{-1})^H(AV_1Σ^{-1}) = I_r,   (AV_2)^H(AV_2) = O,   AV_2 = O.

By letting

U_1 = AV_1Σ^{-1},

where U_1 has orthonormal columns, and extending it to an m order unitary matrix U = [U_1  U_2], where U_2 supplies the remaining orthonormal columns, we thus have

U^HAV = \begin{bmatrix} U_1^H \\ U_2^H \end{bmatrix} A [V_1  V_2] = \begin{bmatrix} U_1^HAV_1 & U_1^HAV_2 \\ U_2^HAV_1 & U_2^HAV_2 \end{bmatrix} = \begin{bmatrix} U_1^H(U_1Σ) & O \\ U_2^H(U_1Σ) & O \end{bmatrix} = \begin{bmatrix} Σ & 0 \\ 0 & 0 \end{bmatrix},

which concludes the proof.

Singular value decomposition is a factorization of a rectangular matrix A into three matrices U, D, and V. The two matrices U and V are orthogonal matrices (U^T = U^{-1}, VV^T = I), while D is a diagonal matrix. The factorization means that we can multiply the three matrices to get back the original matrix, A = UDV^T. The transpose is obtained through A^T = VDU^T. Since both orthogonal and diagonal matrices have many nice properties, SVD is one of the most powerful matrix decompositions and is used in many applications, such as the least square solution (regression), feature selection (PCA, MDS), spectral clustering, image restoration, three-dimensional computer vision (fundamental matrix estimation), the equilibrium of Markov chains, and many others.

The matrices U and V are not unique; their columns are eigenvectors of the symmetric matrices AA^T and A^TA, respectively. Since the eigenvectors of a symmetric matrix are orthogonal (and linearly independent), they can be used as basis vectors (a coordinate system) to span a multidimensional space. The absolute value of the determinant of an orthogonal matrix is one, so the matrix always has an inverse. Furthermore, each column (and each row) of an orthogonal matrix has unit norm. The diagonal matrix D contains the square roots of the eigenvalues of the symmetric matrix A^TA. The diagonal elements are nonnegative numbers, and they are called singular values. Because they come from a symmetric matrix, the eigenvalues (and the eigenvectors) are all real numbers (no complex numbers).

Numerical computation of the SVD is stable with respect to round-off error. When some of the singular values are nearly zero, we can truncate them to zero, which yields numerical stability. If the SVD factors the matrix as A = UDV^T, then the diagonal matrix can also be obtained from D = U^TAV. The eigenvectors represent the solutions of the homogeneous equation; they are not unique and are determined only up to a scalar multiple.
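
A minimal numerical sketch of the factorization A = UDV^T and of D = U^TAV, using np.linalg.svd on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=True)   # U: 5x5, s: (3,), Vh = V^T: 3x3
D = np.zeros((5, 3))
D[:3, :3] = np.diag(s)

print(np.allclose(U @ D @ Vh, A))                 # A = U D V^T
print(np.allclose(U.T @ A @ Vh.T, D))             # D = U^T A V
```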

Properties: SVD can reveal many things (a short numerical sketch illustrating several of them follows this list):

1. The singular values give valuable information as to whether a square matrix A is singular: a square matrix A is nonsingular (i.e., has an inverse) if and only if all its singular values are different from zero.

2. If the square matrix A is nonsingular, the inverse matrix can be obtained by

A^{-1} = VD^{-1}U^T.

3. The number of nonzero singular values equals the rank of any rectangular matrix. In fact, SVD is a robust technique for computing the rank of ill-conditioned matrices.

4. The ratio between the largest and the smallest singular value is called the condition number; it measures the degree of singularity and reveals ill-conditioned matrices.

5. SVD produces one of the matrix norms, called the Frobenius norm, which is obtained from the singular values as

∥A∥_F = \sqrt{\sum_i σ_i^2} = \sqrt{\sum_i \sum_j a_{ij}^2}.

That is, the Frobenius norm is the square root of the sum of the squared elements of the matrix.

6. SVD can also produce a generalized inverse (pseudo-inverse) for any rectangular matrix. In fact, this generalized inverse is the Moore-Penrose inverse, obtained by setting

A^+ = VD_0^{-1}U^T,

where D_0^{-1} is obtained from D by inverting the nonzero singular values and setting the (nearly) zero ones to zero.

7. SVD also provides the approximate solution of the nonhomogeneous linear system Ax = b that minimizes the residual norm min_x ∥Ax − b∥. This is the basis of least squares, orthogonal projection, and regression analysis.

8. SVD also solves the homogeneous linear system Ax = 0: the columns of V (rows of V^T) corresponding to zero singular values are eigenvectors of the symmetric matrix A^TA for the zero eigenvalue and hence solutions of Ax = 0.
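
The sketch below illustrates several of these properties (rank, the Frobenius norm, the truncated pseudo-inverse, and a homogeneous solution) on an arbitrarily constructed rank-deficient matrix; the tolerance is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6 x 5, rank 3

U, s, Vh = np.linalg.svd(A)
tol = 1e-10
r = int(np.sum(s > tol))
print(r, np.linalg.matrix_rank(A))                   # property 3: rank = #nonzero singular values

print(np.isclose(np.sqrt(np.sum(s**2)),
                 np.linalg.norm(A, 'fro')))          # property 5: Frobenius norm

# Property 6: pseudo-inverse with nearly-zero singular values truncated to zero.
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]
A_plus = Vh.T @ np.diag(s_inv) @ U[:, :len(s)].T
print(np.allclose(A_plus, np.linalg.pinv(A)))

# Property 8: a column of V for a zero singular value solves Ax = 0.
x_null = Vh.T[:, r]
print(np.allclose(A @ x_null, 0))
```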

References

[43] Kardi T. Linear Algebra Tutorial. 2001. http://people.revoledu.com/kardi/toturial/LinearAlgebra/.

[44] Shapiro E.Y., Decarli H.E. Generalized inverse of a matrix: the minimization approach. AIAA J. 1976;14(10):1483–1484.

[51] Udwadia F.E., Phohomsiri P. Recursive determination of the generalized Moore–Penrose M-Inverse of a matrix. J. Optim. Theory Appl. 2005;127(3):639–663.

[52] Campbell S.L., Meyer C.D. Generalized Inverses of Linear Transformations. New York, NY: Dover; 1991.

[53] Penrose R. A generalized inverse for matrices. Proc. Cambridge Philos. Soc. 1955;51(4):406–413.

[54] Ben-Israel A., Greville T.N.E. Generalized Inverses: Theory and Applications. New York: Wiley; 1974.

