8.2 MATRIX FACTORIZATIONS AND BLOCK MATRICES

Young men should prove theorems, old men should write books.

—GODFREY H. HARDY (1877–1947)

Mathematics is like checkers in being suitable for the young, not too difficult, amusing, and without peril to the state.

—PLATO (ca. 429–347 BC)

Introduction

Whenever we encounter an equation A = BC, we have a factorization of the matrix A into a product of the two matrices B and C. In some cases, we have more factors, as in this familiar case encountered in Section 6.1: A = PDP−1. There is also a QR-factorization A = QR, and a singular-value decomposition (SVD), A = PDQ. These and other factorizations all have their uses, and we begin with a simple one, the LU-factorization, that is often a part of the row-reduction process.

Permutation Matrix

First, however, the concept of a permutation matrix is needed. Such a matrix is obtained by reordering the rows of an identity matrix, I. For example, here is a typical permutation matrix:

 

Notice that the product PA is the matrix A with its rows reordered in exactly the same way as the rows of I were reordered to produce P. For example, we obtain

 

The mapping from the original ordering to the reordering is (1, 2, 3, 4) → (3, 4, 1, 2).

EXAMPLE 1

What permutation matrix can be applied to the matrix

 

in order to get an echelon form?

SOLUTION We can interchange the first row and the second row. An equivalent procedure would be to multiply A on the left with this permutation matrix:

 

Consequently, we obtain

 

Well-written software for solving systems of linear equations chooses the pivot elements carefully to ameliorate the bad effects of round-off error. The natural order of the rows gives way to an ordering that pays attention to the magnitudes of the elements available as pivots. In general, we choose strong pivot elements to avoid growth of round-off errors.

LU-Factorization

The LU-factorization of a matrix A is usually written as A = LU. In this equation, L is a lower triangular matrix having 1’s on its diagonal. (One says that L is unit lower triangular.) The factor U is upper triangular. Here is an example of the equation A = LU:

 

The usefulness of the LU-factorization depends on the following thought process. Suppose that we wish to solve a linear system

 

If a factorization A = LU is available, it can be put to immediate use. Because A = LU, the problem can be written as LUx = b. Introduce a new variable y = Ux. Solve these two linear systems (in the order indicated):

 

The solution sought is x because

 

We illustrate with the preceding matrix.

EXAMPLE 2

Solve the following system of equations, making use of the LU-factorization of the preceding coefficient matrix.

 

SOLUTION There are two systems to be solved, each having a triangular coefficient matrix. First, we have

 

We solve this using forward substitution, getting y = [13, 21, −30]T. Then we solve the second system:

 

The solution of this system is x = [3, −4, 5]T, which we obtained by using back substitution.
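
The two triangular solves are short to program. Here is a minimal sketch in Python with NumPy (not from the text); the unit lower triangular L and upper triangular U below are hypothetical stand-ins, and any LU pair produced as in this section could be substituted.

Python

import numpy as np

def forward_sub(L, b):
    # Solve Ly = b when L is unit lower triangular (forward substitution).
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    # Solve Ux = y when U is upper triangular and invertible (back substitution).
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

# Hypothetical factors, for illustration only.
L = np.array([[1., 0., 0.], [2., 1., 0.], [-1., 3., 1.]])
U = np.array([[2., 1., -1.], [0., 3., 2.], [0., 0., 4.]])
b = np.array([3., 7., 10.])
y = forward_sub(L, b)                 # solve Ly = b
x = back_sub(U, y)                    # then solve Ux = y
print(np.allclose(L @ U @ x, b))      # True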

If we carry out Gaussian elimination on this matrix

 

we shall see that the LU-factorization is a by-product. By using elementary matrices, one can turn A into an upper triangular matrix. Here are the steps:

 

We have established the equation

 

Consequently, we have also

 

The inverses of these elementary matrices are obtained by simply changing the signs of the multipliers. Hence, we find

 

Notice how the entries in L are built up directly from entries in the elementary matrices, which are the negatives of the multipliers.

To illustrate how the LU-factors of a matrix can be found in a more direct way, let us use the same matrix A as in Example 2. Start with A as given, and L in skeleton form:

 

We carry out some row-reduction steps, adding a multiple of one row to another row. The negatives of the multipliers are stored in the matrix L.

 

At the end of the process, U is on the left and L is on the right. It is not necessary to continue writing the emerging matrix L; we are only filling in the holes, so to speak.

Also, notice that when we store the negatives of the multipliers in L, we assume that a multiple of one row in A is added to a row below it. One sees that the negatives of the multipliers to be stored in L are plus or minus the quotients of elements from A or Ã. Thus, in the previous example, the first multiplier leads to 3 being stored. The next multiplier is −14/2 = −7, and we store 7. This is a simple way to describe the negatives of the multipliers that are being used to fill in the gaps in L. Also, we see that the negatives of the multipliers are stored in the matrix L in positions corresponding to the locations in the other matrix where zero entries are being created. (For nonsquare matrices, the matrix L may need to be extended with columns from the identity matrix. See Example 4.)
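
The bookkeeping just described translates directly into code. The following Python/NumPy sketch (an illustration, not the book's program) factors a square matrix without row interchanges; each quotient U[j, i]/U[i, i], the negative of the multiplier, is stored in L exactly where a zero is being created.

Python

import numpy as np

def lu_no_pivot(A):
    # Doolittle-style LU-factorization without row interchanges
    # (assumes every pivot encountered is nonzero).
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for i in range(n - 1):                     # pivot column
        for j in range(i + 1, n):              # rows below the pivot
            L[j, i] = U[j, i] / U[i, i]        # negative of the multiplier
            U[j, :] -= L[j, i] * U[i, :]       # create a zero in position (j, i)
    return L, U

A = np.array([[2., 1., 1.], [6., 4., 5.], [-4., 2., 9.]])   # hypothetical example
L, U = lu_no_pivot(A)
print(L)                            # negatives of the multipliers below the diagonal
print(np.allclose(L @ U, A))        # True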

It is easy to give examples where there is no LU-factorization. Here is one:

 

If we set up the equation A = LU, it looks like this:

 

Then we must have b = 0 and ab = 1, which are incompatible demands. However, it is obvious that by interchanging or swapping the rows of A in this example, we obtain a (trivial) example of an LU-factorization:

 

To handle a wider class of matrices in the LU-factorization process, we can permit permutations of the rows. This means changing the order of the rows. This is done automatically in most mathematical software packages. To permute the rows of a matrix, we can premultiply the matrix by a permutation matrix, as explained previously.

EXAMPLE 3

Illustrate permuting the rows of a matrix A by multiplication on the left by a permutation matrix P. Then find the LU-factorization of PA.

SOLUTION

 

These are the matrices that MATLAB returns as the LU-factorization of the matrix A.
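
For comparison, a library routine that returns the permuted factorization: the NumPy/SciPy sketch below reuses the matrix that appears later in the Mathematical Software subsection. Note that SciPy's convention is A = PLU, so PTA = LU in the notation used here.

Python

import numpy as np
from scipy.linalg import lu

A = np.array([[3., 5., 3.], [2., 6., -4.], [5., 4., -2.]])
P, L, U = lu(A)                          # SciPy convention: A = P L U
print(np.allclose(A, P @ L @ U))         # True
print(np.allclose(P.T @ A, L @ U))       # True: P^T A = LU, the form used in the text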

EXAMPLE 4

The LU-factorization is also available when the matrix A is not square. The principles remain unchanged. For example, how can one compute the LU-factorization of the following matrix?

 

SOLUTION The basic steps are as in the square case. In the first step we use row operations to create zeros in column one of A, and store the negatives of the multipliers in the first column of the square matrix L in the order in which they were used to zero out entries:

 

In the next step, two further row operations are applied and the negatives of the multipliers are stored in the second column of matrix L:

 

The final step brings us to

 

The conclusion is that

 

Here the entries in the matrix L below the main diagonal are the negative values of the multipliers used. They are located in the positions where zeros are being created. For nonsquare matrices A, the lower triangular matrix L is square and it may need to be extended with columns from the identity matrix. It is usually clear what needs to be done. If A is m × n, then L is m × m and U is m × n.

LLT-Factorization: Cholesky Factorization

The LU-factorization of a symmetric matrix A into the product of a lower and an upper triangular matrix has a special form. Namely, we can write A = LLT. This may involve complex numbers, even if the matrix A is real. But the special case where A is a real, symmetric, and positive definite matrix involves only real numbers. This is called the Cholesky factorization.4 It is important because symmetric, positive definite matrices arise frequently in applications.

DEFINITION

A real matrix A is positive definite if xT Ax > 0 whenever the vector x is different from 0. The corresponding definition for a complex matrix is that xHAx > 0 for all nonzero x (real or complex).

There are various tests for positive definiteness, such as the positivity of the determinants of the principal minors. This is called the Sylvester criterion. The principal minors are the square submatrices lying in the upper left-hand corner of A. An n × n matrix has n of these principal minors. All the eigenvalues of a positive definite matrix are positive. Further consequences of positive definiteness are explored in General Exercises 21–25. Every positive definite matrix is invertible (nonsingular), and its inverse is positive definite.


4 The LLT-factorization is named after Major André-Louis Cholesky (1875–1918), who was a geodesist in the French army.


A property weaker than positive definite is nonnegative definite, which means that xHAx ≥ 0 for all x.

EXAMPLE 5

Use the test alluded to above to determine whether the real symmetric matrix

 

is positive definite.

SOLUTION The determinants of the principal minors are

 

Because these are positive numbers, the matrix A is positive definite.

EXAMPLE 6

Find the Cholesky factorization of this real, symmetric, and positive definite matrix:

 

SOLUTION We want to find the values of r, s, t, u, v, and w in the equation

 

One can solve for the unknowns in a systematic way:

1. 4 = r2 and r = 2

2. 10 = rs = 2s and s = 5

3. 14 = ru = 2u and u = 7

4. 41 = s2 + t2 = 25 + t2 and t = 4

5. 59 = su + tv = 35 + 4v and v = 6

6. 94 = u2 + v2 + w2 = 49 + 36 + w2 and w = 3

Finally, one can verify the factorization as shown here:

 

EXAMPLE 7

Find the Cholesky factorization for an arbitrary 4 × 4, symmetric, positive definite matrix.

SOLUTION We have to solve for the unknowns, called ℓij, in this equation:

 

The equations look like this:

 

The solutions are as follows:

 

This can be simplified, as shown in the following pseudocode.

A computer code for the Cholesky factorization can be quite succinct. Start with a symmetric matrix A, assumed to be positive definite and of dimensions n × n. We want A = LLT. Let the elements of L be denoted by ℓij. Then ℓij = 0 for j > i. The pseudocode for the Cholesky factorization algorithm is

 

In this abbreviated code, the convention is adopted that a summation is defined to be zero if the beginning index exceeds the ending index. Thus, the sum encountered when computing ℓ11, for example, is empty and contributes zero. The requirement that the matrix A be symmetric positive definite guarantees that the second line of the pseudocode involves only the square roots of positive numbers, according to the following theorem.
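
The pseudocode can be realized in a few lines. The sketch below (Python with NumPy, an illustration consistent with the algorithm just described) is checked against the matrix of Example 6, reconstructed here from the numbers 4, 10, 14, 41, 59, and 94 appearing in that solution.

Python

import numpy as np

def cholesky(A):
    # Cholesky factorization A = L L^T for a real symmetric positive definite A.
    n = A.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            s = L[i, :j] @ L[j, :j]              # empty sum contributes zero when j = 0
            if i == j:
                L[i, j] = np.sqrt(A[i, i] - s)   # diagonal entry
            else:
                L[i, j] = (A[i, j] - s) / L[j, j]
    return L

A = np.array([[4., 10., 14.], [10., 41., 59.], [14., 59., 94.]])   # Example 6, reconstructed
L = cholesky(A)
print(L)                              # [[2, 0, 0], [5, 4, 0], [7, 6, 3]]
print(np.allclose(L @ L.T, A))        # True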

THEOREM 1

Every real symmetric positive definite matrix can be factored as LLT, where L is a real lower triangular matrix having positive elements on its diagonal.

PROOF Consult Golub and van Loan [1996, p. 87].

LDLT-Factorization

Under certain conditions, we can find the Cholesky LLT-factorization directly from the LU-factorization. Suppose we are given a real, symmetric, positive definite matrix A and its factorization A = LU, where L is a unit lower triangular matrix and U is an upper triangular matrix with positive diagonal elements. We can extract the diagonal elements from the matrix U and use them to form a diagonal matrix D so that A = LDLT. Then we can split D as D = D1/2D1/2, which is possible because the diagonal matrix D has positive diagonal elements. We obtain

A = LDLT = (LD1/2)(LD1/2)T

so the Cholesky factor is LD1/2.

EXAMPLE 8

Carry out the process just described on the following matrix:

 

SOLUTION First, we carry out the forward elimination phase of Gaussian elimination

 

and

 

Evidently, we have found the LU-factorization, the LDLT-factorization, and the LLT-factorization.
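
A small numerical check of the LU-to-LDLT-to-LLT chain, under the assumption that A admits an LU-factorization without row interchanges (a NumPy sketch with a made-up symmetric positive definite matrix):

Python

import numpy as np

def unit_lu(A):
    # LU-factorization without pivoting: L unit lower triangular, U upper triangular.
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for i in range(n - 1):
        for j in range(i + 1, n):
            L[j, i] = U[j, i] / U[i, i]
            U[j, :] -= L[j, i] * U[i, :]
    return L, U

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])   # hypothetical SPD matrix
L, U = unit_lu(A)
D = np.diag(np.diag(U))                          # diagonal matrix built from diag(U)
print(np.allclose(L @ D @ L.T, A))               # LDL^T reproduces A: True
C = L @ np.sqrt(D)                               # candidate Cholesky factor L D^(1/2)
print(np.allclose(C @ C.T, A))                   # True
print(np.allclose(C, np.linalg.cholesky(A)))     # agrees with NumPy's Cholesky: True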

QR-Factorization

Another type of matrix factoring is called the QR-factorization. Here we include nonsquare matrices in the discussion because this factoring is useful in solving overdetermined systems of linear equations, where a least-squares solution is sought. In the factorization A = QR, we allow A and Q to be m × n. The columns of the matrix Q form an orthonormal set in Rm so that QTQ = I. Therefore, we suppose that m ≥ n. The factor R is an n × n, upper triangular, invertible matrix. How is this factorization employed to get the least-squares solution of a system of equations Ax = b? Simply solve the system Rx = QTb for the vector x.

THEOREM 2

The vector x is the least-squares solution of the system of equations Ax = b under the circumstances just described.

PROOF We compute

 

Consequently, the normal equations are satisfied:

 

In MATLAB, the command [Q,R] = qr(A) produces the desired factors. Actually, in MATLAB, if we want to solve a system only in the least-squares sense, the command x = A\b automatically does this. Behind the scenes, the QR-factorization is being employed. There are analogous commands in Maple and Mathematica. The theorem governing this factorization follows.
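
The same computation can be sketched in Python with NumPy; the overdetermined system below is made up for illustration. With its default settings, numpy.linalg.qr returns the reduced factors, with Q of size m × n, matching the convention used here.

Python

import numpy as np

# Hypothetical overdetermined system Ax = b (4 equations, 2 unknowns).
A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
b = np.array([6., 5., 7., 10.])

Q, R = np.linalg.qr(A)                  # Q is 4 x 2 with orthonormal columns, R is 2 x 2
x = np.linalg.solve(R, Q.T @ b)         # solve Rx = Q^T b
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # same least-squares solution: True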

THEOREM 3

Any m × n matrix of rank n has a QR-factorization, where Q is an m × n matrix, QTQ = In, and R is an n × n, upper triangular, and invertible matrix.

PROOF Let A be an m × n matrix of rank n. Since A has rank n, m ≥ n. Let the columns of A be a1, a2, …, an. By applying the Gram–Schmidt process to this set of columns, we obtain an orthonormal set of vectors {q1, q2, …, qn} such that, for each value of i,

 

Hence for suitable coefficients, rij, we have

 

for i = 1, 2, …, n. Create an n × n matrix R whose generic elements are rij. Supply zero values for rij when i > j. This matrix satisfies the equation

 

Since A has rank n, R is invertible. The orthonormality of the set of columns in Q is expressed by the equation QT Q = I.
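
The construction in this proof is the classical Gram–Schmidt process applied to the columns of A. A brief Python/NumPy sketch (illustrative only, with a hypothetical 4 × 2 matrix of rank 2):

Python

import numpy as np

def gram_schmidt_qr(A):
    # Classical Gram-Schmidt on the columns of A (assumed to have full column rank).
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # coefficient r_ij
            v -= R[i, j] * Q[:, i]          # subtract the projection onto q_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]               # normalize to get the next column of Q
    return Q, R

A = np.array([[1., 1.], [1., 3.], [1., 1.], [1., 3.]])   # hypothetical, rank 2
Q, R = gram_schmidt_qr(A)
print(np.allclose(A, Q @ R))             # True
print(np.allclose(Q.T @ Q, np.eye(2)))   # orthonormal columns: True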

EXAMPLE 9

Find the QR-factorization of this matrix:

SOLUTION The proof of Theorem 3 can be followed, step-by-step. To start the Gram–Schmidt process, normalize the first column a1. This produces q1 as column one in Q. Next, we subtract from the second column a2 its projection on q1. This yields a temporary vector

 

Normalizing v yields q2. Note that our work gives us a1 = 2q1 and a2 = 2q1 + 2q2. These equations imply that r11 = r12 = r22 = 2 and r21 = 0. Hence, we obtain

 

Singular-Value Decomposition (SVD)

This section is devoted to the singular-value decomposition of an arbitrary matrix. It has many uses, among them a way of producing a reliable estimate of the rank of a matrix. The main theorem is easily stated:

THEOREM 4

Any matrix can be put into the factored form PDQ, where P and Q are unitary and D is diagonal.

PROOF Let A be m × n. Since unitary matrices are square, P must be m × m, and Q must be n × n. Consequently, D must be m × n. All entries above and below the diagonal of D are to be zero. The matrix AH A is Hermitian because (AH A)H = AH(AH)H = AH A. Furthermore, AH A is nonnegative definite because

 

By Corollary 2 in Section 8.1, the eigenvalues of AH A are real and nonnegative; we can therefore denote them by σ1², σ2², …, σn². The labeling can be such that

σ1 ≥ σ2 ≥ ⋯ ≥ σr > σr+1 = ⋯ = σn = 0

The numbers σi are called the singular values5 of A. If this notation is used, σr is the last nonzero singular value. The rank of AH A is r, because n − r of the eigenvalues are zero. (Why?)

By the Spectral Theorem (Section 8.1), there is an orthonormal set of eigenvectors ui for AH A. Hence, we obtain

 

and for 1 ≤ i ≤ n. Now we have

 


5 Gene Howard Golub (1932–2007) created, along with William Kahan, an algorithm for computing the singular value decomposition. The SVD algorithm is used in a wide variety of applications such as signal processing, data analysis, and Internet search engines. Because of its versatility, this algorithm has been called the Swiss Army Knife of numerical computations. See Golub and Kahan [1965]. Gene was fondly known as Professor SVD.


Consequently, we obtain Aui = 0 for i = r + 1, r + 2, …, n. We have

 

Define Q to have rows uiH. Then Q is n × n and is unitary: QQH = QHQ = In. Define vi = (1/σi)Aui for 1 ≤ i ≤ r. These vectors have the property

 

Select additional vectors vi so that {v1, v2, …, vm} is an orthonormal basis for Cm. Let P be the matrix whose columns are vi for 1 ≤ i ≤ m. Let D be the m × n diagonal matrix having σi on its diagonal. Now we have

 

because

 

Schur Decomposition

In 1909, Schur6 established that any square matrix can be reduced to a triangular matrix using a unitary similarity transformation.

THEOREM 5 Schur Decomposition

If A is an n × n matrix with real or complex entries, then there exists a unitary matrix U so that A = UTUH and the eigenvalues of A appear on the diagonal of the upper triangular matrix T.

The proof of Schur’s Theorem can be found in Kincaid and Cheney [2002] and Golub and Van Loan [1996]. As a special case, every Hermitian matrix is unitarily similar to a diagonal matrix.


6 Issai Schur (1875–1941) was born in Germany and died on his birthday in the city of Tel Aviv. He was the victim of Nazi persecution in the 1930s and emigrated to what is now Israel. He was a student of the famous mathematician Frobenius in Berlin. Schur was a superb expositor and attracted as many as 400 students to his lectures in huge auditoriums! He ascended through the academic ranks at the University of Berlin to Professor in 1919.


COROLLARY 1

If A is a Hermitian matrix, then there exists a unitary matrix U so that A = UDUH, where the eigenvalues appear in the diagonal matrix D.

Another special case is that every square real matrix is similar to a triangular matrix.

COROLLARY 2

If A is a square real matrix, then there exists an invertible matrix S and a triangular matrix T such that A = STS−1 and the eigenvalues of A appear on the diagonal of T.

EXAMPLE 10

We illustrate Schur’s theorem by finding the decomposition of

 

SOLUTION The characteristic equation Det(A − λI) = λ2 − λ − 6 = 0 gives us the eigenvalues −2 and 3. By solving (A − λI)v = 0 with each of these eigenvalues, the corresponding eigenvectors are v1 = [−1, 1]T and v2 = [7, −2]T. Using the Gram–Schmidt orthogonalization process, we obtain u1 = v1 and a second vector u2 orthogonal to it. After normalizing these vectors, we obtain the unitary matrix

 

which has the property UUH = I. Finally, we obtain the Schur decomposition

 

which is an upper triangular matrix having the eigenvalues on the diagonal.
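
Numerically, the Schur decomposition is available as a library call. The SciPy sketch below uses the matrix of Example 10, reconstructed here from the stated eigenvalues and eigenvectors; output='complex' requests a genuinely triangular T even when a real matrix has complex eigenvalues.

Python

import numpy as np
from scipy.linalg import schur

A = np.array([[5., 7.], [-2., -4.]])        # reconstructed: eigenvalues -2, 3 with the eigenvectors above
T, U = schur(A, output='complex')           # A = U T U^H with U unitary, T upper triangular
print(np.allclose(A, U @ T @ U.conj().T))   # True
print(np.diag(T))                           # the eigenvalues -2 and 3 on the diagonal (in some order)
print(np.allclose(U @ U.conj().T, np.eye(2)))   # U is unitary: True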

Partitioned Matrices

Frequently, it is convenient to partition matrices into submatrices (called blocks) and to carry out matrix operations using the blocks. Let A, B, and C be matrices partitioned, respectively, into the submatrices Aij, Bij, and Cij of appropriate dimensions.

 

If A and C have the same dimensions and the same partitioning, then multiplication by the scalar α is

 

which is done block-by-block as Cij = αAij. Matrix addition and subtraction are similarly done blockwise:

 

where Cij = Aij ± Bij for each block.

To establish the general results for multiplication of partitioned matrices, we compute products as if the blocks were scalars. We assume that the partitioning has been done in such a way that each product AisBsj can be formed. Then

 

where

 

The column partitions of A match the row partitions of B so that A and B are conformable for block multiplication.

An example of a 2 × 3 block partitioned matrix is

 

Consider this 3 × 2 block partitioned matrix:

 

The block product exists and turns out to be

 

As one sample of the calculation, we show how C22 is computed:

 

One must be careful not to disturb the order of the submatrices in the multiplications.
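
The blockwise rule is easy to check mechanically. The NumPy sketch below partitions two randomly generated matrices (the sizes are made up for illustration), forms C22 = A21B12 + A22B22 + A23B32 as above, and compares the full blockwise product with the ordinary product.

Python

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 6))               # partitioned as a 2 x 3 block matrix
B = rng.standard_normal((6, 4))               # partitioned as a 3 x 2 block matrix

rA = [slice(0, 2), slice(2, 5)]               # row partition of A
cA = [slice(0, 2), slice(2, 4), slice(4, 6)]  # column partition of A = row partition of B
cB = [slice(0, 1), slice(1, 4)]               # column partition of B

# One sample block: C22 = A21 B12 + A22 B22 + A23 B32.
C22 = sum(A[rA[1], cA[s]] @ B[cA[s], cB[1]] for s in range(3))
print(np.allclose(C22, (A @ B)[2:5, 1:4]))    # True

# The whole blockwise product agrees with the ordinary product.
C = np.block([[sum(A[rA[i], cA[s]] @ B[cA[s], cB[j]] for s in range(3))
               for j in range(2)] for i in range(2)])
print(np.allclose(C, A @ B))                  # True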

Solving a System Having a 2 × 2 Block Matrix

Consider the 2 × 2 block linear system

 

where A is an n × n real, symmetric, invertible matrix. The matrix B is an n × m real matrix of full rank. We assume that A is positive definite on the nullspace of B. The vectors x and b are column vectors in Rn, whereas y and c are column vectors in Rm.

In a typical application, the coefficient matrix is a 2 × 2 block matrix whose dimension in the usual sense is (n + m) × (n + m). It is a symmetric and nonnegative definite matrix (having nonnegative eigenvalues).

Suppose that the solution of the preceding block linear system is sought. If B is invertible (n = m), the solution can be found immediately:

 

The notation B−T means (B−1)T.

If the matrix B is not invertible, Gaussian elimination can be used on the augmented matrix of the original 2 × 2 block system. Supposing that A is invertible, then

 

Here we multiply the first system by −BT A−1 and add it to the second system of equations. Consequently, we obtain the reduced system

 

leading to the solution

 

provided that BT A−1 B is invertible.

The LU-factorization of the block matrix is

 

The LDLT-factorization is

 

where the factorization of the reduced system is . These factorizations, and others, such as in General Exercises 33–35, can be useful in reducing the number of floating-point operations when the conjugate gradient method is being used. See Dollar and Wathen [2006].

Applications in which one wishes to solve matrix systems of this form are, for example, convex quadratic programming problems, where we seek to minimize

 

and saddle-point problems involving finite-element approximations of variational problems.

Suppose we want to solve the block linear system

 

where dimensions of the submatrices are as follows: A is n × n, B is n × m, C is m × n, and D is m × m. The column vectors x and b have n components and y and c have m entries. It is assumed that A is invertible.

We are given the block coefficient matrix with the assumptions mentioned previously as well as the right-hand side vectors b and c. Using the reduction process with the augmented block matrices, we have

 

Rather than solving the (n + m) × (n + m) system, we can work with the following smaller m × m and n × n systems

 

Here we assume that we can find the inverses of A and D − CA−1B. The latter is called the Schur complement of A. In practice, the matrix A should be well conditioned to obtain accurate results. Similarly, one could solve for x first and then y as in General Exercise 53.
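
A sketch of this reduction in Python with NumPy, using hypothetical small blocks: form the Schur complement S = D − CA−1B, solve for y, back-substitute for x, and compare with a direct solve of the full (n + m) × (n + m) system.

Python

import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # made-up, comfortably invertible A
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + m * np.eye(m)
b = rng.standard_normal(n)
c = rng.standard_normal(m)

S = D - C @ np.linalg.solve(A, B)                       # Schur complement of A
y = np.linalg.solve(S, c - C @ np.linalg.solve(A, b))   # m x m system for y
x = np.linalg.solve(A, b - B @ y)                       # n x n system for x

K = np.block([[A, B], [C, D]])                          # the full block coefficient matrix
z = np.linalg.solve(K, np.concatenate([b, c]))
print(np.allclose(np.concatenate([x, y]), z))           # True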

Inverting a 2 × 2 Block Matrix

Here we derive a formula for the inverse of a 2 × 2 block matrix of the form

 

The dimensions of the submatrices are as follows: A is n × n, B is n × m, C is m × n, and D is m × m. It is assumed that A and D are invertible.

Using the reduction process with block matrices, we have

 

The term D − CA−1B is called a Schur complement of A. The first step is the same as multiplying on the left by a block matrix:

 

The second step is the same as multiplying on the right by a block matrix:

 

Putting these two block matrix equations together, we have

 

Because the inverses of these lower and upper block matrices are obtained by changing the sign of the off-diagonal block matrices, we have

 

and

 

Here we must assume that D − CA−1B, the Schur complement of A, is invertible.

A second approach is to eliminate the (1, 2) block first and then the (2, 1) block, obtaining

 

Here we assume that the Schur complement of D is invertible; it is A − BD−1C.

By equating terms in the final result from these two approaches, one obtains the Woodbury matrix identity:

 

A special case of this is the Woodbury formula

 

where U and V are matrices of the appropriate sizes. This is useful in some numerical computations when one has already computed A−1 and wants to compute (A + UV)−1. Another special case is the Sherman-Morrison formula

 

where λ = vT A−1u and uvT is the outer product of column vectors u and v.
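
A quick numerical check of the Sherman-Morrison formula (a NumPy sketch with random data; the variable lam plays the role of λ = vTA−1u):

Python

import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # hypothetical invertible matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
lam = v @ Ainv @ u                                # lambda = v^T A^{-1} u
left = np.linalg.inv(A + np.outer(u, v))          # direct inverse of the rank-one update
right = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / (1.0 + lam)
print(np.allclose(left, right))                   # True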

Application: Linear Least-Squares Problem

We now show how to solve a least-squares problem by three different approaches. Consider

 

In matrix notation, we have

 

First, we find a least-squares solution using calculus. The total squared error is

 

This is a nonlinear function of x1 and x2. We want E to be a minimum. Using calculus, we set ∂E/∂x1 = 0 and ∂E/∂x2 = 0. Taking these partial derivatives, we find

 

The collecting of terms and simplifying lead to this linear system:

 

Gaussian elimination readily gives the solution x1 = −2/25 and x2 = 19/25. We find that the resulting error is E = 6/25 = 0.24.

Next, we solve via the normal equation. Writing the original system in matrix–vector form Ax = b, we arrive at the normal equation:

 

It is no accident that the normal equations are the same as those found from the first approach above! Again, the solution is

 

Finally, we find the least-squares solution using the QR-factorization. We let

 

The first step is to orthogonalize the columns a1 and a2 of the coefficient matrix. Let q1 = a1 and

 

Now we find that these vectors q1 and q2 form an orthogonal set because q1 · q2 = 0. Normalize them to get

 

Recall that if A = QR and QT Q = I, then QT A = QT QR = R. Consequently, the system Ax = b becomes Rx = QTb and x = R−1QTb, which is the solution of the least-squares problem. In this example, we have

 

Then we find

 

Solving, we obtain x. This is the solution to the least-squares problem Ax = b, and the error is the same as before. How close is Ax to b? The solution found above leads to

 

Its Euclidean norm is

 

Mathematical Software

When using mathematical software to obtain the LU-factorization of a matrix, one must be aware that the algorithm may swap rows to produce strong pivot elements. For example, consider the following matrix:

 

We enter the matrix A into the chosen computer software and then request the LU-factorization with the appropriate commands. In MATLAB, we use the command shown.


MATLAB

A = [3,5,3;2,6,-4;5,4,-2]
[L,U,P] = lu(A)

We obtain the LU-factorization of a permuted version of A:

 

 

The pivot elements are taken in the order (3, 2, 1). One can verify that

 

Here are the corresponding Maple and Mathematica commands:


Maple

Mathematica

with(LinearAlgebra) :
A := Matrix([[3,5,3], [2,6,-4],
[5,4,-2]]);
(P,L,U) := LUDecomposition(A) ;


A = {{3,5,3}, {2,6,-4},
{5,4,-2}}
{LU,P,CondNo} =
 LUDecomposition[A]

In Maple, we discover that the order of the pivot elements is not changed from the natural order (1, 2, 3). In Mathematica, we find that the pivot elements are selected in the order (2, 1, 3) and the matrices L and U are stored in the array LU by omitting the diagonal 1’s from the unit lower triangular matrix L.

EXAMPLE 11

We can use the mathematical software in MATLAB, Maple, and Mathematica to compute the Cholesky decomposition A = LLT for a typical example such as this matrix:

 

SOLUTION This matrix is obviously real and symmetric. It is also positive definite. In MATLAB, we use the commands below:


MATLAB

A = [4,-1,-1,0;-1,4,0,-1;-1,0,4,-1;0,-1,-1,4]
L = chol(A)
L*L'

We find the Cholesky factorization to be LLT, where

 

One can check this independently by computing LTL. In Maple, the corresponding commands are


Maple

with(LinearAlgebra):
A := Matrix([[4,-1,-1,0],[-1,4,0,-1],[-1,0,4,-1],
[0,-1,-1,4]]);
L := LUDecomposition(A, method='Cholesky');
L.Transpose(L);

We obtain the exact form of the matrix L:

 

and we can verify this by computing LLT. In Mathematica, we use these commands:


Mathematica

<<LinearAlgebra`Cholesky`
A = {{4,-1,-1,0}, {-1,4,0,-1}, {-1,0,4,-1}, {0,-1,-1,4}}
L = CholeskyDecomposition[A];
MatrixForm[L.Transpose[L]]

Again, we obtain the exact form of the matrix LT in a slightly different form:

 

One can easily show that all of these factorizations are the same.

EXAMPLE 12

Using the mathematical software in MATLAB, Maple, and Mathematica, find the least-squares solution of this system by using the QR-factorization of the coefficient matrix:

 

SOLUTION In MATLAB, we invoke these commands:


MATLAB

A = [4.,7., -3.;2.,-4.,6.;-3.,2.,5.;1.,-2.,4.]
b = [1.;5.;7.;-3.]
[Q, R] = qr(A)
x = R\(Q'*b)
r = A*x - b
e = A'*r

The results are Q, R, numerical values for the solution vector, the residual vector, and the error vector:

 

Here are the Maple and Mathematica commands:


Maple

with(LinearAlgebra):
A := Matrix([[4,7,-3],[2,-4,6],[-3,2,5],[1,-2,4]]);
b := Vector([1.,5.,7.,-3.]);
(Q,R) := QRDecomposition(A);
xt := LinearSolve(R,Transpose(Q) .b);
x := evalf(LeastSquares(A,b));
r := A.x-b
e := Transpose(A).r

Mathematica

A = {{4.,7.,-3.}, {2.,-4.,6.}, {-3.,2.,5.}, {1.,-2.,4.}}
b = {1.,5.,7.,-3.}
{Q, R} = QRDecomposition[A]
xt = LinearSolve[R,Q.b]
x = N[xt]
r = A.x-b
e = Transpose[A].r

It is informative to compare the results from these three software systems. In each system, we can obtain exact results, usually as quotients of integers.

These, in turn, can be expressed as floating-point numbers, with 15 decimal places:

 

EXAMPLE 13

Using the mathematical software in MATLAB, Maple, and Mathematica, find the singular values of this matrix and verify the factorization.

 

SOLUTION In MATLAB, we use these commands:


MATLAB

A = [2.0,2.0,2.0,2.0;1.7,0.1,-1.7,-0.1;
0.6,1.8,-0.6,-1.8]
[P,D,Q] = svd(A)
A = P*D*Q

and we find this factorization:

 

Here are the Maple and Mathematica commands:


Maple

with(LinearAlgebra):
A := Matrix([[2,2,2,2],[1.7,0.1,-1.7,-0.1],
[0.6,1.8,-0.6,-1.8]]);
P,Diag,Q := SingularValues(A,output=['U','S','Vt']);
P.DiagonalMatrix(Diag[1..3],3,4).Q;

Mathematica

A = {{2,2,2,2},{1.7,0.1,-1.7,-0.1},{0.6,1.8,-0.6,-1.8}}
{P,Diag,Q} = SingularValues[A]
MatrixForm[Transpose[P].DiagonalMatrix[Diag].Q]

We find similar results in using Maple and Mathematica.
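
The same factorization can be sketched in Python with NumPy, reusing the 3 × 4 matrix from the MATLAB snippet above. In the notation A = PDQ of Theorem 4, numpy.linalg.svd returns P, the diagonal of D, and Q.

Python

import numpy as np

A = np.array([[2.0, 2.0, 2.0, 2.0],
              [1.7, 0.1, -1.7, -0.1],
              [0.6, 1.8, -0.6, -1.8]])

P, sigma, Q = np.linalg.svd(A)            # P is 3 x 3, Q is 4 x 4, sigma holds the singular values
D = np.zeros(A.shape)                     # rebuild the 3 x 4 "diagonal" middle factor
D[:len(sigma), :len(sigma)] = np.diag(sigma)

print(sigma)                              # singular values, in decreasing order
print(np.allclose(A, P @ D @ Q))          # A = PDQ: True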

SUMMARY 8.2

• A typical permutation matrix:

 

It embodies a reordering of the rows of an identity matrix I.

• A is positive definite if xTAx > 0 for all x ≠ 0, when A is real. In the complex case, use xHAx > 0.

• LU-factorization: A = LU, where L is unit lower triangular and U is upper triangular. Here L = E1−1E2−1 ⋯ Ek−1, where the Ei are the elementary matrices used in Gaussian elimination.

• One can solve Ax = b by solving (in order) Ly = b and Ux = y. Here A = LU.

• PLU-factorization: PA = LU, where P is a permutation matrix, L is a unit lower triangular matrix, and U is upper triangular.

• LLT-factorization: A = LLT, where L is lower triangular and A is symmetric (Cholesky factorization).

• LDLT-factorization: A = LDLT, where L is lower triangular, D is diagonal, and A is symmetric.

• Multiplication of partitioned matrices:

• 2 × 2 block LU-factorization:

 

• 2 × 2 block LDLT-factorization:

 

where the factorization of the reduced system is .

• Theorems:

• If A is real, symmetric, and positive definite, then A = LLT, where L is a real lower triangular matrix having positive elements on its diagonal.

• Let A be m × n. Let Q be m × n. Let R be n × n. The factorization A = QR is useful if the columns of Q form an orthonormal set and R is upper triangular and invertible. To get the least-squares solution of Ax = b, solve Rx = QTb for x. (QR-factorization.)

• Let the matrix A be m × n and have rank n. Let Q be m × n and have an orthonormal set of columns. Let R be an n × n upper triangular matrix. If A = QR, then this factorization is called the QR-factorization of A.

• Let A be m × n, let P be m × m, let D be m × n, and Q be n × n, where P and Q are unitary and D is diagonal. If A = PDQ, then this is the SVD-factorization of A.

KEY CONCEPTS 8.2

Permutation matrices, forward and backward substitution, elementary matrices, positive definite matrices, factoring matrices, factorizations: A = LU (LU-factorization, Gaussian elimination), A = LLT (Cholesky LLT-factorization), A = LDLT (LDLT-factorization), A = QR (QR-factorization), and A = PDQ (singular-value decomposition), block matrices, application: linear least-squares problem, normal equations

GENERAL EXERCISES 8.2

1. Find the LU-factorizations of these matrices:

a.

b.

2. (Continuation.) Using the factorization of B found previously, solve this system of equations

 

3. Find the LU-factors of these matrices:

a.

b.

4. Find the Cholesky factorization of

5. Find the Cholesky factorization of

6. Repeat Example 3 using

7. Find the Cholesky factorization of

8. Find the QR-factorization of

9. Solve Ax = b by using the LU-factorization

where and

10. Find the QR-factorization of

11. Find the Cholesky factorization of

12. Find the LU-factorization of

13. (Continuation.) Using the factorization LU, solve Ax = b, where b = [9, 29, 13, 0]T.

14. Find the LU-factorization of

15. Using the LU-factorization, determine the inverse of

16. Find the LU-factorization of each of these matrices, if possible:

a.

b.

c.

d.

17. This exercise involves the LU-factorization of a nonsquare matrix. If A is an m × n matrix, we seek an m × m matrix for the factor L and an m × n matrix for the factor U. Find such a factorization for this matrix:

 

18. Let

Verify that A is not positive definite but A + 4I is. Elaborate on this: Is there a smallest number c for which A + cI is nonnegative definite? What is the general case, when A is n × n? Formulate and prove some theorems.

19. Verify that the matrix cannot be factored into the product of a unit lower triangular matrix and an upper triangular matrix. Note that it does not suffice to show that our algorithm fails.

20. Explain why, if A has been factored as A = LU (with the usual assumptions), the determinant of A is easily computed.

21. A square matrix A is positive definite if xT Ax > 0 when the vector x is not zero. Establish that the Gram matrix arising from a linearly independent set of vectors is positive definite. Establish that the Gram matrix is symmetric.

22. Verify in general that if a matrix is positive definite, then its eigenvalues are positive.

23. Confirm that if a matrix is positive definite, then its diagonal elements are positive. (Use vectors of the form [0, …, 0, 1, 0, …, 0].)

24. (Continuation.) Find all the 2 × 2 symmetric positive definite matrices.

25. Establish that if a matrix A is positive definite, then its principal minors are also positive definite.

26. Establish the converse of Theorem 1: If L is lower triangular and has only positive elements on its diagonal, then LLT is invertible, symmetric, and positive definite.

27. Let A be an invertible, n × n matrix. Factor it as A = QR as described in this section. Explain why RQ is similar to A.

28. (Micro-research Project.) Let B be an m × n matrix such that BTB is a diagonal matrix. Let R be an invertible n × n matrix. How can we use these matrices to solve the system Ax = b in the least-squares sense, if A = BR? (You should discover an alternative to the use of a QR-factorization in solving least-squares problems. Certainly, the normalization of the columns in Q can be avoided.)

29. Find the Cholesky factorization of a general 2 × 2 symmetric positive definite matrix.

30. Find the LDLT-factorization of a general 2 × 2 symmetric positive definite matrix where L is unit lower triangular and D is diagonal.

31. Explain why a lower triangular matrix L is positive definite if its diagonal elements are positive.

32. Establish that if the n × n real matrix A is positive definite, and if D is a diagonal matrix having positive elements on its diagonal, then DA is also positive definite.

33. In the subsection on 2 × 2 block matrices, verify the results for each of the following:

a. The inverse of the block matrix is when A and B are invertible.

b. The block LU-factorization:

 

c. The block LDLT-factorization:

 

where the factorization of the reduced system is .

34. Consider the block 3 × 3 matrix where B1 is invertible.

Verify that this matrix can be factored as , where

Here

35. (Continuation.) Verify that if this matrix is symmetric, then it can be factored as , where

 

 

Here ,

36. (Continuation.) Repeat the previous instructions, where

 

Here we choose an m × m matrix L1 and an (n − m) × (n − m) invertible matrix L2 such that

This is Schilders’ factorization.

37. Find the determinants of the block matrices and

38. Let A be an n × n invertible matrix, and let u and v be two vectors in . Find the necessary and sufficient conditions on these vectors in order that the matrix be invertible, and give a formula for the inverse when it exists.

39. Let have the block-partitioned form

a. Argue that if A − BC is invertible, then so is .

b. Argue the stronger result that the dimension of the null space of is no greater than the dimension of the null space of A − BC.

c. What can you say about the case when I − CA−1B is invertible?

40.

a. Let

Verify that

b. Let

Compute using these submatrices.

c. Verify that

 

(Here is the direct product of multiplying corresponding submatrices.)

d. Using the approach given in Part c, compute with the numerical submatrices in Part b.

e. Show that

 

f. Using the approach given in Part e, compute with the numerical submatrices in Part b. [Note: Parts c and e are examples of parallel matrix–matrix multiplication algorithms that would be particularly useful on parallel computer systems. Here the operations could be done in parallel on a 2D grid of four processors.]

41. If L is a lower triangular matrix with positive elements on its diagonal, does it follow that LLT is positive definite?

42. Explain why or find a counterexample for the assertion that the nonzero eigenvalues of a skew-symmetric matrix are pure imaginary.

43. Find the inverse of this block matrix for m × m matrix A, m-vector b, and scalar c.

44. Let . Find P−1 and write it as a factorization of two block matrices.

45. Consider and . In order that these matrices commute, what conditions must there be on A, B, C, D?

46. Let SA = (D − CA−1B) and SD = (A − BD−1C). Verify each of these results

a.

b.

47. (Continuation.) In the 2 × 2 scalar case, verify that both inverses reduce to where Δ = ad − bc.

48. Carry out the details in the second procedure to obtain the inverse of a 2 × 2 block matrix in factored form.

49. Show how the Woodbury matrix identity is obtained.

50. Derive the Sherman-Morrison formula from the Woodbury formula.

51. Verify the identity for the inverse of B: B−1 = A−1 − B−1(B − A)A−1.

52. Verify that for an n × n upper (lower) block triangular matrix with diagonal blocks Aii, we have using

In other words, verify that .

Find an example to show .

53. Find two forms of the solution of the 2 × 2 block linear system

54. Let have the block form shown in which the submatrices are n × n. Prove that if (B − I) is invertible, then

a.

b. Repeat for AT.

COMPUTER EXERCISES 8.2

1. Program and test the algorithm for the Cholesky factorization.

2. Let A be a square matrix whose eigenvalues are to be computed. Use the QR commands in MATLAB, Maple, or Mathematica to get A = QR. Define B = RQ, and then replace A by B. Repeat this process several times, getting new versions of A. For testing purposes, use whose eigenvalues are 3, 5, 7. This algorithm is the QR-algorithm for computing eigenvalues.

3. If you have access to MATLAB, Maple, or Mathematica, verify the results of the examples in this section.

4. For a given square matrix, compute the factorizations A = PDP−1 and A = PDQ. (The matrix P is not assumed to be the same in each factorization.) What can you conclude?

5. Use mathematical software to compute the QR-factorization of

6. Find the QR-factorization of a general 2 × 2 symmetric positive definite matrix.

8.3 ITERATIVE METHODS FOR LINEAR EQUATIONS

Perhaps the greatest paradox of them all is that there are paradoxes in mathematics.

—E. KASNER AND J. NEWMAN

Although this may seem a paradox, all exact science is dominated by the concept of approximation.

—BERTRAND RUSSELL (1872–1970)

Introduction

In this chapter, iterative procedures for solving systems of linear equations are introduced. The term iterative refers to a process that is repeated over and over again, using as the input in one step the output from the preceding step. Examples of this strategy have already been seen in dynamical systems: from a starting vector x(0), one can generate a sequence of vectors using the formula x(k) = Ax(k−1), for k = 1, 2, 3, and so on. Typically, the number of steps needed in an iterative algorithm will be far less than the number of variables.

Iterative methods are frequently used to solve large sparse systems of linear equations where the number of equations and unknowns is enormous. (A large sparse matrix contains primarily zero values and relatively few nonzero entries.) Problems can be encountered where there are hundreds of millions of equations and hundreds of millions of unknowns. Such problems often originate in the numerical solution of partial differential equations. One advantage of iterative methods, in general, is that if the coefficient matrix has relatively few nonzero entries, the algorithms use only these nonzero entries with no fill-in of nonzero values for the zero values. (‘‘Fill-in’’ means that an entry in the coefficient matrix that is zero is changed to a nonzero value.) Furthermore, iterative methods produce a sequence of approximate solutions that improve as the iteration continues. The practitioner can stop the computer program when sufficiently accurate solutions have been obtained. Of course, the familiar row-reduction algorithm does not have this wonderful property!

Richardson Iterative Method

Here is an attempt to design an iterative scheme for solving the equation

 

The procedure takes the form

 

Naturally, the choice of G and c must depend on A and b. If this iterative scheme succeeds, then the sequence of iterates x(k) will converge to a vector x*, say. By passing to the limit in the previous equation, one arrives at

 

Because the objective is to solve Ax = b, we can try I − G = A and c = b. The resulting iteration formula is

 

This is called the Richardson iterative method.
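
A sketch of the Richardson iteration in Python with NumPy; the system below is made up so that I − A is small (in the sense of the infinity norm discussed later in this section), which makes the iteration converge. It is not the system of Example 1.

Python

import numpy as np

A = np.array([[1.0, 0.2, -0.1],
              [0.1, 1.0, 0.3],
              [-0.2, 0.1, 1.0]])          # hypothetical matrix close to the identity
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
for k in range(50):
    x = (np.eye(3) - A) @ x + b           # Richardson step: x <- (I - A)x + b
print(x)
print(np.allclose(A @ x, b))              # True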

To see that this method is not far-fetched, an example will be given where it works very well. (There are many examples where this process does not work at all!)

EXAMPLE 1

Solve the following system Ax = b. Use the Richardson iterative method just described.

 

SOLUTION The Richardson iteration formula is x(k) = (I − A)x(k−1) + b. A computer program can be written to initialize the vector x = (0, 0, 0) and then repeat the process 25 times as in the following pseudocode, for example.

 

The successive values of x can be printed or displayed as they are computed. When that is done, there is no need to save the intermediate values. A computer program can do all of this for us. The first few iterations produce x(0) = (0, 0, 0), x(1) = (7, 5, 3), and x(2) = (9.1, 3.4, 4.9). A MATLAB program to carry out this process is given at the end of this chapter. Using this program, we find that the approximate solutions get better and better, so that by the 15th step we have, in rounded form, x(15) = (10.8726, 8.8679, 6.0613) = x(14). At this stage there are already five significant figures in the answer.

Jacobi Iterative Method

In Example 1, if the diagonal elements in the matrix A were not all equal to 1, then each equation could be divided by its nonzero diagonal entry aii before applying the Richardson iterative method. In other words, the linear system could be scaled to become

 

where D = Diag(A). Here Diag(A) means the diagonal matrix whose diagonal equals the diagonal of A. Then let

 

The resulting procedure is called the Jacobi iterative method.

 

We know that if the iterates x(k) converge (say to x*), then x* will be a solution of the system. What remains is to prove the convergence of the process, under suitable hypotheses on the matrix A. This will be accomplished in Theorem 4 and its corollaries.
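
A compact sketch of the Jacobi iteration (Python with NumPy, applied to a made-up diagonally dominant system; D is the diagonal of A, as above):

Python

import numpy as np

def jacobi(A, b, iterations=50):
    # Jacobi iteration: x <- D^{-1}(b - (A - D)x), where D = Diag(A).
    d = np.diag(A)
    x = np.zeros_like(b, dtype=float)
    for k in range(iterations):
        x = (b - (A - np.diag(d)) @ x) / d
    return x

A = np.array([[4., 1., -1.], [2., 5., 1.], [1., -1., 3.]])   # hypothetical diagonally dominant matrix
b = np.array([5., 10., 3.])
x = jacobi(A, b)
print(x)
print(np.allclose(A @ x, b))   # True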

EXAMPLE 2

In Example 1, change all the diagonal entries from 1 to 2:

 

Now apply the Jacobi iterative method. (The Richardson iterative method does not work well for this example!)

SOLUTION The Jacobi iteration formula is x(k) = Bx(k−1) + D−1b, where B = I − D−1A and D = Diag(A). The calculation can be described as follows:

 

This can be written in terms of the individual variables:

 

In either case, the approximate solution (4.1930, 3.2330, 2.0810) is obtained after nine iterations. A sample MATLAB program is given at the end of this section.

Gauss–Seidel Method

A powerful and easily programmed iterative procedure is the Gauss–Seidel7 Method. Let the system be, as usual,

 

Assume that all the diagonal elements in the matrix A are nonzero. One can write the system in the form

 

Once all the data have been entered, the Gauss–Seidel iteration uses this pseudocode:

 

The unknowns are updated one-by-one, using always the most recent values for each variable. The replacement step in the pseudocode is called the updating formula. The code is very easy to write, but the matrix form of the procedure is more complicated, requiring A to be split into the sum of a diagonal matrix, a lower triangular matrix, and an upper triangular matrix. Specifically, D is the diagonal matrix containing the diagonal elements of A, L is the lower triangular part of −D−1A, and U is the upper triangular part of −D−1A. Hence, we have D−1A = I − L − U. Then the Gauss–Seidel method can be written as

 

where c = D−1b.


7 Philip Ludwig von Seidel (1821–1896) submitted two completely different doctoral theses only 6 months apart—the first on astronomy and the second on mathematical analysis.


EXAMPLE 3

Repeat the task in Example 2 using the Gauss–Seidel iterative method.

SOLUTION Because the new values are to be used as soon as possible, we omit the y-variables. (We do not want to postpone updating the x-variables until the end of each loop, as is done in the Jacobi method.) The Gauss–Seidel method can be described in terms of the individual variables:

 

After only three iterations, an approximate solution (4.1930, 3.2330, 2.0810) emerges, and it is correct to five significant figures. To obtain full precision in the MATLAB program, it took 13 iterations. In this example, the Gauss–Seidel method required approximately one-half as many iterations as the Jacobi method to arrive at the same degree of precision. This relationship between the Jacobi and Gauss–Seidel methods turns out to be true for many examples.
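
A sketch of the componentwise Gauss–Seidel update in Python (illustrative; each xi is overwritten as soon as it is recomputed, and the made-up system from the Jacobi sketch above is reused):

Python

import numpy as np

def gauss_seidel(A, b, iterations=25):
    # Gauss-Seidel: sweep through the unknowns, always using the newest values.
    n = len(b)
    x = np.zeros(n)
    for k in range(iterations):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]          # updating formula
    return x

A = np.array([[4., 1., -1.], [2., 5., 1.], [1., -1., 3.]])
b = np.array([5., 10., 3.])
x = gauss_seidel(A, b)
print(x)
print(np.allclose(A @ x, b))   # True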

Successive Overrelaxation (SOR) Method

In some cases, the Gauss–Seidel method can be accelerated by the introduction of a ‘‘relaxation factor’’ ω as in this formula:

 

Notice that when ω = 1 we recover the Gauss–Seidel method. When 1 < ω < 2, this procedure is called the successive overrelaxation (SOR) method. The optimal ω is often near 2. In his dissertation, Young [1950] first presented the formula (λ + ω − 1)² = λω²μ², sometimes called Young's equation.8 It connects the eigenvalues μ of the Jacobi iteration matrix with the eigenvalues λ of the SOR iteration matrix (assuming certain conditions on the matrix A). This equation can be used to find the best relaxation factor ω = 2/(1 + √(1 − ρ²)), where ρ is the largest eigenvalue of the Jacobi iteration matrix B in absolute value. The SOR method is one of the fundamental iterative methods in linear algebra and has been widely used in applications.
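
A sketch of the SOR update in Python (illustrative only; ω = 1 reproduces Gauss–Seidel, and the value of ω below is an arbitrary choice applied to a made-up symmetric positive definite system, not an optimal one):

Python

import numpy as np

def sor(A, b, omega=1.25, iterations=50):
    # SOR: blend the old value of x_i with its Gauss-Seidel update, weighted by omega.
    n = len(b)
    x = np.zeros(n)
    for k in range(iterations):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / A[i, i]
    return x

A = np.array([[4., 1., -1.], [1., 5., 2.], [-1., 2., 6.]])   # hypothetical SPD matrix
b = np.array([3., 10., 5.])
x = sor(A, b)
print(np.allclose(A @ x, b))   # True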


8 David M. Young, Jr., (1923–2008) established the mathematical foundation for the SOR method in his Ph.D. thesis at Harvard University. Ever since this groundbreaking research, iterative methods have been used on a wide range of scientific and engineering applications for solving large sparse systems of linear equations with many new iterative methods (called preconditioners) having been developed. Young was one of the pioneers in modern numerical analysis and scientific computing. His car license plate read Dr. SOR.


Conjugate Gradient Method

In theory, the conjugate gradient iterative method solves a system of n linear equations in at most n steps, if the matrix A is symmetric and positive definite. Moreover, the nth iterative vector x(n) is the unique minimizer of the quadratic function q(x) = (1/2)xTAx − xTb. When the conjugate gradient method was introduced by Hestenes and Stiefel [1952], initial interest in it waned once it was discovered that this finite-termination property was not obtained in practice. But two decades later, there was renewed interest in this method when it was viewed as an iterative process by Reid [1971] and others. In practice, the solution of a system of linear equations can often be found with satisfactory precision in a number of steps considerably less than the order of the system. For many very large and sparse linear systems, preconditioned conjugate gradient methods are now the iterative methods of choice! Here is a pseudocode for the conjugate gradient algorithm:

 

Here ε is a parameter used in the convergence criterion (such as ε = 10−5) and kmax is the maximum number of iterations. Usually, the number of iterations needed is much less than the size of the linear system. We save the previous value of γ in the variable γold. If a good guess for the solution vector x is known, then it should be used as an initial vector instead of zero. The variable ε is the desired convergence tolerance. The algorithm produces not only a sequence of vectors x(i) that converges to the solution, but also an orthogonal sequence of residual vectors r(i) = b − Ax(i) and an A-orthogonal sequence of search direction vectors p(i); namely, r(i) · r(j) = 0 if i ≠ j and p(i)TAp(j) = 0 if i ≠ j.

The main computational features of the conjugate gradient algorithm are complicated to derive, but the final conclusion is that in each step only one matrix–vector multiplication is required and only a few dot-products are computed. These are extremely desirable attributes in solving large and sparse linear systems. Also, unlike Gaussian elimination, there is no fill-in, so only the nonzero entries in A need to be stored in the computer memory. For some partial differential equation problems, the equations in the linear system can be represented by stencils that describe the nonzero structure within the coefficient matrix. Sometimes these ‘‘stencils’’ are used in a computer program rather than storing the nonzero entries in the coefficient matrix.
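
A sketch of the conjugate gradient loop in Python with NumPy; the variable names follow the description above only loosely, the matrix is a made-up symmetric positive definite example, and the stopping test uses the norm of the residual.

Python

import numpy as np

def conjugate_gradient(A, b, eps=1e-10, kmax=1000):
    # Conjugate gradient iteration for a symmetric positive definite A.
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                          # residual
    p = r.copy()                           # first search direction
    gamma = r @ r
    for k in range(kmax):
        if np.sqrt(gamma) < eps:           # convergence test
            break
        Ap = A @ p
        alpha = gamma / (p @ Ap)           # step length
        x = x + alpha * p
        r = r - alpha * Ap
        gamma_old, gamma = gamma, r @ r
        p = r + (gamma / gamma_old) * p    # next (A-orthogonal) search direction
    return x, k

A = np.array([[2., 1., 0.], [1., 2., 1.], [0., 1., 2.]])   # hypothetical SPD matrix
b = np.array([3., 4., 3.])
x, k = conjugate_gradient(A, b)
print(x, k)                        # solution reached in at most 3 steps
print(np.allclose(A @ x, b))       # True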

EXAMPLE 4

Repeat the task in Example 2 using the conjugate gradient method.

SOLUTION Programming the pseudocode using MATLAB, we obtain the iterates x(1) = (4.3484, 3.1013, 1.8638), x(2) = (4.1903, 3.2419, 2.0723), x(3) = (4.1930, 3.2330, 2.0810). In only three iterations, we have the answer accurate to full machine precision, which illustrates the finite termination property. The matrix A is symmetric positive definite and the eigenvalues of A are 1.5887, 2.0911, 2.3202. This simple example may be a bit misleading because one cannot expect such rapid convergence in realistic applications. (The rate of convergence depends on various properties of the linear system.) In fact, the previous example is too small to illustrate the power of advanced iterative methods on very large and sparse systems.

Diagonally Dominant Matrices

Here we focus attention on some theorems that will identify cases in which the iterative methods will work well. (Look ahead to Theorem 4 and its corollaries.) The following concept will play a role.

DEFINITION

A square matrix is diagonally dominant if each diagonal element in absolute value is greater than the sum of the absolute values of the off-diagonal elements in that row. If the matrix in question is A and has generic elements aij, the diagonal dominance is expressed by

 

Such matrices are automatically invertible, as we now prove.

THEOREM 1

Every diagonally dominant matrix is invertible.

PROOF Let A be a diagonally dominant n × n matrix. We shall prove that Ax ≠ 0 if x ≠ 0. In other words, the null space of A contains only the zero vector. Thus, we let x be any nonzero vector, and set y = Ax. It is to be proved that y ≠ 0. Select i so that |xi| is the largest component of x, in absolute value: |xj| ≤ |xi| ≠ 0 for all j. Then we have

 

Applying absolute values and our knowledge of |xi|, we have

 

Finally, we obtain

 

Using the diagonal dominance property, we conclude that yi ≠ 0 and y ≠ 0.

Gerschgorin’s Theorem

A useful consequence of the preceding theorem is Gerschgorin’s Theorem9 concerning the location of the eigenvalues in the complex plane for a given square matrix with real or complex elements. It says, roughly, that each eigenvalue may not be far from a diagonal element when the off-diagonal entries are small in norm!

THEOREM 2

Gerschgorin’s Theorem

If λ is an eigenvalue of an n × n matrix A = (aij), then, for some index i, we have

 

PROOF The matrix A − λI is noninvertible because λ is an eigenvalue of A. By Theorem 1, A − λI cannot be diagonally dominant. Thus, for some index i, we must have the inequality in the theorem.

The Gerschgorin row discs for an n × n matrix A = (aij) are given by

 

For each row i, a Gerschgorin row disc is a closed disc in the complex plane with center at aii (the diagonal element), and its radius ri is the sum of the absolute values of the off-diagonal entries in the ith row. We let [aii; ri] denote the ith Gerschgorin row disc with center aii and radius ri.

Gerschgorin’s Theorem is useful in finding bounds on the spectrum, σ(A), of eigenvalues for a given matrix A. We want to localize the eigenvalues of A. In other words, we want to quickly find regions in the complex plane containing the eigenvalues, and to determine approximate values for them. Remember that the eigenvalues of a matrix may turn out to be complex numbers, even if the matrix is real.


9 Semyon Aranovich Gerschgorin (1901–1933) worked in the Leningrad Mechanical Engineering Institute. In 1931, he published a paper containing his now famous Circle Theorem. It was his only paper not in Russian, but in German. Unfortunately, he died at a rather young age before he could make many other major contributions. Corresponding to transliterations of the Yiddish, there are various spellings of the surname of this mathematician from Belarus.


Even when a matrix is not diagonally dominant, Gerschgorin’s Theorem can be useful in determining whether a matrix is invertible—when zero is not an eigenvalue. Also, if the region containing the eigenvalues lies entirely to the left or right of the imaginary axis, the theorem provides information about the sign of the real parts of all the eigenvalues. Gerschgorin’s Theorem has many uses in applications such as in control theory (stability/instability) and scientific computing (stability of difference schemes). When solving a linear system of equations with a large condition number, Gerschgorin’s Theorem can be useful in finding a preconditioning matrix.

There are several consequences of Gerschgorin’s Theorem.

COROLLARY 1

Every eigenvalue of A lies within at least one of the Gerschgorin row discs of A.

The eigenvalues of a matrix A and its transpose AT are the same because they have the same characteristic equation Det(A − λI) = Det(AT − λI). (Recall Theorem 3, from Section 4.2.) Consequently, we can apply Gerschgorin’s Theorem to both A and AT. So we can obtain the Gerschgorin column discs as well as row discs. By doing so, we obtain pairs of discs with the same centers, but one of them may have a smaller radius!

For the matrix A, the Gerschgorin column discs are the closed discs

 

For each column i, a Gerschgorin column disc is a closed disc in the complex plane centered at aii with radius ci, which is the sum of the absolute values of the off-diagonal elements in column i. We let [aii; ci] denote the ith Gerschgorin column disc with center aii and radius ci.

COROLLARY 2

The eigenvalues of A lie within the Gerschgorin column discs of A.

The special case of diagonal matrices is of interest.

COROLLARY 3

The Gerschgorin discs of A coincide with the eigenvalue spectrum if and only if A is a diagonal matrix.

Gerschgorin’s Theorem asserts that all of the eigenvalues of an n × n matrix A are contained within the union of the n row discs in the complex plane:

(8.1)

In other words, the eigenvalues are trapped within the union of the n row discs. Also, all of the eigenvalues are contained within the union of the n column discs:

(8.2)

Since the eigenvalues of A are contained within both the union of the row discs (1) and the union of the column discs (2), they must be in the intersection of these two regions

(8.3)

Thus, we have obtained another region containing all of the eigenvalues of the matrix A.

By Theorem 3, if a Gerschgorin disc is disjoint from the other Gerschgorin discs, then it contains precisely one of the eigenvalues of A. In addition, if the union of k Gerschgorin discs does not touch any of the other n − k discs, then there are exactly k eigenvalues (counting multiplicities) in the union of these k discs.

COROLLARY 4

For the matrix A, if the union of k discs is disjoint from the union of the other n − k discs, then the former union contains exactly k eigenvalues of A and the latter union contains exactly n − k eigenvalues of A.
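
The discs are easy to compute. The NumPy sketch below prints the row and column discs of a hypothetical matrix (the matrices of Examples 5 through 8 are not reproduced here) and confirms that every eigenvalue lies in at least one row disc.

Python

import numpy as np

A = np.array([[5., 1., 2.],
              [1., 11., -2.],
              [0., 2., 19.]])                             # hypothetical matrix

centers = np.diag(A)
row_radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # r_i: off-diagonal row sums
col_radii = np.sum(np.abs(A), axis=0) - np.abs(centers)   # c_i: off-diagonal column sums

print(list(zip(centers, row_radii)))                      # row discs [a_ii; r_i]
print(list(zip(centers, col_radii)))                      # column discs [a_ii; c_i]

for lam in np.linalg.eigvals(A):
    inside = any(abs(lam - centers[i]) <= row_radii[i] for i in range(len(centers)))
    print(lam, inside)                                    # every eigenvalue lies in some row disc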

EXAMPLE 5

Use Gerschgorin’s Theorem to localize the eigenvalues of

 

SOLUTION The first Gerschgorin row disc [2; 5] has center 2 and radius 5. The second row disc [4; 3] has center 4 and radius 3. The first Gerschgorin column disc [2; 3] has center 2 and radius 3.

FIGURE 8.4 Example 5 Gerschgorin discs.

The second column disc [4; 5] has center 4 and radius 5. Figure 8.4 shows the row discs with solid circles, and the column discs with dashed circles. Using (1), we take the union of the two row discs. Since the second row disc is contained within the first row disc, this union is the larger disc [2; 5]. Using (2), we take the union of the two column discs. Since the first column disc is contained within the second column disc, the union of these two discs is the larger disc [4; 5]. Applying (3), we find the intersection [2; 5] ∩ [4; 5], which is the shaded oval region as shown in Figure 8.4. We can check these results using the characteristic equation Det(A − λI) = λ2 − 6λ + 23 = 0. Hence, the eigenvalues are 3 ± i√14 ≈ 3 ± 3.74i. Clearly, the shaded oval-shaped region contains this pair of the complex conjugate eigenvalues.

EXAMPLE 6

Use Gerschgorin’s Theorem to localize the eigenvalues of

 

FIGURE 8.5 Example 6 Gerschgorin discs.

SOLUTION The Gerschgorin row discs are [2; 2], [4; 3], and [11; 4]. The Gerschgorin column discs are [2; 3], [4; 4], and [11; 2]. Figure 8.5 shows the row discs with solid circles and the column discs with dashed circles. Using (8.3), we find that the complex conjugate pair of eigenvalues lies in the shaded region on the left (which contains [2; 2] ∪ [4; 3]), and a single real eigenvalue lies in the disjoint shaded region [11; 2] on the righthand side. A computer program confirms that the eigenvalues are 3.3314 ± 0.1626i and 10.3371.

EXAMPLE 7

Use Gerschgorin’s Theorem to localize the eigenvalues of

 

SOLUTION The Gerschgorin row discs include [10 − 5i; 6], and the Gerschgorin column discs include [14 + 9i; 6] and [10 − 5i; 4]. Figure 8.6 (p. 538) shows the row discs with solid circles and the column discs with dashed circles. By (8.3), two of the eigenvalues are in one shaded region, and one eigenvalue is in a disjoint shaded region. The actual eigenvalues are 9.3133 + 0.4865i, 9.5902 − 4.6468i, and 14.0966 + 8.1604i.

FIGURE 8.6 Example 7 Gerschgorin discs.

EXAMPLE 8

Use Gerschgorin’s Theorem to localize the eigenvalues of

 

SOLUTION The Gerschgorin row discs are [5; 3], [11; 4], and [19; 2]. The Gerschgorin column discs are [5; 2], [11; 3], and [19; 4]. Figure 8.7 (p. 539) shows the row discs with solid circles and the column discs with dashed circles. Clearly, all of the eigenvalues are in the disjoint discs [5; 2], [11; 3], and [19; 2], obtained by intersecting each row disc with the column disc having the same center. So each disc contains a single eigenvalue. By examining the characteristic polynomial, we can determine that the eigenvalues of the matrix are real. Consequently, we know that these eigenvalues are in the intervals [3, 7], [8, 14], and [17, 21]. The actual eigenvalues are 5.3488, 11.1653, and 18.4857.

FIGURE 8.7 Example 8 Gerschgorin discs.

Interest in this topic began with the seminal paper by Gerschgorin in 1931, and the sharpening of his Circle Theorem via irreducibility by Taussky-Todd in 1949. It continues to fascinate researchers because of the beautiful results, and their ease of use in applications. Consequently, there is an ever-growing body of new developments. See, for example, Brualdi and Mellendorf [1994] and Varga [2004]. For additional details, see Faddeev and Faddeeva [1963, pp. 114–115], Golub and van Loan [1985], Horn and Johnson [1985, 343–364], Meyers [2000], and Wilkinson [1965].

Infinity Norm

Although there are many matrix norms, we will use the one that is easiest to compute.

DEFINITION

The infinity norm of an m × n matrix A is defined to be the quantity

||A|| = max1≤i≤m Σ1≤j≤n |aij|

A special case of this definition occurs when the matrix is m × 1, in other words, a column vector, x. In this case, the preceding formula still applies and is easy to calculate:

||x|| = max1≤i≤m |xi|

EXAMPLE 9

Compute the infinity norm of a vector and a matrix.

SOLUTION If x = (3, 7, −2), then ||x|| = 7. For the 2 × 2 matrix A shown, ||A|| is the larger of the two row sums 2 + 7 = 9 and 3 + 4 = 7. Hence, we obtain ||A|| = 9.
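
These norms are easy to check with software. The following MATLAB sketch uses a matrix whose entries are a guess consistent with the row sums quoted in Example 9; the signs of the entries are not given in the text, and they do not affect the norm.

MATLAB

% Infinity norms: max absolute entry for a vector, max absolute row sum for a matrix.
x = [3; 7; -2];
norm(x, inf)            % returns 7
A = [2 7; 3 4];         % hypothetical entries; only |2|+|7| and |3|+|4| matter here
norm(A, inf)            % returns 9
max(sum(abs(A), 2))     % the same value, computed directly from the definition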

Lemma 1

If A is an m × n matrix and x is a column vector having n components, then

||Ax|| ≤ ||A|| · ||x||

PROOF We have

||Ax|| = max1≤i≤m |Σ1≤j≤n aij xj| ≤ max1≤i≤m Σ1≤j≤n |aij| |xj| ≤ (max1≤i≤m Σ1≤j≤n |aij|)(max1≤j≤n |xj|) = ||A|| · ||x||

Lemma 2

If A is a square matrix such that ||A|| < 1, then I − A is diagonally dominant.

PROOF The hypothesis ||A|| < 1 leads to Σ1≤j≤n |aij| < 1 for each i. It follows that

|1 − aii| ≥ 1 − |aii| > Σj≠i |aij|

This inequality displays the criterion for diagonal dominance in the matrix I − A.

Convergence Properties

The numerical analysis literature is a source of many theorems establishing convergence properties of iterative methods. Here is one basic theorem from this far-ranging subject.

THEOREM 3

If G is an n × n matrix such that ||G|| < 1, then the iteration formula

x(k) = Gx(k−1) + c        (k = 1, 2, 3, …)

produces a sequence of vectors x(k) that converges to (I − G)−1c from any starting point, x(0).

PROOF The matrix I − G is invertible by Lemma 2 and Theorem 1. Define x = (I − G)−1c. Then (I − G)x = c and x = Gx + c. Consequently, we have

x(k) − x = (Gx(k−1) + c) − (Gx + c) = G(x(k−1) − x)

Hence by Lemma 1, we have

||x(k) − x|| ≤ ||G|| · ||x(k−1) − x||

By repeating this argument k times, we arrive at

||x(k) − x|| ≤ ||G||k ||x(0) − x||

The righthand side of this inequality converges to 0 because ||G|| < 1.
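
The geometric decay of the error predicted by this proof is easy to observe numerically. The following MATLAB sketch reuses the iteration matrix G and vector b that appear in the listing for Example 1 near the end of this section; it is only an illustration of the bound, not part of the theorem.

MATLAB

% Observe the error bound ||x(k) - x|| <= ||G||^k ||x(0) - x|| from Theorem 3.
G = [0.0 0.3 0.2; 0.3 0.0 0.1; 0.2 0.1 0.0];
c = [7; 5; 3];
xstar = (eye(3) - G)\c;        % the fixed point (I - G)^(-1) c
g = norm(G, inf);              % here g = 0.5 < 1
x = zeros(3,1);
e0 = norm(x - xstar, inf);
for k = 1:10
  x = G*x + c;
  fprintf('k = %2d   error = %.3e   bound = %.3e\n', k, norm(x - xstar, inf), g^k*e0)
end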

COROLLARY 5

If ||I − A|| < 1, then the iteration defined by

x(k) = (I − A)x(k−1) + b

produces a sequence that converges to the solution of the equation Ax = b.

PROOF Define G = I − A, and use Theorem 3. We see that ||G|| < 1. By Theorem 3, the iterates converge to (I − (I − A))−1b = A−1b.

COROLLARY 6

If A is diagonally dominant, then the Jacobi iteration

x(k) = Bx(k−1) + c

produces a sequence that converges to the solution of the equation Ax = b. The definitions are

B = I − D−1A,   c = D−1b,   D = Diag(A)

PROOF In the Jacobi method, we have G = B = I − D−1A and c = D−1b, where D is the diagonal of A. In order to use Theorem 3, we need to know that ||G|| < 1. We have

||G|| = ||I − D−1A|| = max1≤i≤n Σj≠i |aij| / |aii| < 1

because A is diagonally dominant.

COROLLARY 7

If the coefficient matrix in a system of equations is diagonally dominant, then the Gauss–Seidel iteration converges for any starting vector.

The proof requires a more complicated spectral radius argument, and the fact that the Gauss–Seidel iterative method converges for symmetric positive definite matrices. This type of matrix arises in the discretization of elliptic partial differential equations. Informally, we say that the more diagonally dominant the matrix is, the more rapid will be the convergence of the Jacobi and Gauss–Seidel methods. However, there can be exceptions. See Golub and van Loan [1996].
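
To make Corollaries 6 and 7 concrete, here is a minimal MATLAB sketch of the Jacobi iteration in matrix form; the diagonally dominant matrix and right-hand side used here are our own illustration, not taken from the examples of this section.

MATLAB

% Jacobi iteration x(k) = B x(k-1) + c with B = I - inv(D)*A and c = inv(D)*b.
A = [10 2 1; 1 8 2; 2 1 9];    % hypothetical diagonally dominant matrix
b = [13; 11; 12];              % chosen so that the true solution is (1, 1, 1)
D = diag(diag(A));
B = eye(3) - D\A;              % iteration matrix; norm(B, inf) < 1 here
c = D\b;
x = zeros(3, 1);
for k = 1:50
  x = B*x + c;
end
[x, A\b]                       % the iterate agrees with the direct solution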

Power Method for Computing Eigenvalues

The problem of computing the eigenvalues of a matrix is at least as hard as finding the roots of a given polynomial. This problem inevitably brings in approximate methods, as there are no purely algebraic formulas for the roots of general polynomials of degree five or higher. (Abel–Ruffini Impossibility Theorem, see Appendix B.) Using higher transcendental functions to compute roots of a polynomial is possible but not a competitive strategy!

To see how iterative methods can produce approximate eigenvalues of a matrix A, we make some simplifying assumptions. Suppose that A is an n × n matrix whose eigenvalues satisfy these inequalities:

|λ1| > |λ2| ≥ |λ3| ≥ ··· ≥ |λn|

Assume, further, that there is a complete set of eigenvectors:

u(1), u(2), …, u(n),   where Au(j) = λj u(j)

We mean by this that these vectors form a basis for ℝn. For the iteration, select any initial vector x(0) that has a component in the direction of u(1). Thus, for suitable coefficients,

x(0) = c1u(1) + c2u(2) + ··· + cnu(n)

and we assume that c1 ≠ 0. Because these coefficients ci can be absorbed into the eigenvectors u(i), we can write simply

x(0) = u(1) + u(2) + ··· + u(n)

To get the next vector in the sequence, we multiply both sides of this equation by the matrix A, and make use of the fact that these vectors u(j) are eigenvectors. The result is then

x(1) = Ax(0) = λ1u(1) + λ2u(2) + ··· + λnu(n)

This process is repeated, and the general case is

x(k) = Ax(k−1) = Akx(0) = λ1ku(1) + λ2ku(2) + ··· + λnku(n)

By simple algebra, this becomes

x(k) = λ1k[u(1) + (λ2/λ1)ku(2) + ··· + (λn/λ1)ku(n)]

The terms (λj/λ1)k converge to zero because we have assumed that |λ1| > |λj| for j > 1. Thus, we can write

x(k) = λ1k[u(1) + ek]

where the term ek accounts for all the terms that converge to 0. Select a vector v and use dot-products so that we will be dealing with numbers instead of vectors:

x(k) · v = λ1k[u(1) · v + ek · v]

Forming ratios brings us to

(x(k+1) · v) / (x(k) · v) = λ1 [u(1) · v + ek+1 · v] / [u(1) · v + ek · v]

Because λ1 is the largest eigenvalue in absolute value, all the terms except the first inside the brackets converge to 0.

In carrying out such a calculation, it is advisable to normalize the vectors x(k) at each step. Otherwise the sequence may converge to zero or infinity. Here is a segment of pseudocode for the power method:

x ← x(0)
for k = 1, 2, 3, …
    y ← Ax
    r ← y1/x1
    x ← y/||y||
end

Here we could use the vector v = e1 = (1, 0, 0), as in the following example.

EXAMPLE 10 Find one of the eigenvalues of the following matrix, using the power method.

A = [1 3 5; 7 −8 1; 2 4 −2]

SOLUTION We repeatedly execute the steps in this pseudocode:

y ← Ax,   r ← y1/x1,   x ← y/||y||

In each iteration, new values for the vectors x and y are computed. At the same time, the ratio y1/x1 converges to the dominant eigenvalue of A. It starts with r = 9 and then r = 3.222 and does not settle down to −8.5730 until after 47 iterations. A computer program is needed to accomplish this and is given at the end of this section.

Application: Demographic Problems, Population Migration

Linear algebra is one of the principal tools in demography. For example, it can be used in predicting how the distribution of the population will change yearly within a certain region of a country. The broad strategy is to study the past in order to discover a mathematical model, and then to use the model in predicting the future or in controlling the future by appropriate public policy.

Let us say that the region in question has a major city at its hub, and this city is surrounded by a suburban area. Beyond these two areas there is an outer region with population loosely associated culturally and economically with the city and its suburbs. We call it the rural area. Because of shifts in population (migration), the numbers of people in each of the three regions change over time. By taking surveys or consulting census records, one knows the population in the three areas at some instant in time, and one knows the migration patterns. Here we are assuming a closed system in which no one enters or leaves this group of three regions!

To make a concrete numerical case, we assume that the initial populations of these three areas are 3 million in the hub (core area), 4 million in the suburbs, and 6 million in the rural region. These regions are disjoint and the total population of 13 million remains constant, but there are annual movements of people from one region to another.

Suppose that in any year, 20% of the people in the hub move to the suburbs and 10% of the people in the hub move to the rural region. The remaining 70% stay where they are in the hub.

Further, suppose that in any year, 12% of the people in the suburban area move to the hub, and 8% of the people in the suburban area move to the rural region. The remaining 80% stay in the suburban area.

Finally, suppose that 30% of the population in the rural region moves to the suburban area, and 20% of the population in the rural region moves to the hub. The remaining 50% of the people in the rural region stay where they are.

At a given instant, the population distribution can be represented by a vector x having three components: h (for hub), s (for suburban area), and r (for outer rural area). Thus, we let x = (h; s; r). From the preceding information, these details emerge about the change of h over a one-year period:

1. 70% of the people in the hub remain there.

2. 12% of the people in the suburban area move to the hub.

3. 20% of the population in the rural region moves to the hub.

Similar results for s and r can be deduced. From the data, for each component in our vector we have

hnew = 0.70h + 0.12s + 0.20r
snew = 0.20h + 0.80s + 0.30r
rnew = 0.10h + 0.08s + 0.50r

The initial condition of the model is described by the vector x(0) = (3, 4, 6), where the units are 1 million people. The yearly migration can be described by a matrix

A = [0.70 0.12 0.20; 0.20 0.80 0.30; 0.10 0.08 0.50]

Notice that row one in the matrix A pertains to the hub. It retains 70% of its population, and gains 12% of the population from the suburbs, as well as 20% of the people from the outer rural region.

On the diagonal of the matrix the numbers are relatively strong, reflecting the fact that most people do not move during any one year. Also note that the columns sum to 1, because column one accounts for the migration of all the people in the hub, and the other columns can be interpreted similarly.

Put in the starting values of the vector x(0) = (3, 4, 6) and compute

x(1) = Ax(0) = (3.78, 5.60, 3.62)

This shows the population of the three sectors after one year. In order to see what happens over a number of years, mathematical software can be put to use, computing x(0), x(1), x(2), and so on. The vector x(0) is the initial distribution of population, and the preceding computation gives

x(1) = (3.78, 5.60, 3.62)

The dynamical system is described recursively by the equation x(k) = Ax(k−1). In a few years, the populations of the three sectors reach equilibrium. For example, we obtain x(8) = (4.093, 6.969, 1.939), and eight years later we have x(16) = (4.083, 6.983, 1.934).
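
A short MATLAB sketch of this computation follows; it simply repeats the matrix–vector multiplication described above and prints a few of the yearly distributions.

MATLAB

% Iterate the migration model x(k) = A x(k-1) for the hub/suburb/rural populations.
A = [0.70 0.12 0.20; 0.20 0.80 0.30; 0.10 0.08 0.50];
x = [3; 4; 6];                 % initial populations in millions
for k = 1:16
  x = A*x;
  if k == 1 || k == 8 || k == 16
    fprintf('x(%2d) = (%.3f, %.3f, %.3f)\n', k, x)
  end
end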

In such a dynamical system, there is also the capability of reversing time to look into the past. For this, one can use the formulas y(0) = (3, 4, 6) and y(k+1) = A−1y(k).

Population models such as this simple one have obvious weaknesses. For example, the total population of the three regions has been assumed to be constant: there are no deaths, no births, and no migration into or out of the region. However, more elaborate models can grow from simple ones, and modifications can be incorporated to improve the predictions.

The behavior of a dynamical system can be analyzed in great detail by using the eigenvalues, eigenvectors, and diagonalized form of the matrix that describes the system. This procedure was explained in Section 6.1, and here we show how it applies to the migration problem discussed previously. In particular, the long-term behavior of the dynamical system can be understood once we have the eigenvalues in hand. Recall that

A = PDP−1,   D = Diag(λ1, λ2, λ3)

where the eigenvectors are the columns of P = [p1, p2, p3]. Also, we have

Dk → Diag(1, 0, 0)   as k → ∞

since λ1 = 1, |λ2| < 1, and |λ3| < 1.

Let mathematical software do the work in the preceding example to obtain the eigenvalues and eigenvectors of the matrix A. The eigenvectors are columns of P, each having been normalized.

 

Now recall the equation

A = PDP−1

which is the diagonalization of A. The eigenvalues of A are in the diagonal matrix D. The eigenvectors (normalized) are the columns of P. The purpose of this diagonalization is to make available the simple formula

Ak = PDkP−1

Because two of the eigenvalues are less than 1 in absolute value,

Dk → Diag(1, 0, 0)   as k → ∞

Therefore, we obtain

x(k) = Akx(0) = PDkP−1x(0) → P Diag(1, 0, 0) P−1x(0)   as k → ∞

This calculation confirms that the vectors x(k) are settling down to a constant vector.
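
A brief MATLAB sketch of this analysis: diagonalize the migration matrix with eig, replace Dk by its limit Diag(1, 0, 0), and recover the limiting population vector.

MATLAB

% Long-term behavior of the migration model via diagonalization.
A = [0.70 0.12 0.20; 0.20 0.80 0.30; 0.10 0.08 0.50];
x0 = [3; 4; 6];
[P, D] = eig(A);                   % columns of P are eigenvectors of A
[~, j] = max(abs(diag(D)));        % locate the eigenvalue equal to 1
Dlim = zeros(3);  Dlim(j, j) = 1;  % the limit of D^k as k grows
xinf = P*Dlim/P*x0                 % limiting distribution, P*Dlim*inv(P)*x0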

Application: Leontief Open Model

An economic model that is more elaborate than the one presented in Section 6.1 takes into account the existence of sectors in our economy that consume but do not produce output to be sold. Consumer activity can be interpreted as one such sector. The matrix problem that arises from this model is

x = Ax + b

where A and b are known numerically. Again the question is whether there is a solution, and, of course, the answer depends upon what we wish to assume about A and b. In this model, x is a vector whose components are the production levels of the various industries. It should be a nonnegative vector; that is, xi ≥ 0 for all i. Likewise, the matrix A, now having to do with consumption of the various commodities, should be nonnegative. We write x ≥ 0 and A ≥ 0 to express these requirements, and refer to nonnegative vectors and nonnegative matrices.

THEOREM 4

Let A be a nonnegative n × n matrix and let b be a nonnegative vector in ℝn. If all column sums of A are less than 1, then the system

x = Ax + b

has a unique solution, x, and x is nonnegative.

PROOF Put E = (I − A)T. We will show that E is diagonally dominant. Then E will be invertible by Theorem 1. From that, it will follow that ET is invertible and I − A is invertible. Then the system in question has the (unique) solution x = (I − A)−1b. The nonnegativity is not proven here, but can be found in more advanced books. To prove the diagonal dominance of E, we write

|eii| = 1 − aii > Σj≠i aji = Σj≠i |eij|        (1 ≤ i ≤ n)

because each column sum of A is less than 1 and the entries of A are nonnegative.

Consequently, E is diagonally dominant.

The economic model described here is known as the Leontief Open Model of an economy having n sectors. The vector x is the production vector. It gives the amounts of each product produced in one year. The open sector has no production, since it only consumes. The vector b is the final demand vector. It is the consumer demand and lists the values of goods and services consumed by the open sector. The model leads to the vector x that satisfies the equation, and this in turn reveals the correct level of production in each sector.

The correct x-vector shows the level of production for each industry that is needed to satisfy consumer demand plus the needs of the various industries. These are related by the matrix–vector equation

x = Ax + b

The vector Ax gives the amounts of each product consumed in the manufacturing phase. It represents the resources needed in production, which economists call the intermediate demand. The product Ax accounts for the consumption required just to reach the level of output x. In the matrix A, each entry aij is the output of industry i needed by industry j. Each column in the matrix A gives the inputs needed to produce one unit. This column is the unit consumption vector. The matrix A has special properties: its elements are nonnegative (aij ≥ 0), and each column sum is less than 1. Expressed in terms of the columns of A, the vector Ax is

Ax = x1(column 1 of A) + x2(column 2 of A) + ··· + xn(column n of A)

Here x1a11 is the amount of the first product required in manufacturing process number one, and x1a21 is the amount of the second product required by that same process. In general, we have

(Ax)i = ai1x1 + ai2x2 + ··· + ainxn

which is the total intermediate demand for the ith product. We can write the total production as the sum of the intermediate demand and the consumer demand (or final demand), thus arriving at

x = Ax + b

This is the same as

(I − A)x = b

Recall that the inequality A ≥ 0 means that all entries in the matrix A are nonnegative. Theorem 4 asserts that if A ≥ 0 and b ≥ 0 and Σi aij < 1 for each j, then (I − A)−1 exists and x = (I − A)−1b and x ≥ 0.

Notice that

(I − A)(I + A) = I − A2

In general, we have

(I − A)(I + A + A2 + ··· + Am) = I − Am+1

If Am+1 converges to 0 as m goes to infinity, then we obtain the Neumann Series:

(I − A)−1 = I + A + A2 + A3 + ···

Here we are glossing over some technical matters that belong to the study of functional analysis.
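
The following MATLAB sketch gives a numerical check of the Neumann series for a small nonnegative matrix with column sums less than 1; the matrix is our own illustration, not the consumption matrix of Example 11.

MATLAB

% Partial sums of the Neumann series approach (I - A)^(-1).
A = [0.2 0.1 0.0; 0.1 0.3 0.2; 0.3 0.1 0.4];   % hypothetical; column sums 0.6, 0.5, 0.6
S = eye(3);                                     % partial sum I + A + ... + A^m
T = eye(3);                                     % current power A^m
for m = 1:100
  T = T*A;
  S = S + T;
end
norm(S - inv(eye(3) - A), inf)                  % small residual: the series has converged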

EXAMPLE 11

Suppose we have a table of values

Suppose the final demand vector is [12, 18, 15]T. Find the required production levels.

SOLUTION This table corresponds to the consumption matrix A. In this example, to produce a unit of manufactured goods requires 0.1 units of agriculture (Agr.) goods, 0.3 units of manufacture (Mfg.) goods, and 0.5 units of services (Ser.) goods.

We have

 

Notice that the elements of the matrix A are nonnegative and the column sums are less than 1. The system (I − A)x = b must be solved for x. The augmented matrix is

 

The production levels turn out to be 115 units of agriculture goods, 159 units of manufacturing goods, and 176 units of service goods.

Mathematical Software

Here is a short MATLAB program pertaining to Example 1. It will carry out 50 steps of the Richardson iteration once the values of G, b, and x(0) have been given to the program as input.


MATLAB

G = [0.0,0.3,0.2;0.3,0.0,0.1;0.2,0.1,0.0];
b = [7.0; 5.0; 3.0];
x = [0.0; 0.0; 0.0];
z = [0, x']
for k=1 : 50
x = G*x + b;
z = [k, x']
end

In the MATLAB code, the z-array is used only for facilitating the display of the results. By using the command format long, one obtains a display of all available digits in the calculations. This means about 15 digits of precision. By examining all 15 digits in the computed vectors, one finds that the solution vector has been computed to full MATLAB precision by the 42nd step. In other words, further iteration will not change the result (except for rounding errors), if one examines only the first 15 digits. A check on the work consists in computing Ax. This leads to the vector b, accurate to 15 decimal places.

Sample Maple and Mathematica codes for Example 1 are presented here:


Maple

with (LinearAlgebra) :
G := Matrix([[0.0,0.3,0.2],[0.3,0.0,0.1],[0.2,0.1,0.0]]);
b := Vector([7.0,5.0,3.0]);
x := Vector([0.0,0.0,0.0]);
for k from 1 to 50 do
x := G.x + b;
end do;

Mathematica

G = {{0.0,0.3,0.2},{0.3,0.0,0.1},{0.2,0.1,0.0}};
b = {7.0,5.0,3.0};
x = {0.0,0.0,0.0};
For[k = 1, k <= 50, k++,
x = G.x + b
]

In the remainder of this chapter, only MATLAB codes are given because it is straightforward to write similar codes in Maple and Mathematica.

For Example 2, the MATLAB code could be written as follows:


MATLAB

B = [0.0,0.15,0.1;0.15,0.0,0.05;0.1,0.05,0.0]
c = [3.5;2.5;1.5]
x = [0.0;0.0;0.0]
z = [0,x']
for k=1:25
x = B*x + c;
z = [k,x']
end

This code could be written as follows in terms of the individual variables:


MATLAB

x1 = 0.0; x2 = 0.0; x3 = 0.0;
z = [0, x1, x2, x3]
for k=1:25
y1 = 0.15*x2 + 0.1*x3 + 3.5;
y2 = 0.15*x1 + 0.05*x3 + 2.5;
y3 = 0.1*x1 + 0.05*x2 + 1.5;
x1 = y1; x2 = y2; x3 = y3;
z = [k, x1, x2, x3]
end

It took 24 iterations to obtain full machine precision.

In Example 3, the MATLAB code for the Gauss–Seidel method is especially simple. It can be written as follows:


MATLAB

x1 = 0.0; x2 = 0.0; x3 = 0.0;
z = [0, x1, x2, x3]
for k=1:25
x1 = 0.15*x2 + 0.1*x3 + 3.5;
x2 = 0.15*x1 + 0.05*x3 + 2.5;
x3 = 0.1*x1 + 0.05*x2 + 1.5;
z = [k, x1, x2, x3]
end

In Example 10, a MATLAB program can be written to carry out this calculation:


MATLAB

A = [1 3 5; 7 -8 1; 2 4 -2]
x = [1; 1; 1]
for k=1:50
y = A*x;
r = y(1)/x(1);
x = y/norm(y);
z = [k, r, x']
end

This program generates 50 new values for the vectors x and y. At the same time, the ratio y(1)/x(1) is being recorded. This is the quantity that should converge to the dominant eigenvalue of A. The output from the program starts with r = 9 and r = 3.222. By the 47th step, however, the value of r has settled down to −8.5730.

Of course, MATLAB has commands for obtaining the eigenvalues immediately. For this exercise, we input the matrix A and then use the command eig(A). The result of doing this is the set of three values: 5.5054, −8.5730, −5.9324. These eigenvalues are available in 15-decimal precision. For example, the value we were seeking is c = −8.57299818380420.

Is there still another check that can be applied to guarantee the precision of the eigenvalue? Yes, we can compute the determinant of A cI, where c is the 15-digit value quoted previously. The MATLAB command to do this is det(A − c * eye(3)), and the result is −9.474 × 10−14.

SUMMARY 8.3

• Richardson Iterative Method:

x(k) = (I − A)x(k−1) + b

• Scaled System: D−1Ax = D−1b, where D−1A = I − B = I − L − U. Here D = Diag(A), −L is the lower triangular part of D−1A (below the diagonal), −U is the upper triangular part of D−1A (above the diagonal), and c = D−1b. These formulas define B = L + U.

• Jacobi Iterative Method: x(k) = Bx(k−1) + c, where c and B are as noted previously.

• Gauss–Seidel Method: (I − L)x(k) = Ux(k−1) + c

• Successive Overrelaxation (SOR) Method: (I − ωL)x(k) = ω{Ux(k−1) + c} + (1 − ω)x(k−1). The relaxation factor ω is between 0 and 2.

• Conjugate Gradient Method: If A is an n × n matrix, then the nth vector in the algorithm, x(n), is the unique minimizer of the quadratic function q(x) = ½xTAx − xTb as well as the solution of Ax = b, when A is symmetric positive definite.

• A diagonally dominant n × n matrix satisfies |aii| > Σj≠i |aij| for all i.

• Infinity norm of an m × n matrix: ||A|| = max1≤i≤m Σ1≤j≤n |aij|.

• The infinity norm of a vector is a special case: ||x|| = max1≤i≤n |xi|. Here x is in ℝn.

• Every diagonally dominant matrix has an inverse.

• Gerschgorin Theorem: If λ is an eigenvalue of A, then |λ − aii| ≤ Σj≠i |aij| for some index i.

• Nonoverlapping Gerschgorin discs contain exactly one eigenvalue each.

• ||Ax|| ≤ ||A|| · ||x||

• If ||G|| < 1, then the iteration x(k) = Gx(k−1) + c produces a sequence x(k) that converges to (I − G)−1c from any starting point.

• The Richardson iteration converges to the solution of Ax = b, when ||I − A|| < 1.

• The Jacobi iteration converges to the solution of Ax = b, when A is diagonally dominant.

• The Gauss–Seidel iteration converges to the solution of Ax = b, when A is diagonally dominant.

• If A ≥ 0, b ≥ 0, and Σi aij < 1 for each j, then (I − A)−1 exists and the vector x = (I − A)−1b solves the equation x = Ax + b and x ≥ 0.

• Power Method: One can select a vector x(0) and carry out the process x(k+1) = Ax(k)/||Ax(k)|| under favorable circumstances. Here, k runs through the sequence 0, 1, 2, …. The vectors x(k) will converge to an eigenvector of A.

• Leontief Open Model: x = Ax + b

• Neumann Series: If ||A|| < 1, then (I − A)−1 = I + A + A2 + ···. There is a limiting process involved here, as indicated by the three dots, but the meaning and justification are outside the scope of this book.

KEY CONCEPTS 8.3

Iterative algorithms: Richardson method, Jacobi method, Gauss–Seidel method, successive overrelaxation (SOR) method, conjugate gradient methods, diagonally dominant matrices, Gerschgorin’s Theorem, Gerschgorin discs, infinity matrix norm, nonnegative systems, computing eigenvalues using an iterative method, applications (demographic problems, population migrations, and economic models)

GENERAL EXERCISES 8.3

1. Show that the eigenvalues of an n × n matrix A lie in the union of n discs in the complex plane described as follows:

Di = {z ∈ ℂ : |z − aii| ≤ Σj≠i |aji|},   i = 1, 2, …, n

Notice that the sum in the formula uses the elements in the ith column of A, not the ith row.

2. Explain why it is true or find a counterexample: For n × n matrices, we have ||AB|| ≤ ||A|| · ||B||.

3. Find a small interval on the real line that contains the eigenvalues of this matrix:

 

Do not do any work other than using Gerschgorin’s Theorem. Take advantage of the symmetry of the matrix A.

4. Use Gerschgorin’s Theorem to deduce that a diagonally dominant matrix cannot have 0 as an eigenvalue, and thus conclude that every diagonally dominant matrix is invertible. (Do not use Theorem 1.)

5. Confirm that the function A ↦ ||A|| acting on the linear space of all n × n matrices has these properties:

a. ||A|| > 0 if A ≠ 0

b. ||αA|| = |α|||A||

c. ||A + B|| ≤ ||A|| + ||B||

6. Show that for two n × 1 matrices, x and y, we have |xTy| ≤ n ||x|| · ||y||.

7. Establish that all eigenvalues of a matrix A lie in the disc of radius ||A|| centered at 0 in the complex plane.

8. Affirm that if ||A|| < c, then cI ± A is invertible.

9. Solve the Leontief Open Model problem using these data

a.

b.

10. Establish that for any matrix A there is a vector x such that ||x|| = 1 and ||Ax|| = ||A||.

11. In the proof of Lemma 2, we used the inequality 1 − |a| ≤ |1 − a|. Establish this. Is it true for complex numbers? Is it true if we generalize the inequality to b − |a| ≤ |b − a|?

12. What is the largest value that ||Ax|| can have if A is given and fixed, while x is constrained only by ||x|| ≤ 1?

13. Consider the matrix

Apply Gerschgorin’s Theorem and show the three discs in the complex plane whose union contains the eigenvalues. Note that these discs are disjoint from one another, and each disc contains an eigenvalue. This illustrates the refinement of Gerschgorin’s Theorem described in Theorem 3.

14. Let A be a real, n × n matrix. Show that if α + iβ is an eigenvalue of A (α and β being real), then we have

15. Establish that (I − Bk)(I − B)−1 = I + B + B2 + ··· + Bk−1 for k ≥ 1.

16. Let have the block form in which the submatrices are n × n. Establish that if (B I) is invertible, then

a.

b. Repeat for .

17. Let A be an n × n matrix. Define , , and ri = min{pi, qi}. Show that each eigenvalue of A is in one of the discs

18. Let

Use the power method and x(0) = (1, 0) to find an eigenvalue of A.

19. Let

Find conditions on s so that A is diagonally dominant.

20. Let

From the Gerschgorin discs, what can you conclude about the eigenvalues of B? Sketch these discs and compare to the true eigenvalues.

21. Let

This matrix is not diagonally dominant, but it is invertible! What does this say with regard to Theorem 1?

22. If either a matrix or its transpose is diagonally dominant, then both of them are invertible. Explain.

23. A nonzero diagonal matrix is diagonally dominant and invertible. Explain.

24. The Gerschgorin discs coincide with the eigenvalue spectrum if and only if the matrix is diagonal. Explain.

25. Let

Find conditions on t so that A is diagonally dominant.

26. Let

H is not diagonally dominant, but it is invertible. Explain.

27. Let

What can you say about the eigenvalues of A?

28. Let

When the power method is applied to the matrix G, the result is a sequence of vectors that settle down to a vector of the form [r, 1]T where |r| < 1. Find approximate eigenvalue–eigenvector pairs of G.

29. Let

In the Leontief Open Model, we have (I − A)x = b. Find a numerical value for x.

30. (Continuation.) Repeat the previous exercise with

31. If A = (aij) is a real n × n diagonally dominant matrix with all of the diagonal elements aii positive (respectively, negative), then the real parts of its eigenvalues are positive (respectively, negative). Explain why.

32. Sometimes the definition of a diagonally dominant matrix given here is called strict diagonal dominance because the strict inequality > is used; the term weak diagonal dominance is used when the inequality ≥ is allowed. Examine these matrices to determine whether they are strictly or weakly diagonally dominant. Are they invertible?

a.

b.

c.

33. Let

Perform two steps of the conjugate gradient method starting with x(0) = 0. Compare the results to the true solution.

34. Let

Perform two steps of the conjugate gradient method starting with x(0) = (0, 0, 0). Compare the results to the true solution.

35. Establish that for a square matrix A, zero is not an eigenvalue if and only if A is invertible.

36. Use Gerschgorin’s Theorem to provide an alternative proof for Theorem 1.

37. Find ||x|| when x = (1, −3, 0, −2).

38. Determine ||A||, when

39. Using the previous two exercises, verify that ||Ax|| ≤ ||A|| · ||x||.

40. The set of all eigenvalues of a square matrix A is the eigenvalue spectrum and is denoted σ(A). Explain why a cheap, but rather crude, upper bound on all eigenvalues λ ∈ σ(A) is:

 

Apply to A in General Exercise 38.

41. (Continuation.) The spectral radius of a matrix A is ρ(A) = maxλ∈σ(A) |λ|. Explain why

ρ(A) ≤ ||A||

Apply to A in General Exercise 38.

42. Let

Find bounds on the eigenvalues of A in the following ways.

a. First, find a crude estimate using General Exercise 40.

b. Second, an estimate using the union of the Gerschgorin row discs.

c. Then, an estimate using the union of the Gerschgorin column discs.

d. Finally, use the intersection of the union of the row discs and the union of the column discs.

e. Illustrate with sketches of the Gerschgorin discs.

f. Compare to the true eigenvalues.

43. (Continuation.) Repeat for this matrix:

44. Explain why the Successive Overrelaxation method can be considered a generalization of the Gauss–Seidel method.

45. Let

Show that I − A is diagonally dominant if ||A|| < 1.

46. Does Lemma 2 hold for the identity matrix?

47. Combine Lemma 2 and Theorem 1 to produce another result. Explain.

48. Explain how to obtain the optimum relaxation parameter ωb from Young’s equation.

49. Under what conditions on the matrix A does it follow that A is diagonally dominant if and only if AT is diagonally dominant?

50. Using Gerschgorin discs, find bounds on the eigenvalues of these matrices.

a.

b.

COMPUTER EXERCISES 8.3

1. Write code to solve Example 2 using the following iterative methods:

a. Jacobi

b. Gauss–Seidel

c. SOR with ωb

d. Conjugate gradient

2. Let

Use mathematical software to compute (I − A)−1 and verify the solution in Example 11. Compute A2, A3, A10, and A20. For increasing values of m, does Am → 0?

3. (Continuation.) Show that the exact solution (I − A)−1b agrees with the approximate solution given in the examples.

4. Program the Jacobi iterative method to solve the problem Ax = b, where and

Draw a conclusion from the output of your program. Then run the same program using the initial vector [2, −2.5, −2.5] and with at least 200 steps. Explain what is happening.

5. (Continuation.) Change the matrix to and use the Jacobi method again. (Probably 40 or more steps will be needed to get a reasonable approximate solution.)

6. Let

Write mathematical software to solve it using the following iterative methods:

a. Jacobi

b. Gauss–Seidel

c. SOR with ωb

d. Conjugate gradient

7. Consider

Starting with x(0) = (0, 0, 0), calculate x(4) by the Jacobi method and x(4) by the Gauss–Seidel method.

8. Consider the augmented matrix

Starting with x(0) = (2, 3), calculate x(4) by the Jacobi method and x(2) by the Conjugate gradient method.

9. Verify calculations and assertions in Examples 5, 6, 7, and 8. Give details and justifications.

10. Consider the model elliptic partial differential equation known as the Poisson equation, ∂2u(x, y)/∂x2 + ∂2u(x, y)/∂y2 = f(x, y), on the unit square 0 < x, y < 1 with boundary conditions u(x, y) = g(x, y). Let f(x, y) = 9(x2 + y2) and g(x, y) = xy. With a uniform mesh subdivision of the square region, the finite difference approximations at mesh point (xi, yj) may be written 4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = −h2f(xi, yj). With spacing h = 1/4, we obtain nine unknowns and a 9 × 9 system of linear equations. In Figure 8.8, we indicate the ordering for the unknown vector u by first numbering the mesh points of the problem solution region in two different ways. Let the kth component uk of the vector u be the unknown corresponding to the mesh point marked k. Consider the system Au = b, where

 

This is the natural ordering with the unknown points u1, u2, u3, u4, u5, u6, u7, u8, u9 in the lexicographical order (left-to-right and bottom-to-top). Write mathematical software to solve it using the following iterative methods:

a. Jacobi

b. Gauss–Seidel

c. Conjugate gradient

d. SOR with ωb

FIGURE 8.8

11. (Continuation.) Reorder the equations and unknowns in the system Au = b to have this form,

 

This is the red-black ordering with the red unknown points u1, u2, u3, u4, u5 and the black unknowns u6, u7, u8, u9.

12. Let

Using the initial condition x(0) = (1, 1, 1), write the pseudocode for these iterative methods. Write a computer program that carries out a number of iterations. Compare the results to the true solution.

a. Jacobi

b. Gauss–Seidel

13. Write the pseudocode for approximating the eigenvalue of largest magnitude of these matrices by using the power method with the starting vectors given. Write a computer program that carries out a number of iterations. Compare the results to the true eigenvalues.

a.

b.

14. Let

Using the initial condition x(0) = (0, 0, 0), write the pseudocode for these iterative methods. Write a computer program that carries out a number of iterations. Compare the results to the true solution.

a. Jacobi

b. Gauss–Seidel

15. Let

Write pseudocode for the SOR iterative method with ω = 1.7, starting with x(0) = (0, 0, 0). Carry out a number of iterations and compare the approximate solution to the true solution.

16. Consider

Write the pseudocode and carry out a number of iterations of the following iterative methods starting with x(0) = (0.8, −0.6). Compare the approximate solution to the true solution.

a. Jacobi method

b. Gauss–Seidel method

17. Carry out several steps of the power method and compare to the true results.

a.

b.

18. Write and test a computer program that carries out the following scheme proposed by Brualdi and Mellendorf [1994] to make Gerschgorin’s Theorem stronger.

For any n × n matrix A, let and let k be an integer with 1 ≤ k ≤ n and let be the sum of the magnitudes of the r − 1 largest off-diagonal elements in column j. Then each eigenvalue of A is either in one of the discs

or in one of the regions

where P is any subset of{1, 2, …, n} such that |P| = k.

For a given real or complex matrix A, plot the resulting discs to identify a region in the complex plane that contains all of the eigenvalues of the matrix.
