6.10* Conditioning and the Rayleigh Quotient

In Section 3.4, we studied specific techniques that allow us to solve systems of linear equations in the form Ax=b, where A is an m×n matrix and b is an m×1 vector. Such systems often arise in applications to the real world. The coefficients in the system are frequently obtained from experimental data, and, in many cases, both m and n are so large that a computer must be used in the calculation of the solution. Thus two types of errors must be considered. First, experimental errors arise in the collection of data since no instruments can provide completely accurate measurements. Second, computers introduce roundoff errors. One might intuitively feel that small relative changes in the coefficients of the system cause small relative errors in the solution. A system that has this property is called well-conditioned; otherwise, the system is called ill-conditioned.

We now consider several examples of these types of errors, concentrating primarily on changes in b rather than on changes in the entries of A. In addition, we assume that A is a square, complex (or real), invertible matrix since this is the case most frequently encountered in applications.

Example 1

Consider the system

$$\begin{aligned} x_1 + x_2 &= 5 \\ x_1 - x_2 &= 1. \end{aligned}$$

The solution to this system is

$$\begin{pmatrix} 3 \\ 2 \end{pmatrix}.$$

Now suppose that we change the system somewhat and consider the new system

$$\begin{aligned} x_1 + x_2 &= 5 \\ x_1 - x_2 &= 1.0001. \end{aligned}$$

This modified system has the solution

$$\begin{pmatrix} 3.00005 \\ 1.99995 \end{pmatrix}.$$

We see that a change of $10^{-4}$ in one coefficient has caused a change of less than $10^{-4}$ in each coordinate of the new solution. More generally, the system

$$\begin{aligned} x_1 + x_2 &= 5 \\ x_1 - x_2 &= 1 + h \end{aligned}$$

has the solution

$$\begin{pmatrix} 3 + h/2 \\ 2 - h/2 \end{pmatrix}.$$

Hence small changes in $b$ introduce small changes in the solution. Of course, we are really interested in relative changes, since a change in the solution of, say, 10 is considered large if the original solution is of the order $10^{-2}$, but small if the original solution is of the order $10^6$.

We use the notation $\delta b$ to denote the vector $b' - b$, where $b$ is the vector in the original system and $b'$ is the vector in the modified system. Thus we have

$$\delta b = \begin{pmatrix} 5 \\ 1 + h \end{pmatrix} - \begin{pmatrix} 5 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ h \end{pmatrix}.$$

We now define the relative change in $b$ to be the scalar $\|\delta b\|/\|b\|$, where $\|\cdot\|$ denotes the standard norm on $\mathbb{C}^n$ (or $\mathbb{R}^n$); that is, $\|b\| = \sqrt{\langle b, b \rangle}$. Most of what follows, however, is true for any norm. A similar definition holds for the relative change in $x$. In this example,

$$\frac{\|\delta b\|}{\|b\|} = \frac{|h|}{\sqrt{26}} \quad\text{and}\quad \frac{\|\delta x\|}{\|x\|} = \frac{\left\| \begin{pmatrix} 3 + h/2 \\ 2 - h/2 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\|}{\left\| \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\|} = \frac{|h|}{\sqrt{26}}.$$

Thus the relative change in x equals, coincidentally, the relative change in b; so the system is well-conditioned.

Example 2

Consider the system

$$\begin{aligned} x_1 + x_2 &= 3 \\ x_1 + 1.00001x_2 &= 3.00001, \end{aligned}$$

which has

$$\begin{pmatrix} 2 \\ 1 \end{pmatrix}$$

as its solution. The solution to the related system

$$\begin{aligned} x_1 + x_2 &= 3 \\ x_1 + 1.00001x_2 &= 3.00001 + h \end{aligned}$$

is

$$\begin{pmatrix} 2 - 10^5 h \\ 1 + 10^5 h \end{pmatrix}.$$

Hence,

$$\frac{\|\delta x\|}{\|x\|} = 10^5 \sqrt{\tfrac{2}{5}}\,|h| \ge 10^4\,|h|,$$

while

$$\frac{\|\delta b\|}{\|b\|} \approx \frac{|h|}{3\sqrt{2}}.$$

Thus the relative change in $x$ is at least $10^4$ times the relative change in $b$! This system is very ill-conditioned. Observe that the lines defined by the two equations are nearly coincident. So a small change in either line could greatly alter the point of intersection, that is, the solution to the system.
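Readers who want to experiment can reproduce both examples numerically. The following sketch uses Python with NumPy (an assumption of this illustration, not something the text requires); it solves each system together with its perturbed version and prints the two relative changes for the perturbation $h = 10^{-4}$.

```python
import numpy as np

def relative_changes(A, b, db):
    """Solve Ax = b and A(x + dx) = b + db; return the relative
    changes ||db||/||b|| and ||dx||/||x|| in the Euclidean norm."""
    x = np.linalg.solve(A, b)
    dx = np.linalg.solve(A, b + db) - x
    return (np.linalg.norm(db) / np.linalg.norm(b),
            np.linalg.norm(dx) / np.linalg.norm(x))

db = np.array([0.0, 1e-4])

# Example 1: x1 + x2 = 5, x1 - x2 = 1 (well-conditioned)
print(relative_changes(np.array([[1.0, 1.0], [1.0, -1.0]]),
                       np.array([5.0, 1.0]), db))      # both ratios equal |h|/sqrt(26)

# Example 2: x1 + x2 = 3, x1 + 1.00001 x2 = 3.00001 (ill-conditioned)
print(relative_changes(np.array([[1.0, 1.0], [1.0, 1.00001]]),
                       np.array([3.0, 3.00001]), db))  # relative change in x is roughly
                                                       # 10^5 times that in b
```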

To apply the full strength of the theory of self-adjoint matrices to the study of conditioning, we need the notion of the norm of a matrix. (See Exercises 26-30 of Section 6.1 for further results about norms.)

Definition.

Let A be a complex (or real) n×n matrix. Define the norm of A by

$$\|A\|_E = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|},$$

where $x \in \mathbb{C}^n$ or $x \in \mathbb{R}^n$.

Intuitively, $\|A\|_E$ represents the maximum magnification of a vector by the matrix $A$. The question of whether or not this maximum exists, as well as the problem of how to compute it, can be answered by the use of the so-called Rayleigh quotient.

Definition.

Let $B$ be an $n \times n$ self-adjoint matrix. The Rayleigh quotient for $x \ne 0$ is defined to be the scalar $R(x) = \dfrac{\langle Bx, x \rangle}{\|x\|^2}$.

The following result characterizes the extreme values of the Rayleigh quotient of a self-adjoint matrix.

Theorem 6.44.

For a self-adjoint matrix $B \in M_{n \times n}(F)$, we have that $\max_{x \ne 0} R(x)$ is the largest eigenvalue of $B$ and $\min_{x \ne 0} R(x)$ is the smallest eigenvalue of $B$.

Proof.

By Theorems 6.19 (p. 381) and 6.20 (p. 381), we may choose an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ of eigenvectors of $B$ such that $Bv_i = \lambda_i v_i$ $(1 \le i \le n)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. (Recall that by the lemma to Theorem 6.17, p. 370, the eigenvalues of $B$ are real.) Now, for $x \in F^n$, there exist scalars $a_1, a_2, \ldots, a_n$ such that

$$x = \sum_{i=1}^n a_i v_i.$$

Hence

$$R(x) = \frac{\langle Bx, x \rangle}{\|x\|^2} = \frac{\left\langle \sum_{i=1}^n a_i \lambda_i v_i,\ \sum_{j=1}^n a_j v_j \right\rangle}{\|x\|^2} = \frac{\sum_{i=1}^n \lambda_i |a_i|^2}{\|x\|^2} \le \lambda_1 \frac{\sum_{i=1}^n |a_i|^2}{\|x\|^2} = \frac{\lambda_1 \|x\|^2}{\|x\|^2} = \lambda_1.$$

It is easy to see that $R(v_1) = \lambda_1$, so we have demonstrated the first half of the theorem. The second half is proved similarly.
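Theorem 6.44 also suggests a quick numerical experiment: for a self-adjoint $B$, the Rayleigh quotient of every nonzero vector should lie between the smallest and largest eigenvalues, with equality attained at the corresponding eigenvectors. A minimal sketch in Python with NumPy (the random symmetric matrix is an arbitrary choice for illustration):

```python
import numpy as np

def rayleigh(B, x):
    """Rayleigh quotient <Bx, x> / ||x||^2 for x != 0."""
    return (x @ B @ x) / (x @ x)

rng = np.random.default_rng(0)
C = rng.standard_normal((4, 4))
B = (C + C.T) / 2                     # real symmetric, hence self-adjoint

w, V = np.linalg.eigh(B)              # eigenvalues in increasing order
for _ in range(1000):
    x = rng.standard_normal(4)
    assert w[0] - 1e-12 <= rayleigh(B, x) <= w[-1] + 1e-12

print(rayleigh(B, V[:, -1]), w[-1])   # equal: quotient at the top eigenvector
print(rayleigh(B, V[:, 0]), w[0])     # equal: quotient at the bottom eigenvector
```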

Corollary 1.

For any square matrix $A$, $\|A\|_E$ is finite and, in fact, equals $\sqrt{\lambda}$, where $\lambda$ is the largest eigenvalue of $A^*A$.

Proof.

Let $B$ be the self-adjoint matrix $A^*A$, and let $\lambda$ be the largest eigenvalue of $B$. Since, for $x \ne 0$,

$$0 \le \frac{\|Ax\|^2}{\|x\|^2} = \frac{\langle Ax, Ax \rangle}{\|x\|^2} = \frac{\langle A^*Ax, x \rangle}{\|x\|^2} = \frac{\langle Bx, x \rangle}{\|x\|^2} = R(x),$$

it follows from Theorem 6.44 that $\|A\|_E^2 = \lambda$.

Observe that the proof of Corollary 1 shows that all the eigenvalues of A*A are nonnegative. For our next result, we need the following lemma.

Lemma. For any square matrix A, λ is an eigenvalue of A* A if and only if λ is an eigenvalue of AA*.

Proof.

Let $\lambda$ be an eigenvalue of $A^*A$. If $\lambda = 0$, then $A^*A$ is not invertible. Hence $A$ and $A^*$ are not invertible, so that $AA^*$ is not invertible; thus $\lambda = 0$ is also an eigenvalue of $AA^*$. The proof of the converse is similar.

Suppose now that $\lambda \ne 0$. Then there exists $x \ne 0$ such that $A^*Ax = \lambda x$. Apply $A$ to both sides to obtain $(AA^*)(Ax) = \lambda(Ax)$. Since $Ax \ne 0$ (lest $\lambda x = 0$), we have that $\lambda$ is an eigenvalue of $AA^*$. The proof of the converse is left as an exercise.

Corollary 2.

Let $A$ be an invertible matrix. Then $\|A^{-1}\|_E = 1/\sqrt{\lambda}$, where $\lambda$ is the smallest eigenvalue of $A^*A$.

Proof.

Recall that $\lambda$ is an eigenvalue of an invertible matrix if and only if $\lambda^{-1}$ is an eigenvalue of its inverse.

Now let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the eigenvalues of $A^*A$, which by the lemma are also the eigenvalues of $AA^*$. Then $\|A^{-1}\|_E^2$ equals the largest eigenvalue of $(A^{-1})^*A^{-1} = (AA^*)^{-1}$, which equals $1/\lambda_n$. Hence $\|A^{-1}\|_E = 1/\sqrt{\lambda_n}$.
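Both corollaries are easy to check numerically. In the sketch below (Python with NumPy; the random $4 \times 4$ matrix is an arbitrary example and is invertible with probability 1), the extreme eigenvalues of $A^*A$ are compared against NumPy's built-in spectral norm:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))     # random real matrix; invertible almost surely

lam = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of A*A in increasing order (all >= 0)
print(np.sqrt(lam[-1]),             # Corollary 1: ||A|| = sqrt(largest eigenvalue)
      np.linalg.norm(A, 2))         # ... agrees with the spectral norm of A
print(1.0 / np.sqrt(lam[0]),        # Corollary 2: ||A^{-1}|| = 1/sqrt(smallest eigenvalue)
      np.linalg.norm(np.linalg.inv(A), 2))  # ... agrees with the spectral norm of A^{-1}
```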

For many applications, it is only the largest and smallest eigenvalues that are of interest. For example, in the case of vibration problems, the smallest eigenvalue represents the lowest frequency at which vibrations can occur.

We see the role of both of these eigenvalues in our study of conditioning.

Example 3

Let

$$A = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}.$$

Then

$$B = A^*A = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.$$

The eigenvalues of $B$ are 3, 3, and 0. Therefore $\|A\|_E = \sqrt{3}$. For any

$$x = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \ne 0,$$

we may compute R(x) for the matrix B as

$$R(x) = \frac{\langle Bx, x \rangle}{\|x\|^2} = \frac{2(a^2 + b^2 + c^2 - ab - ac - bc)}{a^2 + b^2 + c^2} \le 3.$$
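A quick check of this example with NumPy (again, merely an illustration):

```python
import numpy as np

A = np.array([[ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0]])
B = A.T @ A
print(B)                       # [[ 2. -1. -1.], [-1.  2. -1.], [-1. -1.  2.]]
print(np.linalg.eigvalsh(B))   # approximately [0. 3. 3.]
print(np.linalg.norm(A, 2))    # sqrt(3) = 1.7320508...
```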

Now that we know $\|A\|_E$ exists for every square matrix $A$, we can make use of the inequality $\|Ax\| \le \|A\|_E\,\|x\|$, which holds for every $x$.

Assume in what follows that $A$ is invertible, $b \ne 0$, and $Ax = b$. For a given $\delta b$, let $\delta x$ be the vector that satisfies $A(x + \delta x) = b + \delta b$. Then $A(\delta x) = \delta b$, and so $\delta x = A^{-1}(\delta b)$. Hence

$$\|b\| = \|Ax\| \le \|A\|_E\,\|x\| \quad\text{and}\quad \|\delta x\| = \|A^{-1}(\delta b)\| \le \|A^{-1}\|_E\,\|\delta b\|.$$

Thus

$$\frac{\|\delta x\|}{\|x\|} \le \frac{\|A\|_E\,\|\delta x\|}{\|b\|} \le \frac{\|A\|_E\,\|A^{-1}\|_E\,\|\delta b\|}{\|b\|} = \|A\|_E\,\|A^{-1}\|_E \left( \frac{\|\delta b\|}{\|b\|} \right).$$

Similarly (see Exercise 9),

$$\frac{1}{\|A\|_E\,\|A^{-1}\|_E} \left( \frac{\|\delta b\|}{\|b\|} \right) \le \frac{\|\delta x\|}{\|x\|}.$$

The number $\|A\|_E\,\|A^{-1}\|_E$ is called the condition number of $A$ and is denoted $\mathrm{cond}(A)$. We summarize these results in the following theorem.

Theorem 6.45.

For the system $Ax = b$, where $A$ is invertible and $b \ne 0$, the following statements are true.

  (a) We have
$$\frac{1}{\mathrm{cond}(A)} \cdot \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|} \le \mathrm{cond}(A) \cdot \frac{\|\delta b\|}{\|b\|}.$$

  (b) If $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues, respectively, of $A^*A$, then $\mathrm{cond}(A) = \sqrt{\lambda_1/\lambda_n}$.

Proof.

Statement (a) follows from the previous inequalities, and (b) follows from Corollaries 1 and 2 to Theorem 6.44.

It should be noted that the definition of $\mathrm{cond}(A)$ depends on how the norm of $A$ is defined. There are many reasonable ways of defining the norm of a matrix. In fact, the only property needed to establish Theorem 6.45(a) and the two displayed inequalities preceding it is that $\|Ax\| \le \|A\|_E\,\|x\|$ for all $x$.

It is clear from Theorem 6.45(a) that $\mathrm{cond}(A) \ge 1$. It is left as an exercise to prove that $\mathrm{cond}(A) = 1$ if and only if $A$ is a scalar multiple of a unitary or orthogonal matrix. Moreover, it can be shown with some work that equality can be obtained in (a) by an appropriate choice of $b$ and $\delta b$.

We can see immediately from (a) that if cond(A) is close to 1, then a small relative error in b forces a small relative error in x. If cond(A) is large, however, then the relative error in x may be small even though the relative error in b is large, or the relative error in x may be large even though the relative error in b is small! In short, cond(A) merely indicates the potential for large relative errors.
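Theorem 6.45 can be tested on the ill-conditioned system of Example 2. The following sketch (Python with NumPy, for illustration only) computes $\mathrm{cond}(A)$ from part (b) and verifies the two-sided bound of part (a) for the perturbation used earlier:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.00001]])
b = np.array([3.0, 3.00001])
db = np.array([0.0, 1e-4])

lam = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of A*A in increasing order
cond_A = np.sqrt(lam[-1] / lam[0])   # Theorem 6.45(b): cond(A) is about 4 * 10^5

x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x
rel_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_x = np.linalg.norm(dx) / np.linalg.norm(x)
print(rel_b / cond_A <= rel_x <= cond_A * rel_b)   # True: both bounds of (a) hold
```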

We have so far considered only errors in the vector $b$. If there is an error $\delta A$ in the coefficient matrix of the system $Ax = b$, the situation is more complicated. For example, $A + \delta A$ may fail to be invertible. But under the appropriate assumptions, it can be shown that a bound for the relative error in $x$ can be given in terms of $\mathrm{cond}(A)$. For example, Charles Cullen (Charles G. Cullen, An Introduction to Numerical Linear Algebra, PWS Publishing Co., Boston, 1994, p. 60) shows that if $A + \delta A$ is invertible, then

$$\frac{\|\delta x\|}{\|x + \delta x\|} \le \mathrm{cond}(A)\,\frac{\|\delta A\|_E}{\|A\|_E}.$$

It should be mentioned that, in practice, one never computes $\mathrm{cond}(A)$ from its definition, for it would be an unnecessary waste of time to compute $A^{-1}$ merely to determine its norm. In fact, if a computer is used to find $A^{-1}$, the computed inverse of $A$ in all likelihood only approximates $A^{-1}$, and the error in the computed inverse is affected by the size of $\mathrm{cond}(A)$. So we are caught in a vicious circle! There are, however, some situations in which a usable approximation of $\mathrm{cond}(A)$ can be found. Thus, in most cases, the estimate of the relative error in $x$ is based on an estimate of $\mathrm{cond}(A)$.
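In floating-point practice, library routines obtain $\mathrm{cond}(A)$ from the singular values of $A$ (the ratio $\sigma_1/\sigma_n$ of the last exercise below) rather than from $A^{-1}$. For instance, in Python with NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.00001]])
s = np.linalg.svd(A, compute_uv=False)   # singular values in decreasing order
print(s[0] / s[-1])                      # cond(A) = sigma_1 / sigma_n
print(np.linalg.cond(A))                 # built-in; the default is this 2-norm ratio
```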

Exercises

  1. Label the following statements as true or false.

    (a) If $Ax = b$ is well-conditioned, then $\mathrm{cond}(A)$ is small.

    (b) If $\mathrm{cond}(A)$ is large, then $Ax = b$ is ill-conditioned.

    (c) If $\mathrm{cond}(A)$ is small, then $Ax = b$ is well-conditioned.

    (d) The norm of $A$ equals the Rayleigh quotient.

    (e) The norm of $A$ always equals the largest eigenvalue of $A$.

  2. Compute the norms of the following matrices.

    (a) $\begin{pmatrix} 4 & 0 \\ 1 & 3 \end{pmatrix}$

    (b) $\begin{pmatrix} 5 & 3 \\ 3 & 3 \end{pmatrix}$

    (c) $\begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 2 \\ 3 & 1 & 0 \\ 2 & 3 & 1 \end{pmatrix}$

  3. Prove that if $B$ is symmetric, then $\|B\|_E$ is the largest eigenvalue of $B$.

  4. Let $A$ and $A^{-1}$ be as follows:

    $$A = \begin{pmatrix} 6 & 13 & 17 \\ 13 & 29 & 38 \\ 17 & 38 & 50 \end{pmatrix} \quad\text{and}\quad A^{-1} = \begin{pmatrix} 6 & -4 & 1 \\ -4 & 11 & -7 \\ 1 & -7 & 5 \end{pmatrix}.$$

    The eigenvalues of $A$ are approximately 84.74, 0.2007, and 0.0588.

    (a) Approximate $\|A\|_E$, $\|A^{-1}\|_E$, and $\mathrm{cond}(A)$. (Note Exercise 3.)

    (b) Suppose that we have vectors $x$ and $\tilde{x}$ such that $Ax = b$ and $\|b - A\tilde{x}\| \le 0.001$. Use (a) to determine upper bounds for $\|\tilde{x} - A^{-1}b\|$ (the absolute error) and $\|\tilde{x} - A^{-1}b\|/\|A^{-1}b\|$ (the relative error).

  5. Suppose that $x$ is the actual solution of $Ax = b$ and that a computer arrives at an approximate solution $\tilde{x}$. If $\mathrm{cond}(A) = 100$, $\|b\| = 1$, and $\|b - A\tilde{x}\| = 0.1$, obtain upper and lower bounds for $\|x - \tilde{x}\|/\|x\|$.

  6. Let

    $$B = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

    Compute

    $$R\!\left( \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right), \quad \|B\|_E, \quad\text{and}\quad \mathrm{cond}(B).$$
  7. Let $B$ be a symmetric matrix. Prove that $\min_{x \ne 0} R(x)$ equals the smallest eigenvalue of $B$.

  8. Prove that if λ is an eigenvalue of AA*, then λ is an eigenvalue of A*A. This completes the proof of the lemma to Corollary 2 to Theorem 6.44.

  9. Prove that if A is an invertible matrix and Ax=b, then

    $$\frac{1}{\|A\|_E\,\|A^{-1}\|_E} \left( \frac{\|\delta b\|}{\|b\|} \right) \le \frac{\|\delta x\|}{\|x\|}.$$
  10. Prove the left inequality of (a) in Theorem 6.45.

  11. Prove that cond(A)=1 if and only if A is a scalar multiple of a unitary or orthogonal matrix.

  12. (a) Let $A$ and $B$ be square matrices that are unitarily equivalent. Prove that $\|A\|_E = \|B\|_E$.

    (b) Let $T$ be a linear operator on a finite-dimensional inner product space $V$. Define

      $$\|T\|_E = \max_{x \ne 0} \frac{\|T(x)\|}{\|x\|}.$$

      Prove that $\|T\|_E = \|[T]_\beta\|_E$, where $\beta$ is any orthonormal basis for $V$.

    (c) Let $V$ be an infinite-dimensional inner product space with an orthonormal basis $\{v_1, v_2, \ldots\}$. Let $T$ be the linear operator on $V$ such that $T(v_k) = k v_k$. Prove that $\|T\|_E$ (defined in (b)) does not exist.

    Visit goo.gl/B8Uw33 for a solution.

The next exercise assumes the definitions of singular value and pseudoinverse and the results of Section 6.7.

  13. Let $A$ be an $n \times n$ matrix of rank $r$ with the nonzero singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. Prove each of the following results.

    (a) $\|A\|_E = \sigma_1$.

    (b) $\|A^\dagger\|_E = \dfrac{1}{\sigma_r}$.

    (c) If $A$ is invertible (and hence $r = n$), then $\mathrm{cond}(A) = \dfrac{\sigma_1}{\sigma_n}$.
