Chapter 13

Alternative System Solvers

We have encountered Gauss elimination methods for solving linear systems. These work well for moderately sized systems (up to a few thousand equations), particularly when they do not pose numerical problems. Ill-conditioned problems are attacked more efficiently using the Householder method below. Huge linear systems (up to a million equations) are more successfully solved with iterative methods. Oftentimes, these linear systems are defined by a sparse matrix, one that has many zero entries, as illustrated in Figure 13.1. All these alternative methods are the topics of this chapter.

Figure 13.1

A sparse matrix: all nonzero entries are marked.

13.1 The Householder Method

Let’s revisit the problem of solving the linear system Au = b, where A is an n× n matrix. We may think of A as n column vectors, each with n elements,

\[ \begin{bmatrix} \mathbf{a}_1 & \ldots & \mathbf{a}_n \end{bmatrix} \mathbf{u} = \mathbf{b}. \]

In Section 12.2 we examined the classical method of Gauss elimination, or the process of applying shears Gi to the vectors a1 ...an and b in order to convert A to upper triangular form, or

\[ G_{n-1} \cdots G_1 A \mathbf{u} = G_{n-1} \cdots G_1 \mathbf{b}, \]

and then we were able to solve for u with back substitution. Each G_i is a shear matrix, constructed to transform the ith column vector G_{i-1} ⋯ G_1 a_i to a vector with zeroes below the diagonal element a_{i,i}.

As it turns out, Gauss elimination is not the most robust method for solving a linear system. A more numerically stable method may be found by replacing shears with reflections. This is the Householder method.

The Householder method applied to a linear system takes the same form as Gauss elimination. A series of reflections Hi is constructed and applied to the system,

\[ H_{n-1} \cdots H_1 A \mathbf{u} = H_{n-1} \cdots H_1 \mathbf{b}, \]

where each Hi transforms the column vector Hi−1 ... H1ai to a vector with zeroes below the diagonal element. Let’s examine how we construct a Householder transformation Hi.

A simple 2 × 2 matrix

\[ \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} \]

will help to illustrate the construction of a Householder transformation. The column vectors of this matrix are illustrated in Sketch 13.1. The first transformation, H_1 A, reflects a_1 onto the e_1-axis to the vector ā_1 = ||a_1|| e_1, or

\[ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix}. \]

Sketch 13.1

Householder reflection of the vector a_1 to ||a_1|| e_1.

We will reflect about the line L1 illustrated in Sketch 13.1, so we must construct a normal n1 to this line:

\[ \mathbf{n}_1 = \frac{\mathbf{a}_1 - \|\mathbf{a}_1\|\mathbf{e}_1}{\big\|\mathbf{a}_1 - \|\mathbf{a}_1\|\mathbf{e}_1\big\|}, \]

which is simply the normalized difference between the original vector and the target vector (after the reflection). The implicit equation of the line L1 is

\[ \mathbf{n}_1^T \mathbf{x} = 0, \]

and n_1^T a_1 is the distance of the point (o + a_1) to L_1. Therefore, the reflection constitutes moving twice the distance to the line in the direction of the normal, so

\[ \bar{\mathbf{a}}_1 = \mathbf{a}_1 - \left[2\mathbf{n}_1^T \mathbf{a}_1\right] \mathbf{n}_1. \]

(Notice that 2 n_1^T a_1 is a scalar.) To write this reflection in matrix form, we rearrange the terms,

\[ \bar{\mathbf{a}}_1 = \mathbf{a}_1 - 2\mathbf{n}_1\left[\mathbf{n}_1^T \mathbf{a}_1\right] = \left(I - 2\mathbf{n}_1\mathbf{n}_1^T\right) \mathbf{a}_1. \]

Notice that 2 n_1 n_1^T is a dyadic matrix, as introduced in Section 11.5. We now have the matrix H_1 defining one Householder transformation:

\[ H_1 = I - 2\mathbf{n}_1\mathbf{n}_1^T. \]

This is precisely the reflection we constructed in (11.13)!

Example 13.1

The Householder matrix H1 for our 2 × 2 example is formed with

\[ \mathbf{n}_1 = \begin{bmatrix} -0.382 \\ 0.923 \end{bmatrix}, \]

then

\[ H_1 = I - 2\begin{bmatrix} 0.146 & -0.353 \\ -0.353 & 0.853 \end{bmatrix} = \begin{bmatrix} 0.707 & 0.707 \\ 0.707 & -0.707 \end{bmatrix}. \]

The transformed matrix is formed from the column vectors

\[ H_1\mathbf{a}_1 = \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix} \quad\text{and}\quad H_1\mathbf{a}_2 = \begin{bmatrix} \sqrt{2} \\ \sqrt{2} \end{bmatrix}. \]

The 2 × 2 example serves only to illustrate the underlying geometry of a reflection matrix. The construction of a general Householder transformation Hi is a little more complicated. Suppose now that we have the following matrix

\[ A = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ 0 & a_{2,2} & a_{2,3} & a_{2,4} \\ 0 & 0 & a_{3,3} & a_{3,4} \\ 0 & 0 & a_{4,3} & a_{4,4} \end{bmatrix}. \]

We need to construct H3 to zero the element a4,3 while preserving A’s upper triangular nature. In other words, we want to preserve the elimination done by previous transformations, H1 and H2. To achieve this, construct

\[ \bar{\mathbf{a}}_3 = \begin{bmatrix} 0 \\ 0 \\ a_{3,3} \\ a_{4,3} \end{bmatrix}, \]

and then construct H3 so that

\[ H_3\bar{\mathbf{a}}_3 = \gamma\mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ \gamma \\ 0 \end{bmatrix}, \]

where γ = ±||ā_3||. The matrix H_3 has been designed so that H_3 a_3 will modify only the elements a_{3,3} and a_{4,3}, and the length of a_3 will be preserved.

That is the idea, so let’s develop it for n × n matrices. Suppose we have

\[ \bar{\mathbf{a}}_i = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ a_{i,i} \\ \vdots \\ a_{n,i} \end{bmatrix}. \]

We want to construct the Householder matrix Hi for the following transformation

\[ H_i\bar{\mathbf{a}}_i = \gamma\mathbf{e}_i = \begin{bmatrix} 0 \\ \vdots \\ \gamma \\ \vdots \\ 0 \end{bmatrix}, \]

where γ = ±||ā_i||. Just as we developed for the 2 × 2 example,

\[ H_i = I - 2\mathbf{n}_i\mathbf{n}_i^T, \tag{13.1} \]

but we make a slight modification of the normal,

\[ \mathbf{n}_i = \frac{\bar{\mathbf{a}}_i - \gamma\mathbf{e}_i}{\|\bar{\mathbf{a}}_i - \gamma\mathbf{e}_i\|}. \]

If ā_i is nearly parallel to e_i, then numerical problems can creep into our construction due to loss of significant digits in the subtraction of nearly equal numbers. Therefore, it is better to reflect onto the direction along the e_i-axis that represents the larger reflection; this is the purpose of the sign freedom in γ.

Let N_i = n_i n_i^T and note that it is symmetric and idempotent. Properties of H_i include being:

  • symmetric: H_i = H_i^T, since N_i is symmetric;
  • involutory: H_i H_i = I, and thus H_i = H_i^{-1};
  • unitary (orthogonal): H_i^T H_i = I, and thus H_i v has the same length as v.
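
These three properties are easy to check numerically. The following small Python sketch (our illustration, using numpy; not part of the original text) builds H = I − 2nn^T from a random unit normal n and verifies them:

    import numpy as np

    rng = np.random.default_rng(1)
    n = rng.standard_normal(4)
    n /= np.linalg.norm(n)                  # unit normal n
    H = np.eye(4) - 2.0 * np.outer(n, n)    # H = I - 2 n n^T
    print(np.allclose(H, H.T))              # symmetric: H = H^T
    print(np.allclose(H @ H, np.eye(4)))    # involutory: H H = I, so H = H^-1
    v = rng.standard_normal(4)
    print(np.isclose(np.linalg.norm(H @ v), np.linalg.norm(v)))  # lengths preserved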

Implementation of the Householder method doesn't involve explicit construction of, and multiplication by, the Householder matrix in (13.1). A numerically and computationally more efficient algorithm is easy to achieve since we know quite a bit about how each H_i acts on the column vectors. The following describes some of the variables in the algorithm that aid optimization.

Some optimization of the algorithm can be done by rewriting the normal; let

\[ \mathbf{v}_i = \bar{\mathbf{a}}_i - \gamma\mathbf{e}_i, \qquad\text{where}\quad \gamma = \begin{cases} -\operatorname{sign}(a_{i,i})\,\|\bar{\mathbf{a}}_i\| & \text{if } a_{i,i} \neq 0, \\ -\|\bar{\mathbf{a}}_i\| & \text{otherwise.} \end{cases} \]

Then we have that

\[ 2\mathbf{n}\mathbf{n}^T = \frac{\mathbf{v}\mathbf{v}^T}{\frac{1}{2}\mathbf{v}^T\mathbf{v}} = \frac{\mathbf{v}\mathbf{v}^T}{\alpha}, \]

thus

\[ \alpha = \gamma^2 - a_{i,i}\gamma. \]

When Hi is applied to column vector c,

\[ H_i\mathbf{c} = \left(I - \frac{\mathbf{v}\mathbf{v}^T}{\alpha}\right)\mathbf{c} = \mathbf{c} - s\mathbf{v}, \qquad\text{where}\quad s = \frac{\mathbf{v}^T\mathbf{c}}{\alpha}. \]

In the Householder algorithm that follows, as we work on the jth column vector, we call

\[ \hat{\mathbf{a}}_k = \begin{bmatrix} a_{j,k} \\ \vdots \\ a_{n,k} \end{bmatrix} \]

to indicate that only elements j, ..., n of the kth column vector a_k (with k ≥ j) are involved in a calculation. Thus application of H_j results in changes to the sub-block of A with a_{j,j} at its upper-left corner. Hence, the vectors a_j and H_j a_j coincide in the first j − 1 components.

For a more detailed discussion, see [2] or [11].

The Householder Method

Algorithm:

  • Input:
    • An n × m matrix A, with n ≥ m and rank(A) = m; an n-vector b, augmented to A as the (m + 1)st column.
  • Output:
    • Upper triangular matrix HA, written over A;
    • Hb, written over b in the augmented (m + 1)st column of A;
    • (H = H_p ⋯ H_1.)
  • If n = m then p = n − 1; else p = m (p is the last column to transform).
  • For j = 1, 2, ..., p:
    • a = ||â_j||
    • γ = −sign(a_{j,j}) a (take sign(0) = 1)
    • α = a² − a_{j,j} γ
    • Temporarily set a_{j,j} = a_{j,j} − γ (now â_j = v_j)
    • For k = j + 1, ..., m + 1:
      • s = (â_j · â_k)/α
      • â_k = â_k − s â_j
    • Set â_j = [γ  0  ...  0]^T
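
As a concrete illustration, here is a minimal Python sketch of this algorithm. It is our illustration, not the book's code: the name `householder_solve`, the numpy usage, and the final back-substitution step are our additions.

    import numpy as np

    def householder_solve(A, b):
        """Householder triangularization of [A | b], then back substitution.
        A is n x m with n >= m and rank m. A sketch of the algorithm above."""
        A = np.column_stack([np.asarray(A, float), np.asarray(b, float)])
        n, m = A.shape[0], A.shape[1] - 1
        p = n - 1 if n == m else m              # last column to transform
        for j in range(p):                      # 0-based: column j+1 of the text
            a = np.linalg.norm(A[j:, j])        # a = ||a_hat_j||
            sign = 1.0 if A[j, j] >= 0.0 else -1.0
            gamma = -sign * a                   # avoids cancellation in a_jj - gamma
            alpha = a * a - A[j, j] * gamma     # alpha = a^2 - a_jj * gamma
            A[j, j] -= gamma                    # temporarily: a_hat_j is now v_j
            for k in range(j + 1, m + 1):       # remaining columns, b included
                s = (A[j:, j] @ A[j:, k]) / alpha
                A[j:, k] -= s * A[j:, j]
            A[j:, j] = 0.0                      # set a_hat_j = [gamma, 0, ..., 0]^T
            A[j, j] = gamma
        u = np.zeros(m)                         # back substitution on the top m rows
        for i in range(m - 1, -1, -1):
            u[i] = (A[i, m] - A[i, i + 1:m] @ u[i + 1:m]) / A[i, i]
        return u

    # Example 13.2 below: recovers u = [1/2, 1/2, 1]
    print(householder_solve([[1, 1, 0], [-1, 1, 0], [0, 0, 1]], [1, 0, 1]))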

Example 13.2

Let’s apply the Householder algorithm to the linear system

\[ \begin{bmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}. \]

For j = 1 in the algorithm, we calculate the following values: γ = −√2, α = 2 + √2, and then we temporarily set

\[ \hat{\mathbf{a}}_1 = \begin{bmatrix} 1+\sqrt{2} \\ -1 \\ 0 \end{bmatrix}. \]

For k = 2, s = √2/(2 + √2). This results in

\[ \hat{\mathbf{a}}_2 = \begin{bmatrix} 0 \\ \sqrt{2} \\ 0 \end{bmatrix}. \]

For k = 3, s = 0 and â_3 remains unchanged. For k = 4, s = √2/2 and then the right-hand side vector becomes

\[ \hat{\mathbf{a}}_4 = \begin{bmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \\ 1 \end{bmatrix}. \]

Now we set â_1 = [γ  0  0]^T, and the reflection H_1 results in the linear system

\[ \begin{bmatrix} -\sqrt{2} & 0 & 0 \\ 0 & \sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{u} = \begin{bmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \\ 1 \end{bmatrix}. \]

Although not explicitly computed,

\[ \mathbf{n}_1 = \begin{bmatrix} 1+\sqrt{2} \\ -1 \\ 0 \end{bmatrix} \Bigg/ \left\| \begin{bmatrix} 1+\sqrt{2} \\ -1 \\ 0 \end{bmatrix} \right\| \]

and the Householder matrix

\[ H_1 = \begin{bmatrix} -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ \sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

Notice that a_3 was not affected: it lies in the plane about which we reflect, so n_1^T a_3 = 0. The length of each column vector was not changed, a result of the orthogonality property. Since the matrix is upper triangular, we may now use back substitution to find the solution vector

\[ \mathbf{u} = \begin{bmatrix} 1/2 \\ 1/2 \\ 1 \end{bmatrix}. \]

The Householder algorithm is the method of choice when dealing with ill-conditioned systems. An example: for some data sets, the least squares solution to an overdetermined linear system can produce unreliable results because of the nature of A^T A. In Section 13.4, we will examine this point in more detail. The Householder algorithm above is easy to use for such overdetermined problems: the input matrix A is of dimension n × m, where n ≥ m. The following example illustrates that the Householder method will result in the least squares solution to an overdetermined system.

Example 13.3

Let’s revisit the least squares line-fitting problem from Example 12.13. See that example for a problem description, and see Figure 12.4 for an illustration. The overdetermined linear system for this problem is

\[ \begin{bmatrix} 0 & 1 \\ 10 & 1 \\ 20 & 1 \\ 30 & 1 \\ 40 & 1 \\ 50 & 1 \\ 60 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 30 \\ 25 \\ 40 \\ 40 \\ 30 \\ 5 \\ 25 \end{bmatrix}. \]

After the first Householder reflection (j = 1), the linear system becomes

\[ \begin{bmatrix} -95.39 & -2.20 \\ 0 & 0.66 \\ 0 & 0.33 \\ 0 & -0.0068 \\ 0 & -0.34 \\ 0 & -0.68 \\ 0 & -1.01 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -54.51 \\ 16.14 \\ 22.28 \\ 13.42 \\ -5.44 \\ -39.29 \\ -28.15 \end{bmatrix}. \]

For the second Householder reflection (j = 2), the linear system becomes

\[ \begin{bmatrix} -95.39 & -2.20 \\ 0 & -1.47 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -54.51 \\ -51.10 \\ 11.91 \\ 13.64 \\ 5.36 \\ -17.91 \\ 3.81 \end{bmatrix}. \]

We can now solve the system with back substitution, using the first two rows (all remaining rows are zero), and the solution is

\[ \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -0.23 \\ 34.82 \end{bmatrix}. \]

Excluding numerical round-off, this is the same solution found using the normal equations in Example 12.13.
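
Assuming the `householder_solve` sketch from Section 13.1 (our illustration, not the book's code), this computation can be reproduced in a few lines:

    import numpy as np

    # Least squares line fit of Example 13.3: 7 equations, 2 unknowns.
    # Requires the householder_solve sketch from Section 13.1.
    A = np.column_stack([np.arange(0.0, 70.0, 10.0), np.ones(7)])
    b = [30, 25, 40, 40, 30, 5, 25]
    print(householder_solve(A, b))    # approximately [-0.23, 34.82]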

13.2 Vector Norms

Finding the magnitude or length of a 3D vector is fundamental to many geometric operations. It is also fundamental for n-dimensional vectors, even though such vectors might no longer have geometric meaning. For example, in Section 13.6, we examine iterative methods for solving linear systems, and vector length is key for monitoring improvements in the solution.

The magnitude or length of an n–dimensional vector v is typically measured by

\[ \|\mathbf{v}\|_2 = \sqrt{v_1^2 + \ldots + v_n^2}. \tag{13.2} \]

The (nonnegative) scalar ||v||2 is also referred to as the Euclidean norm because in ℝ3 it is Euclidean length. Since this is the “usual” way to measure length, the subscript 2 often is omitted.

More generally, (13.2) is a vector norm. Other vector norms are conceivable, for instance the 1-norm

\[ \|\mathbf{v}\|_1 = |v_1| + |v_2| + \ldots + |v_n| \tag{13.3} \]

or the ∞-norm

\[ \|\mathbf{v}\|_\infty = \max_i |v_i|. \tag{13.4} \]

We can gain an understanding of these norms by studying the familiar case n = 2. Recall that a unit vector is one whose norm equals one. For the 2-norm ||v||2, all unit vectors form a circle of radius one. With a bit of thinking, we can see that for the 1-norm all unit vectors form a diamond, while in the ∞-norm, all unit vectors form a square of edge length two. This geometric interpretation is shown in Figure 13.2.

Figure 13.2

Vector norms: the outline of the unit vectors for the 2-norm is a circle, for the ∞-norm a square, and for the 1-norm a diamond.

All three of these norms may be viewed as members of a whole family of norms, referred to as p-norms. The p-norm of v is given by

\[ \|\mathbf{v}\|_p = \left( |v_1|^p + |v_2|^p + \ldots + |v_n|^p \right)^{1/p}. \]

It is easy to see that the 1- and 2-norm fit this mold. That p → ∞ gives the ∞-norm requires a little more thought; give it a try!
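
A quick numerical experiment (our sketch, in Python) makes this limit plausible: as p grows, the p-norm of a fixed vector sinks toward its largest absolute component.

    import numpy as np

    v = np.array([1.0, 0.0, 2.0])
    for p in (1, 2, 4, 16, 64):
        print(p, (np.abs(v) ** p).sum() ** (1.0 / p))
    # prints 3.0, 2.236..., 2.030..., 2.00002..., 2.000...:
    # the values approach max|v_i| = 2, the infinity-norm of v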

Example 13.4

Let

\[ \mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}. \]

Then

\[ \|\mathbf{v}\|_1 = 3, \qquad \|\mathbf{v}\|_2 = \sqrt{5} \approx 2.24, \qquad \|\mathbf{v}\|_\infty = 2. \]

This example and Figure 13.2 demonstrate that

\[ \|\mathbf{v}\|_1 \geq \|\mathbf{v}\|_2 \geq \|\mathbf{v}\|_\infty; \tag{13.5} \]

this is also true in n dimensions.

All vector norms share some basic properties:

\[ \|\mathbf{v}\| \geq 0, \tag{13.6} \]

\[ \|\mathbf{v}\| = 0 \ \text{if and only if}\ \mathbf{v} = \mathbf{0}, \tag{13.7} \]

\[ \|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|\ \text{for}\ c \in \mathbb{R}, \tag{13.8} \]

\[ \|\mathbf{v}+\mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|. \tag{13.9} \]

The last of these is the familiar triangle inequality.

By this time, we have developed a familiarity with the 2-norm, and these properties are easily verified. Recall the steps for the triangle inequality were demonstrated in (2.24).

Let’s show that the vector norm properties hold for the ∞-norm. Examining (13.4), for each v in ℝ^n, by definition max_i |v_i| ≥ 0. If max_i |v_i| = 0, then v_i = 0 for each i = 1, ..., n and v = 0. Conversely, if v = 0 then max_i |v_i| = 0. Thus we have established that the first two properties hold. The third property is easily shown by the properties of the absolute value function:

\[ \|c\mathbf{v}\|_\infty = \max_i |cv_i| = |c| \max_i |v_i| = |c|\,\|\mathbf{v}\|_\infty. \]

For the triangle inequality:

\[ \|\mathbf{v}+\mathbf{w}\|_\infty = \max_i |v_i + w_i| \leq \max_i \left\{ |v_i| + |w_i| \right\} \leq \max_i |v_i| + \max_i |w_i| = \|\mathbf{v}\|_\infty + \|\mathbf{w}\|_\infty. \]

From a 2D geometric perspective, consider the 1-norm: starting at the origin, move v1 units along the e1-axis and then move v2 units along the e2-axis. The sum of these travels determines v’s length. This is analogous to driving in a city with rectilinear streets, and thus this norm is also called the Manhattan or taxicab norm.

The ∞-norm is also called the max-norm. Our standard measure of length, the 2-norm, can take more computing power than we might be willing to spend on a proximity problem, where we want to determine whether two points are closer than a given tolerance. Problems of this sort are often repeated many times and against many points. Instead, the max norm might be more suitable from a computing point of view and sufficient given the relationship in (13.5). This method of using an inexpensive measure to exclude many possibilities is called trivial reject in some disciplines such as computer graphics.
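
For instance, a proximity test might first apply the cheap max-norm and only fall back to the 2-norm for the survivors. The sketch below (our hypothetical `closer_than` helper, not from the text) is safe because of (13.5): if the max-norm of the difference already exceeds the tolerance, so does its 2-norm.

    import numpy as np

    def closer_than(p, q, tol):
        """True if points p and q are closer than tol in the 2-norm."""
        d = np.abs(np.asarray(p, float) - np.asarray(q, float))
        if d.max() >= tol:                     # max-norm test: trivial reject,
            return False                       # since ||d||_2 >= ||d||_inf >= tol
        return np.sqrt((d * d).sum()) < tol    # exact 2-norm only when needed

    print(closer_than([0, 0], [3, 4], 5.0))    # False: the distance is exactly 5
    print(closer_than([0, 0], [1, 1], 5.0))    # True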

13.3 Matrix Norms

A vector norm measures the magnitude of a vector; does it also make sense to talk about the magnitude of a matrix?

Instead of exploring the general case right away, we will first give some 2D insight. Let A be a 2 × 2 matrix. It maps the unit circle to an ellipse, the action ellipse. In Figure 13.3 we see points on the unit circle at the end of unit vectors v_i and their images Av_i forming this action ellipse. We can learn more about this ellipse by looking at A^T A. It is symmetric and positive definite, and thus has real and positive eigenvalues λ_1 and λ_2, in decreasing value. We then define

\[ \sigma_i = \sqrt{\lambda_i}, \tag{13.10} \]

Figure 13.3

Action of a 2 × 2 matrix: points on the unit circle (black) are mapped to points on an ellipse (gray). This is the action ellipse. The vector shown is the semi-major axis of the ellipse, which has length σ1.

to be the singular values of A. (In Chapter 16, we will explore singular values in more detail.)

The singular value σ1 is the length of the action ellipse’s semi-major axis and σ2 is the length of the action ellipse’s semi-minor axis. If A is a symmetric positive definite matrix, then its singular values are equal to its eigenvalues.

How much does A distort the unit circle? This is measured by its 2-norm, ||A||2. If we find the largest ||Avi||2, then we have an indication of how much A distorts. Assuming that we have k unit vectors vi we can compute

\[ \|A\|_2 \approx \max_i \|A\mathbf{v}_i\|_2. \tag{13.11} \]

As we increase k, (13.11) gives a better and better approximation to ||A||2. We then have

\[ \|A\|_2 = \max_{\|\mathbf{v}\|_2 = 1} \|A\mathbf{v}\|_2. \tag{13.12} \]

It should be clear by now that this development is not restricted to 2 × 2 matrices, but holds for n × n ones as well. From now on, we will discuss this general case, where the matrix 2-norm is more complicated to compute than its companion vector norm. It is given by

\[ \|A\|_2 = \sigma_1, \]

where σ1 is A’s largest singular value.

Recall that the inverse matrix A^{-1} “undoes” the action of A. Let the singular values of A^{-1} be called σ̂_i; then

\[ \hat{\sigma}_1 = \frac{1}{\sigma_n}, \quad \ldots, \quad \hat{\sigma}_n = \frac{1}{\sigma_1}, \]

and

\[ \|A^{-1}\|_2 = \frac{1}{\sigma_n}. \]

Singular values are frequently used in the numerical analysis of matrix operations. If you do not have access to software for singular values, then (13.11) will give a decent approximation. Singular values are typically computed using a method called singular value decomposition, or SVD, and they are the focus of Chapter 16.
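
The approximation (13.11) is easy to try out. In the Python sketch below (our illustration, using a diagonal test matrix), k = 200 unit vectors are sampled on the circle and the estimate is compared against σ1 computed by numpy's SVD:

    import numpy as np

    A = np.array([[1.5, 0.0],
                  [0.0, 0.05]])                      # a strongly distorting test matrix
    t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
    V = np.vstack([np.cos(t), np.sin(t)])            # 200 unit vectors v_i as columns
    estimate = np.linalg.norm(A @ V, axis=0).max()   # approximation (13.11)
    sigma_1 = np.linalg.svd(A, compute_uv=False)[0]  # exact ||A||_2
    print(estimate, sigma_1)                         # both approximately 1.5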

Analogous to vector norms, there are other matrix norms. We give two important examples:

\[ \begin{aligned} \|A\|_1 &: \text{maximum absolute column sum}, \\ \|A\|_\infty &: \text{maximum absolute row sum}. \end{aligned} \]

Because the notation for matrix and vector norms is identical, it is important to note what is enclosed in the norm symbols—a vector or a matrix.

Another norm that is sometimes used is the Frobenius norm, which is given by

\[ \|A\|_F = \sqrt{\sigma_1^2 + \ldots + \sigma_n^2}, \tag{13.13} \]

where the σi are A’s singular values.

The following is not too obvious. The Euclidean norm: add up the squares of all matrix elements, and take the square root,

\[ \|A\|_F = \sqrt{a_{1,1}^2 + a_{1,2}^2 + \ldots + a_{n,n}^2} \tag{13.14} \]

is identical to the Frobenius norm.

A matrix A maps all unit vectors to an ellipsoid whose semi-axis lengths are given by σ1,...,σn. Then the Frobenius norm gives the total distortion caused by A.

Example 13.5

Let’s examine matrix norms for

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \\ 5 & 6 & -7 \end{bmatrix}. \]

Its singular values are given by 10.5, 7.97, 0.334 resulting in

\[ \begin{aligned} \|A\|_2 &= \max\{10.5,\ 7.97,\ 0.334\} = 10.5, \\ \|A\|_1 &= \max\{9,\ 12,\ 15\} = 15, \\ \|A\|_\infty &= \max\{6,\ 12,\ 18\} = 18, \\ \|A\|_F &= \sqrt{1^2 + 2^2 + \ldots + (-7)^2} = \sqrt{10.5^2 + 7.97^2 + 0.334^2} = 13.2. \end{aligned} \]
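
These numbers are quickly verified with numpy (our sketch; the entry −7 follows the reconstruction of A above):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0],
                  [5.0, 6.0, -7.0]])
    sv = np.linalg.svd(A, compute_uv=False)
    print(sv)                            # approximately [10.5, 7.97, 0.334]
    print(np.abs(A).sum(axis=0).max())   # 1-norm: maximum column sum = 15
    print(np.abs(A).sum(axis=1).max())   # inf-norm: maximum row sum = 18
    print(np.sqrt((A ** 2).sum()))       # Euclidean/Frobenius norm, ~13.2
    print(np.sqrt((sv ** 2).sum()))      # the same value via (13.13)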

From the examples above, we see that matrix norms are real-valued functions on the linear space of all n × n matrices.1 Matrix norms satisfy conditions very similar to the vector norm conditions (13.6), (13.7), (13.8), and (13.9):

\[ \|A\| > 0 \ \text{for}\ A \neq Z, \tag{13.15} \]

\[ \|A\| = 0 \ \text{for}\ A = Z, \tag{13.16} \]

\[ \|cA\| = |c|\,\|A\| \ \text{for}\ c \in \mathbb{R}, \tag{13.17} \]

\[ \|A+B\| \leq \|A\| + \|B\|, \tag{13.18} \]

\[ \|AB\| \leq \|A\|\,\|B\|, \tag{13.19} \]

Z being the zero matrix.

How to choose a matrix norm? Computational expense and properties of the norm are the deciders. For example, the Frobenius and 2-norms are invariant with respect to orthogonal transformations. Many numerical methods texts provide a wealth of information on matrix norms and their relationships.

13.4 The Condition Number

We would like to know how sensitive the solution to Au = b is to changes in A and b. For this problem, the matrix 2-norm helps us define the condition of the map.

In most of our figures describing linear maps, we have mapped a circle, formed by many unit vectors v_i, to an ellipse, formed by the vectors Av_i. This ellipse is evidently closely related to the geometry of the map, and indeed its semi-major and semi-minor axis lengths correspond to A’s singular values. For n = 2, the eigenvalues of A^T A are λ_1 and λ_2, in decreasing value. The singular values of A are defined in (13.10). If σ_1 is very large and σ_2 is very small, then the ellipse will be very elongated, as illustrated by the example in Figure 13.4. The ratio

\[ \kappa(A) = \|A\|_2\,\|A^{-1}\|_2 = \sigma_1/\sigma_2 \]

Figure 13.4

Condition number: the action of A with κ(A) = 30.

is called the condition number of A. In Figure 13.4, we picked the (symmetric, positive definite) matrix

\[ A = \begin{bmatrix} 1.5 & 0 \\ 0 & 0.05 \end{bmatrix}, \]

which has condition number κ(A) = 1.5/0.05 = 30.

Since A^T A is symmetric and positive definite, κ(A) ≥ 1. If a matrix has a condition number close to one, it is called well-conditioned. If the condition number is 1 (such as for the identity matrix), then no distortion happens. The larger κ(A), the more A distorts, and if κ(A) is “large,” the matrix is called ill-conditioned.

Example 13.6

Let

\[ A = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}, \]

meaning that A is a rotation by α. Clearly, A^T A = I, where I is the identity matrix. Thus, σ_1 = σ_2 = 1. Hence the condition number of a rotation matrix is 1. Since a rotation does not distort, this is quite intuitive.

Example 13.7

Now let

\[ A = \begin{bmatrix} 100 & 0 \\ 0 & 0.01 \end{bmatrix}, \]

a matrix that scales by 100 in the e_1-direction and by 0.01 in the e_2-direction. This matrix is severely distorting! We quickly find σ_1 = 100 and σ_2 = 0.01, and thus the condition number is 100/0.01 = 10,000. The fact that A distorts is clearly reflected in the magnitude of its condition number.

Back to the initial problem: solving Au = b. We will always have round-off error to deal with, but we want to avoid creating a poorly designed linear system, which would mean a matrix A with a “large” condition number. The definition of large is subjective and problem-specific. A guideline: κ(A) ≈ 10^k can result in a loss of k digits of accuracy. If the condition number of a matrix is large, then irrespective of round-off, the solution cannot be depended upon. Practically speaking, a large condition number means that the solution to the linear system is numerically very sensitive to small changes in A or b. Alternatively, we can say that we can confidently calculate the inverse of a well-conditioned matrix.

The condition number is a better measure of near-singularity than the determinant because it is invariant with respect to scale and to the size n of the matrix: if s is a nonzero scalar, then κ(sA) = κ(A).

Example 13.8

Let n = 100. Form the identity matrix I and J = 0.1I.

\[ \det I = 1, \quad \kappa(I) = 1, \qquad \det J = 10^{-100}, \quad \kappa(J) = 1. \]

The determinant of J is small, indicating that there is a problem with this matrix. However, in solving a linear system with such a coefficient matrix, the scale of J poses no problem.

In Section 12.7, we examined overdetermined linear systems Au = b with m equations and n unknowns, where m > n. (Thus A is an m × n matrix.) Our approach to this problem depended on the approximation method of least squares, and we solved the system A^T A u = A^T b. The condition number of this new matrix satisfies κ(A^T A) = κ(A)², so if A has a high condition number to start with, this process creates an ill-posed problem. In this situation, a method such as the Householder method (see Section 13.1) is preferred. We will revisit this topic in more detail in Section 16.6.
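
The squaring effect is easy to observe numerically. In the Python sketch below (the 3 × 2 matrix is our made-up example with nearly dependent columns, not from the text), forming A^T A squares the condition number:

    import numpy as np

    A = np.array([[1.0, 1.1],
                  [1.0, 0.9],
                  [0.0, 0.1]])                   # nearly linearly dependent columns
    kappa = np.linalg.cond(A)                    # 2-norm condition number sigma_1/sigma_2
    print(kappa)                                 # ~ 16.4
    print(np.linalg.cond(A.T @ A), kappa ** 2)   # both ~ 268: normal equations square kappa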

An advanced linear algebra or numerical analysis text will provide an in-depth study of the condition number and error analysis.

13.5 Vector Sequences

You are familiar with sequences of real numbers such as

\[ 1, \ \frac{1}{2}, \ \frac{1}{4}, \ \frac{1}{8}, \ \ldots \]

or

\[ 1, \ 2, \ 4, \ 8, \ \ldots \]

The first of these has the limit 0, whereas the second one does not have a limit. One way of saying a sequence of real numbers ai has a limit a is that beyond some index i, all ai differ from the limit a by an arbitrarily small amount ε.

Vector sequences in ℝn,

\[ \mathbf{v}^{(0)}, \ \mathbf{v}^{(1)}, \ \mathbf{v}^{(2)}, \ \ldots, \]

are not all that different. A vector sequence has a limit if each component has a limit.

Example 13.9

Let a vector sequence be given by

\[ \mathbf{v}^{(i)} = \begin{bmatrix} 1/i \\ 1/i^2 \\ 1/i^3 \end{bmatrix}. \]

This sequence has the limit

\[ \mathbf{v} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

Now take the sequence

\[ \mathbf{v}^{(i)} = \begin{bmatrix} i \\ 1/i^2 \\ 1/i^3 \end{bmatrix}. \]

It does not have a limit: even though the last two components each have a limit, the first component diverges.

We say that a vector sequence converges to v with respect to a norm if for any tolerance ε > 0, there exists an integer m such that

\[ \|\mathbf{v}^{(i)} - \mathbf{v}\| < \varepsilon \quad\text{for all}\ i > m. \tag{13.20} \]

In other words, from some index i on, the distance of any v(i) from v is smaller than an arbitrarily small amount ε. See Figure 13.5 for an example. If the sequence converges with respect to one norm, it will converge with respect to all norms. Our focus will be on the (usual) Euclidean or 2-norm, and the subscript will be omitted.

Figure 13.5

Vector sequences: a sequence that converges.

Vector sequences are key to iterative methods, such as the two methods for solving Au = b in Section 13.6 and the power method for finding the dominant eigenvalue in Section 15.2. In practical applications, the limit vector v is not known. For some special problems, we can say whether a limit exists; however, we will not know it a priori. So we will modify our theoretical convergence measure (13.20) to examine the distance between iterations. This can take on different forms depending on the problem at hand. In general, it will lead to testing the condition

\[ \|\mathbf{v}^{(i)} - \mathbf{v}^{(i+1)}\| < \varepsilon, \]

which measures the change from one iteration to the next. In the case of an iterative solution to a linear system, u(i), we will test for the condition that ||Au(i)b|| < ε, which indicates that this iteration has provided an acceptable solution.

13.6 Iterative System Solvers: Gauss-Jacobi and Gauss-Seidel

In applications such as finite element methods (FEM) in the context of the solution of fluid flow problems, scientists are faced with linear systems with many thousands of equations. Gauss elimination would work, but would be far too slow. Typically, huge linear systems have one advantage: the coefficient matrix is sparse, meaning it has only very few (such as ten) nonzero entries per row. Thus, a 100,000 × 100,000 system might have only 1,000,000 nonzero entries, compared to 10,000,000,000 matrix elements! In these cases, one does not store the whole matrix, but only its nonzero entries, together with their i, j locations. An example of a sparse matrix is shown in Figure 13.1. The solution to such systems is typically obtained by iterative methods, which we will discuss next.

Example 13.10

Let a system2 be given by

\[ \begin{bmatrix} 4 & 1 & 0 \\ 2 & 5 & 1 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}. \]

An iterative method starts from a guess for the solution and then refines it until it is close enough to the solution. Let’s take

\[ \mathbf{u}^{(1)} = \begin{bmatrix} u_1^{(1)} \\ u_2^{(1)} \\ u_3^{(1)} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

for our first guess, and note that it clearly is not the solution to our system: Au^(1) ≠ b.

A better guess ought to be obtained by using the current guess and solving the first equation for a new u_1^(2), the second for a new u_2^(2), and so on. This gives us

\[ 4u_1^{(2)} + 1 = 1, \qquad 2 + 5u_2^{(2)} + 1 = 0, \qquad -1 + 2 + 4u_3^{(2)} = 3, \]

and thus

\[ \mathbf{u}^{(2)} = \begin{bmatrix} 0 \\ -0.6 \\ 0.5 \end{bmatrix}. \]

The next iteration becomes

\[ 4u_1^{(3)} - 0.6 = 1, \qquad 5u_2^{(3)} + 0.5 = 0, \qquad -1.2 + 4u_3^{(3)} = 3, \]

and thus

\[ \mathbf{u}^{(3)} = \begin{bmatrix} 0.4 \\ -0.1 \\ 1.05 \end{bmatrix}. \]

After a few more iterations, we will be close enough to the true solution

\[ \mathbf{u} = \begin{bmatrix} 0.333 \\ -0.333 \\ 1.0 \end{bmatrix}. \]

Try one more iteration for yourself.

This iterative method is known as Gauss-Jacobi iteration. Let us now formulate this process for the general case. We are given a linear system with n equations and n unknowns ui, written in matrix form as

\[ A\mathbf{u} = \mathbf{b}. \]

Let us also assume that we have an initial guess u(1) for the solution vector u.

We now define two matrices D and R as follows: D is the diagonal matrix whose diagonal elements are those of A, and R is the matrix obtained from A by setting all its diagonal elements to zero. Clearly then

\[ A = D + R \]

and our linear system becomes

\[ D\mathbf{u} + R\mathbf{u} = \mathbf{b} \]

or

\[ \mathbf{u} = D^{-1}\left[\mathbf{b} - R\mathbf{u}\right]. \]

In the spirit of our previous development, we now write this as

\[ \mathbf{u}^{(k+1)} = D^{-1}\left[\mathbf{b} - R\mathbf{u}^{(k)}\right], \]

meaning that we attempt to compute a new estimate u(k+1) from an existing one u(k). Note that D must not contain zeroes on the diagonal; this can be achieved by row or column interchanges.

Example 13.11

With this new framework, let us reconsider our last example. We have

\[ A = \begin{bmatrix} 4 & 1 & 0 \\ 2 & 5 & 1 \\ -1 & 2 & 4 \end{bmatrix}, \qquad R = \begin{bmatrix} 0 & 1 & 0 \\ 2 & 0 & 1 \\ -1 & 2 & 0 \end{bmatrix}, \qquad D^{-1} = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}. \]

Then

\[ \mathbf{u}^{(2)} = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.25 \end{bmatrix} \left( \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} - \begin{bmatrix} 0 & 1 & 0 \\ 2 & 0 & 1 \\ -1 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ -0.6 \\ 0.5 \end{bmatrix}. \]

Will the Gauss-Jacobi method succeed, i.e., will the sequence of vectors u^(k) converge? The answer is: sometimes yes, and sometimes no. It will always succeed if A is diagonally dominant, which means that, for every row, the absolute value of the diagonal element is larger than the sum of the absolute values of the remaining elements. In this case, it will succeed no matter what our initial guess u^(1) was. Strictly diagonally dominant matrices are nonsingular. Many practical problems, for example finite element ones, result in diagonally dominant systems.

In a practical setting, how do we determine if convergence is taking place? Ideally, we would like u(k) = u, the true solution, after a number of iterations. Equality will most likely not happen, but the length of the residual vector

\[ \|A\mathbf{u}^{(k)} - \mathbf{b}\| \]

should become small (i.e., less than some preset tolerance). Thus, we check the length of the residual vector after each iteration, and stop once it is smaller than our tolerance.
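
A minimal Python sketch of Gauss-Jacobi iteration with this residual-based stopping test might look as follows (our code; the function name and tolerance are illustrative):

    import numpy as np

    def gauss_jacobi(A, b, u, tol=1e-6, max_iter=100):
        """Gauss-Jacobi iteration u^(k+1) = D^-1 [b - R u^(k)].
        Stops once the residual ||A u - b|| drops below tol."""
        A = np.asarray(A, float)
        b = np.asarray(b, float)
        u = np.asarray(u, float)
        D = np.diag(A)                    # diagonal of A; must be nonzero
        R = A - np.diagflat(D)            # A with its diagonal set to zero
        for _ in range(max_iter):
            if np.linalg.norm(A @ u - b) < tol:
                break
            u = (b - R @ u) / D
        return u

    # Example 13.10: converges to [1/3, -1/3, 1]
    print(gauss_jacobi([[4, 1, 0], [2, 5, 1], [-1, 2, 4]], [1, 0, 3], [1, 1, 1]))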

A modification of the Gauss-Jacobi method is known as Gauss-Seidel iteration. When we compute u^(k+1) in the Gauss-Jacobi method, we can observe the following: the second element, u_2^(k+1), is computed using u_1^(k), u_3^(k), ..., u_n^(k). But we had just computed u_1^(k+1). It stands to reason that using it instead of u_1^(k) would be advantageous. This idea gives rise to the Gauss-Seidel method: as soon as a new element u_i^(k+1) is computed, the estimate vector u^(k+1) is updated.

In summary, Gauss-Jacobi updates the new estimate vector once all of its elements are computed, Gauss-Seidel updates as soon as a new element is computed. Typically, Gauss-Seidel iteration converges faster than Gauss-Jacobi iteration.
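
For comparison, here is the same solver with the Gauss-Seidel update (again our sketch): within each sweep, every newly computed element is used immediately.

    import numpy as np

    def gauss_seidel(A, b, u, tol=1e-6, max_iter=100):
        """Gauss-Seidel iteration: update u element by element, in place."""
        A = np.asarray(A, float)
        b = np.asarray(b, float)
        u = np.asarray(u, float)
        n = len(b)
        for _ in range(max_iter):
            if np.linalg.norm(A @ u - b) < tol:
                break
            for i in range(n):            # new elements are used right away
                u[i] = (b[i] - A[i, :i] @ u[:i] - A[i, i + 1:] @ u[i + 1:]) / A[i, i]
        return u

    # Same system as Example 13.10; typically converges in fewer iterations
    print(gauss_seidel([[4, 1, 0], [2, 5, 1], [-1, 2, 4]], [1, 0, 3], [1, 1, 1]))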

Key terms introduced in this chapter:

  • reflection matrix
  • Householder method
  • overdetermined system
  • symmetric matrix
  • involutory matrix
  • orthogonal matrix
  • unitary matrix
  • vector norm
  • vector norm properties
  • Euclidean norm
  • L2 norm
  • ∞ norm
  • max norm
  • Manhattan norm
  • matrix norm
  • matrix norm properties
  • Frobenius norm
  • action ellipse axes
  • singular values
  • condition number
  • well-conditioned matrix
  • ill-conditioned matrix
  • vector sequence
  • convergence
  • iterative method
  • sparse matrix
  • Gauss-Jacobi method
  • Gauss-Seidel method
  • residual vector

13.7 Exercises

  1. Use the Householder method to solve the following linear system

    \[ \begin{bmatrix} 1 & 1.1 & 1.1 \\ 1 & 0.9 & 0.9 \\ 0 & 0.1 & 0.2 \end{bmatrix} \mathbf{u} = \begin{bmatrix} 1 \\ 1 \\ 0.3 \end{bmatrix}. \]

    Notice that the columns are almost linearly dependent.

  2. Show that the Householder matrix is involutory.
  3. What is the Euclidean norm of

    \[ \mathbf{v} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}. \]

  4. Examining the 1- and 2-norms defined in Section 13.2, how would you define a 3-norm?
  5. Show that the 1-norm ||v||_1 satisfies the properties (13.6), (13.7), (13.8), and (13.9) of a vector norm.
  6. Define a new vector norm to be max{2|v1|, |v2|, ..., |vn|}. Show that this is indeed a vector norm. For vectors in ℝ2, sketch the outline of all unit vectors with respect to this norm.
  7. Let

    \[ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}. \]

    What is A’s 2-norm?

  8. Let four unit vectors be given by

    \[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_4 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}. \]

    Using the matrix

    \[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \]

    compute the four image vectors Av_i. Use these to find an approximation to ||A||_2.

  9. Is the determinant of a matrix a norm?
  10. Why is 1 the smallest possible condition number for a nonsingular matrix?
  11. What is the condition number of the matrix

    \[ A = \begin{bmatrix} 0.7 & 0.3 \\ 1 & 1 \end{bmatrix} \]

    that generated Figure 7.11?

  12. What can you say about the condition number of a rotation matrix?
  13. What is the condition number of the matrix

    \[ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}? \]

    What type of matrix is it, and is it invertible?

  14. Is the condition number of a matrix a norm?
  15. Define a vector sequence by

    \[ \mathbf{u}_i = \begin{bmatrix} 0 & 0 & 1/i \\ 0 & 1 & 0 \\ 1/i & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}; \qquad i = 1, 2, \ldots. \]

    Does it have a limit? If so, what is it?

  16. Let a vector sequence be given by

    \[ \mathbf{u}^{(1)} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \mathbf{u}^{(i+1)} = \begin{bmatrix} 1/2 & 1/4 \\ 0 & 3/4 \end{bmatrix} \mathbf{u}^{(i)}. \]

    Will this sequence converge? If so, to what vector?

  17. Let a linear system be given by

    \[ A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 8 & 2 \\ 1 & 0 & 2 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}. \]

    Carry out three iterations of the Gauss-Jacobi iteration starting with an initial guess

    \[ \mathbf{u}^{(1)} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

  18. Let a linear system be given by

    \[ A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 8 & 2 \\ 1 & 0 & 2 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}. \]

    Carry out three iterations of the Gauss-Jacobi iteration starting with an initial guess

    \[ \mathbf{u}^{(1)} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]

  19. Carry out three Gauss-Jacobi iterations for the linear system

    \[ A = \begin{bmatrix} 4 & 1 & 0 & 1 \\ 1 & 4 & 1 & 0 \\ 0 & 1 & 4 & 1 \\ 1 & 0 & 1 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \]

    starting with the initial guess

    \[ \mathbf{u}^{(1)} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}. \]

  20. Carry out three iterations of Gauss-Seidel for Example 13.10. Which method, Gauss-Jacobi or Gauss-Seidel, is converging to the solution faster? Why?

1More on this type of linear space in Chapter 14.

2This example was taken from Johnson and Riess [11].
