Chapter 12

Gauss for Linear Systems

In Chapter 5, we studied linear systems of two equations in two unknowns. A whole chapter for such a humble task seems like a bit of overkill—its main purpose was really to lay the groundwork for this chapter.

Linear systems arise in virtually every area of science and engineering—some are as big as 1,000,000 equations in as many unknowns. Such huge systems require more sophisticated treatment than the methods introduced here, which will nonetheless allow you to solve systems with several thousand equations without a problem.

Figure 12.1 illustrates the use of linear systems in the field of data smoothing. The left triangulation looks somewhat “rough”; after setting up an appropriate linear system, we compute the “smoother” triangulation on the right, in which the triangles are closer to being equilateral.

Figure 12.1

Linear systems: the triangulation on the right was obtained from the left one by solving a linear system.

This chapter explains the basic ideas underlying linear systems. Readers eager for hands-on experience should get access to software such as Mathematica or MATLAB. Readers with advanced programming knowledge can download linear system solvers from the web. The most prominent collection of routines is LAPACK.

12.1 The Problem

A linear system is a set of equations like this:

\[
\begin{aligned}
3u_1 - 2u_2 - 10u_3 + u_4 &= 0\\
u_1 - u_3 &= 4\\
u_1 + u_2 - 2u_3 + 3u_4 &= 1\\
u_2 + 2u_4 &= 4.
\end{aligned}
\]

The unknowns are the numbers u_1, u_2, u_3, u_4. There are as many equations as there are unknowns, four in this example. We rewrite this 4 × 4 linear system in matrix form:

\[
\begin{bmatrix}
3 & -2 & -10 & 1\\
1 & 0 & -1 & 0\\
1 & 1 & -2 & 3\\
0 & 1 & 0 & 2
\end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3\\ u_4 \end{bmatrix} =
\begin{bmatrix} 0\\ 4\\ 1\\ 4 \end{bmatrix}.
\]

A general n × n linear system looks like this:

\[
\begin{aligned}
a_{1,1}u_1 + a_{1,2}u_2 + \ldots + a_{1,n}u_n &= b_1\\
a_{2,1}u_1 + a_{2,2}u_2 + \ldots + a_{2,n}u_n &= b_2\\
&\;\;\vdots\\
a_{n,1}u_1 + a_{n,2}u_2 + \ldots + a_{n,n}u_n &= b_n.
\end{aligned}
\]

In matrix form, it becomes

\[
\begin{bmatrix}
a_{1,1} & a_{1,2} & \ldots & a_{1,n}\\
a_{2,1} & a_{2,2} & \ldots & a_{2,n}\\
\vdots & & & \vdots\\
a_{n,1} & a_{n,2} & \ldots & a_{n,n}
\end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ \vdots\\ u_n \end{bmatrix} =
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix},
\qquad (12.1)
\]

or, written in terms of the columns of the coefficient matrix,

\[
\begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix} u = b,
\]

or even shorter

\[
Au = b.
\]

The coefficient matrix A has n rows and n columns. For example, the first row is

\[
a_{1,1},\ a_{1,2},\ \ldots,\ a_{1,n},
\]

and the second column is

\[
\begin{bmatrix} a_{1,2}\\ a_{2,2}\\ \vdots\\ a_{n,2} \end{bmatrix}.
\]

Equation (12.1) is a compact way of writing n equations for the n unknowns u1, ..., un. In the 2 × 2 case, such systems had nice geometric interpretations; in the general case, that interpretation needs n-dimensional linear spaces, and is not very intuitive. Still, the methods that we developed for the 2 × 2 case can be gainfully employed here!

Some underlying principles with a geometric interpretation are best explained for the example n = 3. We are given a vector b and we try to write it as a linear combination of vectors a1, a2, a3,

\[
\begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} u = b.
\]

If the ai are truly 3D, i.e., if they form a tetrahedron, then a unique solution may be found (see Sketch 12.1). But if the three ai all lie in a plane (i.e., if the volume formed by them is zero), then you cannot write b as a linear combination of them, unless it is itself in that 2D plane. In this case, you cannot expect uniqueness for your answer. Sketch 12.2 covers these cases. In general, a linear system is uniquely solvable if the ai have a nonzero n-dimensional volume. If they do not, they span a k-dimensional subspace (with k < n)—nonunique solutions exist only if b is itself in that subspace. A linear system is called consistent if at least one solution exists.

Sketch 12.1

A solvable 3 × 3 system.

Sketch 12.2

Top: no solution, bottom: nonunique solution.

For 2D and 3D, we encountered many problems that lent themselves to constructing the linear system in terms of a linear combination of column vectors, the a_i. However, in Section 5.11 we looked at how a linear system can be interpreted as a problem using equations built row by row. In n dimensions, this commonly occurs. An example follows.

Example 12.1

Suppose that at five time instances, say ti = 0, 0.25, 0.5, 0.75, 1 seconds, we have associated observation data, p(ti) = 0, 1, 0.5, 0.5, 0. We would like to fit a polynomial to these data so we can estimate values in between the observations. This is called polynomial interpolation. Five points require a degree four polynomial,

\[
p(t) = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4,
\]

which has five coefficients c_i. Immediately we see that we can write down five equations,

\[
p(t_i) = c_0 + c_1 t_i + c_2 t_i^2 + c_3 t_i^3 + c_4 t_i^4, \qquad i = 0, \ldots, 4,
\]

or in matrix form,

\[
\begin{bmatrix}
1 & t_0 & t_0^2 & t_0^3 & t_0^4\\
1 & t_1 & t_1^2 & t_1^3 & t_1^4\\
\vdots & & & & \vdots\\
1 & t_4 & t_4^2 & t_4^3 & t_4^4
\end{bmatrix}
\begin{bmatrix} c_0\\ c_1\\ \vdots\\ c_4 \end{bmatrix} =
\begin{bmatrix} p(t_0)\\ p(t_1)\\ \vdots\\ p(t_4) \end{bmatrix}.
\]

Figure 12.2 illustrates the result, \(p(t) = 12.667t - 50t^2 + 69.33t^3 - 32t^4\), for the given data.

Figure 12.2

Polynomial interpolation: a degree four polynomial fit to five data points.
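
To make Example 12.1 concrete, here is a small Python/NumPy sketch (NumPy and the variable names are my own choices, not part of the text) that builds the coefficient matrix from the five (t_i, p(t_i)) pairs above and solves for c_0, ..., c_4; it reproduces the polynomial quoted in Example 12.1 up to rounding.

    import numpy as np

    # Observation times and values from Example 12.1.
    t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    p = np.array([0.0, 1.0, 0.5, 0.5, 0.0])

    # Coefficient matrix: row i is [1, t_i, t_i^2, t_i^3, t_i^4].
    A = np.vander(t, N=5, increasing=True)

    # Solve A c = p for the polynomial coefficients c_0, ..., c_4.
    c = np.linalg.solve(A, p)
    print(c)   # approximately [0, 12.667, -50, 69.333, -32]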

12.2 The Solution via Gauss Elimination

The key to success in the 2 × 2 case was the application of a shear (forward elimination) so that the matrix A was transformed to upper triangular, meaning all entries below the diagonal are zero. Then it was possible to apply back substitution to solve for the unknowns. A shear was constructed to map the first column vector of the matrix onto the e1-axis. Revisiting an example from Chapter 5, we set

\[
\begin{bmatrix} 2 & 4\\ 1 & 6 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 4\\ 4 \end{bmatrix}.
\qquad (12.2)
\]

The shear used was

\[
S_1 = \begin{bmatrix} 1 & 0\\ -1/2 & 1 \end{bmatrix},
\]

which when applied to the system as

S1Au=S1b

produced the system

\[
\begin{bmatrix} 2 & 4\\ 0 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 4\\ 2 \end{bmatrix}.
\]

Algebraically, what this shear did was to change the rows of the system in the following manner:

\[
\mathrm{row}_1 \leftarrow \mathrm{row}_1
\quad\text{and}\quad
\mathrm{row}_2 \leftarrow \mathrm{row}_2 - \tfrac{1}{2}\,\mathrm{row}_1.
\]

Each of these constitutes an elementary row operation. Back substitution came next, with

\[
u_2 = \tfrac{1}{4}\times 2 = \tfrac{1}{2}
\]

and then

\[
u_1 = \tfrac{1}{2}\left(4 - 4u_2\right) = 1.
\]

The divisions in the back substitution equations are actually scalings, thus they could be rewritten in terms of a scale matrix:

\[
S_2 = \begin{bmatrix} 1/2 & 0\\ 0 & 1/4 \end{bmatrix},
\]

and then the system would be transformed via

S2S1Au=S2S1b.

The corresponding upper triangular matrix and system is

\[
\begin{bmatrix} 1 & 2\\ 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 2\\ 1/2 \end{bmatrix}.
\]

Check for yourself that we get the same result.
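
A quick NumPy sketch of this check (the matrix names below are mine, introduced only for illustration):

    import numpy as np

    A  = np.array([[2.0, 4.0], [1.0, 6.0]])
    b  = np.array([4.0, 4.0])
    S1 = np.array([[1.0, 0.0], [-0.5, 1.0]])     # shear: forward elimination
    S2 = np.array([[0.5, 0.0], [0.0, 0.25]])     # scaling

    print(S2 @ S1 @ A)            # [[1, 2], [0, 1]]
    print(S2 @ S1 @ b)            # [2, 0.5]
    print(np.linalg.solve(A, b))  # [1, 0.5], matching u1 = 1, u2 = 1/2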

Thus, we see that the geometric steps for solving a linear system have methodical algebraic interpretations. We will be following this algebraic approach for the rest of the chapter. For general linear systems, matrices such as S1 and S2 above are not actually constructed, because of the expense in speed and storage. Notice that the shear used to zero one element of the matrix changed the entries of only one row, so it is unnecessary to manipulate the other rows. This is an important observation for large systems.

In the general case, just as in the 2 × 2 case, pivoting will be used. Recall that for the 2 × 2 case this meant the equations were reordered such that the (pivot) matrix element a_{1,1} is the largest one in absolute value in the first column. A row exchange can be represented in terms of a permutation matrix. Suppose the 2 × 2 system in (12.2) was instead,

\[
\begin{bmatrix} 1 & 6\\ 2 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 4\\ 4 \end{bmatrix},
\]

requiring pivoting as the first step. The permutation matrix that will exchange the two rows is

\[
P_1 = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix},
\]

which is the identity matrix with the rows (columns) exchanged. After this row exchange, the steps for solving P1Au = P1b are the same as for the system in (12.2): S2S1P1Au = S2S1P1b.

Example 12.2

Let’s step through the necessary row exchanges and shears for a 3 × 3 linear system. The goal is to get it in upper triangular form so we may use back substitution to solve for the unknowns. The system is

\[
\begin{bmatrix} 2 & 2 & 0\\ 4 & 0 & 2\\ 4 & -2 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} -4\\ 2\\ 0 \end{bmatrix}.
\]

The matrix element a_{1,1} is not the largest in the first column, so we choose the 4 in the second row to be the pivot element and we reorder the rows:

\[
\begin{bmatrix} 4 & 0 & 2\\ 2 & 2 & 0\\ 4 & -2 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 2\\ -4\\ 0 \end{bmatrix}.
\]

The permutation matrix that achieves this row exchange is

\[
P_1 = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}.
\]

(The subscript 1 indicates that this matrix is designed to achieve the appropriate exchange for the first column.)

To zero entries in the first column apply:

\[
\mathrm{row}_2 \leftarrow \mathrm{row}_2 - \tfrac{1}{2}\,\mathrm{row}_1, \qquad
\mathrm{row}_3 \leftarrow \mathrm{row}_3 - \mathrm{row}_1,
\]

and the system becomes

\[
\begin{bmatrix} 4 & 0 & 2\\ 0 & 2 & -1\\ 0 & -2 & 2 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 2\\ -5\\ -2 \end{bmatrix}.
\]

The shear matrix that achieves this is

\[
G_1 = \begin{bmatrix} 1 & 0 & 0\\ -1/2 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix}.
\]

Now the first column consists of only zeroes except for a_{1,1}, meaning that it is lined up with the e_1-axis.

Now work on the second column vector. First, check if pivoting is necessary; this means checking that a_{2,2} is the largest in absolute value among the entries of the second column on or below the diagonal. No pivoting is necessary here. (We could say that the permutation matrix P_2 = I.) To zero the last element in this column apply

\[
\mathrm{row}_3 \leftarrow \mathrm{row}_3 + \mathrm{row}_2,
\]

which produces

\[
\begin{bmatrix} 4 & 0 & 2\\ 0 & 2 & -1\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 2\\ -5\\ -7 \end{bmatrix}.
\]

The shear matrix that achieves this is

\[
G_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 1 \end{bmatrix}.
\]

By chance, the second column is aligned with e_2 because a_{1,2} = 0. If this extra zero had not appeared, then the last operation would have mapped this 3D vector into the [e_1, e_2]-plane.

Now that we have the matrix in upper triangular form, we are ready for back substitution:

\[
u_3 = \tfrac{1}{1}(-7), \qquad
u_2 = \tfrac{1}{2}\left(-5 + u_3\right), \qquad
u_1 = \tfrac{1}{4}\left(2 - 2u_3\right).
\]

This implicitly incorporates a scaling matrix. We obtain the solution

\[
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 4\\ -6\\ -7 \end{bmatrix}.
\]

It is usually a good idea to insert the solution into the original equations:

\[
\begin{bmatrix} 2 & 2 & 0\\ 4 & 0 & 2\\ 4 & -2 & 4 \end{bmatrix}
\begin{bmatrix} 4\\ -6\\ -7 \end{bmatrix} =
\begin{bmatrix} -4\\ 2\\ 0 \end{bmatrix}.
\]

It works!

The example above illustrates each of the elementary row operations that take place during Gauss elimination:

  • Pivoting results in the exchange of two rows.
  • Shears result in adding a multiple of one row to another.
  • Scaling results in multiplying a row by a scalar.

Gauss elimination for solving a linear system consists of two basic steps: forward elimination (pivoting and shears) and then back substitution (scaling). Here is the algorithm for solving a general n × n system of linear equations.

Gauss Elimination with Pivoting

Given: An n × n coefficient matrix A and an n × 1 right-hand side b describing a linear system

Au=b,

which is short for the more detailed (12.1).

Find: The unknowns u1,...,un.

Algorithm:

  • Initialize the n × n matrix G = I.
  • For j = 1, ..., n − 1 (j counts columns)
    • Pivoting step:
      • Find the element with the largest absolute value in column j from a_{j,j} to a_{n,j}; this is element a_{r,j}.
      • If r > j, exchange equations r and j.
      • If a_{j,j} = 0, the system is not solvable.
    • Forward elimination step for column j:
      • For i = j + 1, ..., n (elements below the diagonal of column j)
        • Construct the multiplier g_{i,j} = a_{i,j}/a_{j,j}
        • a_{i,j} = 0
        • For k = j + 1, ..., n (each element in row i after column j)
          • a_{i,k} = a_{i,k} − g_{i,j} a_{j,k}
        • b_i = b_i − g_{i,j} b_j
  • At this point, all elements below the diagonal have been set to zero. The matrix is now in upper triangular form.
  • Back substitution:
    • u_n = b_n/a_{n,n}
    • For j = n − 1, ..., 1
      • u_j = (1/a_{j,j})[b_j − a_{j,j+1}u_{j+1} − ... − a_{j,n}u_n].

In a programming environment, it can be convenient to form an augmented matrix, which is the matrix A augmented with the vector b. Here is the idea for a 3 × 3 linear system:

\[
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & b_1\\
a_{2,1} & a_{2,2} & a_{2,3} & b_2\\
a_{3,1} & a_{3,2} & a_{3,3} & b_3
\end{bmatrix}.
\]

Then the loop over k would run to n + 1, and there would be no need for the extra line updating the b_i element.
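
As a hands-on companion to the algorithm above, here is a minimal NumPy sketch of Gauss elimination with row pivoting followed by back substitution. It uses the augmented-matrix idea just described; the name gauss_solve and the simple tolerance handling are my own choices, not from the text.

    import numpy as np

    def gauss_solve(A, b, tol=1e-12):
        """Solve Au = b: forward elimination with row pivoting, then back substitution."""
        n = len(b)
        # Augmented matrix [A | b]; elimination then updates the right-hand side for free.
        M = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(n, 1)])
        for j in range(n - 1):
            # Pivoting: bring the largest |entry| of column j (rows j..n-1) to the diagonal.
            r = j + np.argmax(np.abs(M[j:, j]))
            if r != j:
                M[[j, r]] = M[[r, j]]
            if abs(M[j, j]) < tol:
                raise ValueError("system is not (uniquely) solvable")
            # Forward elimination for column j.
            for i in range(j + 1, n):
                g = M[i, j] / M[j, j]          # multiplier g_{i,j}
                M[i, j:] -= g * M[j, j:]       # row_i <- row_i - g * row_j (incl. rhs)
        # Back substitution.
        u = np.zeros(n)
        for j in range(n - 1, -1, -1):
            u[j] = (M[j, n] - M[j, j+1:n] @ u[j+1:]) / M[j, j]
        return u

    # The 3 x 3 system of Example 12.2:
    A = [[2, 2, 0], [4, 0, 2], [4, -2, 4]]
    b = [-4, 2, 0]
    print(gauss_solve(A, b))   # [ 4. -6. -7.]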

As demonstrated in Example 12.2, the operations in the elimination step above may be written in matrix form. If A is the current matrix, then at step j, we first check if a row exchange is necessary. This may be achieved with a permutation matrix, Pj. If no pivoting is necessary, Pj = I. To produce zeroes under aj,j the matrix product Gj A is formed, where

\[
G_j = \begin{bmatrix}
1 & & & & \\
& \ddots & & & \\
& & 1 & & \\
& & -g_{j+1,j} & 1 & \\
& & \vdots & & \ddots \\
& & -g_{n,j} & & & 1
\end{bmatrix}.
\qquad (12.3)
\]

The g_{i,j} are the multipliers; they enter G_j with a minus sign. The matrix G_j is called a Gauss matrix. All of its entries are zero except for the diagonal and the entries −g_{i,j} below the diagonal of the jth column. We could store all P_j and G_j in one matrix

\[
G = G_{n-1}P_{n-1} \cdots G_2 P_2 G_1 P_1.
\qquad (12.4)
\]

If no pivoting is required, then it is possible to store the gi,j in the zero elements of A rather than explicitly setting the ai,j element equal to zero. Regardless, it is more efficient with regard to speed and storage not to multiply A and b by the permutation and Gauss matrices because this would result in many unnecessary calculations.

To summarize, Gauss elimination with pivoting transforms the linear system Au = b into the system

GAu=Gb,

which has the same solution as the original system. The matrix GA is upper triangular, and it is referred to as U. The diagonal elements of U are the pivots.

Example 12.3

We look at another example, taken from [2]. Let the system be given by

\[
\begin{bmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 6\\ 9\\ 7 \end{bmatrix}.
\]

We start the algorithm with j = 1, and observe that no element in column 1 exceeds a1,1 in absolute value, so no pivoting is necessary at this step, thus P1 = I. Proceed with the elimination step for row 2 by constructing the multiplier

\[
g_{2,1} = a_{2,1}/a_{1,1} = 1/2.
\]

Change row 2 as follows:

\[
\mathrm{row}_2 \leftarrow \mathrm{row}_2 - \tfrac{1}{2}\,\mathrm{row}_1.
\]

Remember, this includes changing the element b_2. Similarly for row 3,

\[
g_{3,1} = a_{3,1}/a_{1,1} = 2/2 = 1,
\]

then

\[
\mathrm{row}_3 \leftarrow \mathrm{row}_3 - \mathrm{row}_1.
\]

Step j = 1 is complete and the linear system is now

\[
\begin{bmatrix} 2 & 2 & 0\\ 0 & 0 & 2\\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 6\\ 6\\ 1 \end{bmatrix}.
\]

The Gauss matrix for j = 1 is

\[
G_1 = \begin{bmatrix} 1 & 0 & 0\\ -1/2 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix}.
\]

Next is column 2, so j = 2. Observe that a_{2,2} = 0, whereas a_{3,2} = −1. We exchange equations 2 and 3 and the system becomes

\[
\begin{bmatrix} 2 & 2 & 0\\ 0 & -1 & 1\\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 6\\ 1\\ 6 \end{bmatrix}.
\qquad (12.5)
\]

The permutation matrix for this exchange is

\[
P_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}.
\]

If blindly following the algorithm above, we would proceed with the elimination for row 3 by forming the multiplier

\[
g_{3,2} = \frac{a_{3,2}}{a_{2,2}} = \frac{0}{-1} = 0.
\]

Then operate on the third row

\[
\mathrm{row}_3 \leftarrow \mathrm{row}_3 - 0 \times \mathrm{row}_2,
\]

which doesn’t change the row at all. (So we will record G_2 = I.) In a floating-point computation, however, g_{3,2} might not come out exactly zero due to round-off; without a special check for a (near-)zero multiplier, this unnecessary work takes place. Tolerances are very important here.

Apply back substitution by first solving for the last unknown:

\[
u_3 = \tfrac{6}{2} = 3.
\]

Start the back substitution loop with j = 2:

\[
u_2 = \tfrac{1}{-1}\left[1 - u_3\right] = 2,
\]

and finally

\[
u_1 = \tfrac{1}{2}\left[6 - 2u_2\right] = 1.
\]

It’s a good idea to check the solution:

\[
\begin{bmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} =
\begin{bmatrix} 6\\ 9\\ 7 \end{bmatrix}.
\]

The final matrix G = G_2 P_2 G_1 P_1 is

\[
G = \begin{bmatrix} 1 & 0 & 0\\ -1 & 0 & 1\\ -1/2 & 1 & 0 \end{bmatrix}.
\]

Check that GAu = Gb results in the linear system in (12.5).

Just before back substitution, we could scale to achieve ones along the diagonal of the matrix. Let’s do precisely that to the linear system in Example 12.3. Multiply both sides of (12.5) by

\[
\begin{bmatrix} 1/2 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1/2 \end{bmatrix}.
\]

This transforms the linear system to

\[
\begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 3\\ -1\\ 3 \end{bmatrix}.
\]

This upper triangular matrix with rank = n is said to be in row echelon form. If the matrix is rank deficient, rank < n, then the rows with all zeroes should be the last rows. Some definitions of row echelon form do not require ones along the diagonal, as we have here; it is more efficient to do the scaling as part of back substitution.

Gauss elimination requires O(n³) operations.¹ Thus this algorithm is suitable for a system with thousands of equations, but not for a system with millions of equations. When the system is very large, oftentimes many of the matrix elements are zero—a sparse linear system. Iterative methods, which are introduced in Section 13.6, are a better approach in this case.

12.3 Homogeneous Linear Systems

Let’s revisit the topic of Section 5.8, homogeneous linear systems, which take the form

Au=0.

The trivial solution u = 0 is always an option, but of little interest. How do we use Gauss elimination to find a nontrivial solution if one exists? (Once we have one nontrivial solution u, all multiples cu are solutions as well.) The answer: slightly modify the back substitution step. An example will make this clear.

Example 12.4

Start with the homogeneous system

\[
\begin{bmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 1 & 2 & 3 \end{bmatrix} u =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.
\]

The matrix clearly has rank one. First perform forward elimination, arriving at

\[
\begin{bmatrix} 1 & 2 & 3\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} u =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.
\]

For each zero row of the transformed system, set the corresponding u_i, the free variables, to one: u_3 = 1 and u_2 = 1. Back substituting these into the first equation gives u_1 = −5 for the pivot variable. Thus a final solution is

\[
u = \begin{bmatrix} -5\\ 1\\ 1 \end{bmatrix},
\]

and all vectors cu are solutions as well.

Since the 3 × 3 matrix is rank one, it has a two-dimensional null space. The number of free variables is equal to the dimension of the null space. We can systematically construct two vectors u_1, u_2 that span the null space by setting one of the free variables to one and the other to zero, resulting in

\[
u_1 = \begin{bmatrix} -3\\ 0\\ 1 \end{bmatrix}
\quad\text{and}\quad
u_2 = \begin{bmatrix} -2\\ 1\\ 0 \end{bmatrix}.
\]

All linear combinations of elements of the null space are also in the null space; for example, u = 1u_1 + 1u_2.
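
For larger systems it is convenient to let the computer find a null space basis. Here is a small NumPy sketch (not from the text; it uses the singular value decomposition, a topic of Chapter 16) applied to the rank-one matrix of Example 12.4:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])

    # Rows of Vt whose singular values are (numerically) zero span the null space.
    U, s, Vt = np.linalg.svd(A)
    null_basis = Vt[s < 1e-12 * s.max()]      # here a 2 x 3 array: a 2D null space
    print(null_basis.shape[0])                # 2 free variables
    print(np.allclose(A @ null_basis.T, 0))   # True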

Column pivoting might be required to ready the matrix for back substitution.

Example 12.5

The homogeneous system

\[
\begin{bmatrix} 0 & 6 & 3\\ 0 & 0 & 2\\ 0 & 0 & 0 \end{bmatrix} u =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
\]

comes from an eigenvector problem similar to those in Section 7.3. (More on eigenvectors in higher dimensions in Chapter 15.)

The linear system in its existing form leads us to 0u_3 = 0 and 2u_3 = 0. To remedy this, we proceed with column exchanges:

\[
\begin{bmatrix} 6 & 3 & 0\\ 0 & 2 & 0\\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} u_2\\ u_3\\ u_1 \end{bmatrix} =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix},
\]

where column 1 was exchanged with column 2 and then column 2 was exchanged with column 3. Each exchange requires that the associated unknowns are exchanged as well. Set the free variable u_1 = 1; then back substitution results in u_3 = 0 and u_2 = 0. All vectors

\[
u = c\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}
\]

satisfy the original homogeneous system.

12.4 Inverse Matrices

The inverse of a square matrix A is the matrix that “undoes” A’s action, i.e., the combined action of A and A^{-1} is the identity:

\[
AA^{-1} = I.
\qquad (12.6)
\]

We introduced the essentials of inverse matrices in Section 5.9 and reviewed properties of the inverse in Section 9.11. In this section, we introduce the inverse for n × n matrices and discuss inverses in the context of solving linear systems, Gauss elimination, and LU decomposition (covered in more detail in Section 12.5).

Example 12.6

The following scheme shows a matrix A multiplied by its inverse A−1. The matrix A is on the left, A−1 is on top, and the result of the multiplication, the identity, is on the lower right:

How do we find the inverse of a matrix? In much the same way as we did in the 2 × 2 case in Section 5.9, we write

\[
A\begin{bmatrix} \bar{a}_1 & \ldots & \bar{a}_n \end{bmatrix} =
\begin{bmatrix} e_1 & \ldots & e_n \end{bmatrix}.
\qquad (12.7)
\]

Here, the matrices are n × n, and the vectors ā_i as well as the e_i are vectors with n components. The vector e_i has all zero entries except for its ith component, which equals 1.

We may now interpret (12.7) as n linear systems:

\[
A\bar{a}_1 = e_1, \quad \ldots, \quad A\bar{a}_n = e_n.
\qquad (12.8)
\]

In Example 5.8, we applied shears and a scaling to transform A into the identity matrix, and at the same time, the right-hand side (e_1 and e_2) was transformed into A^{-1}. Now, with Gauss elimination as per Section 12.2, we apply forward elimination to A and to each of the e_i. Then with back substitution, we solve for each of the ā_i that form A^{-1}. However, as we will learn in Section 12.5, a more economical solution is found with LU decomposition. This method is tailored to solving multiple systems of equations that share the same matrix A, but have different right-hand sides.

Inverse matrices are primarily a theoretical concept. They suggest solving a linear system Av = b by computing A^{-1} and then setting v = A^{-1}b. Don’t do that! It is a very expensive way to solve a linear system; simple Gauss elimination or LU decomposition is much cheaper. (Explicitly forming the inverse requires forward elimination, n back substitutions, and then a matrix-vector multiplication. On the other hand, Gauss elimination requires forward elimination and just one back substitution.)
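
A short NumPy sketch of (12.8), computing A^{-1} column by column by solving n linear systems with the columns of the identity as right-hand sides (purely illustrative; in practice one would factor A once, as Section 12.5 explains):

    import numpy as np

    A = np.array([[2.0, 2.0, 0.0],
                  [1.0, 1.0, 2.0],
                  [2.0, 1.0, 1.0]])   # the matrix of Example 12.3

    n = A.shape[0]
    A_inv = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        A_inv[:, i] = np.linalg.solve(A, e_i)   # column i of the inverse solves A x = e_i

    print(np.allclose(A @ A_inv, np.eye(n)))    # True: A A^{-1} = I as in (12.6)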

The inverse of a matrix A exists only if the matrix is square (n × n) and the action of A does not reduce dimensionality, as in a projection. This means that all columns of A must be linearly independent. There is a simple way to see if a matrix A is invertible; just perform Gauss elimination for the first of the linear systems in (12.8). If you are able to transform A to upper triangular with all nonzero diagonal elements, then A is invertible. Otherwise, it is said to be singular. The term “nonzero” is to be taken with a grain of salt: real numbers are (almost) never zero, and tolerances must be employed.

Again we encounter the concept of matrix rank. An invertible matrix is said to have rank n or full rank. If a matrix reduces dimensionality by k, then it has rank n − k. The n × n identity matrix has rank n; the zero matrix has rank 0. An example of a matrix that does not have full rank is a projection. Review the structure of a 2D orthogonal projection in Section 4.8 and a 3D projection in Section 9.7 to confirm this statement about the rank.

Example 12.7

We apply forward elimination to three 4 × 4 matrices to achieve row echelon form:

\[
M_1 = \begin{bmatrix} 1 & 3 & 3 & 0\\ 0 & 3 & 3 & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},\quad
M_2 = \begin{bmatrix} 1 & 3 & 3 & 0\\ 0 & 3 & 3 & 1\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},\quad
M_3 = \begin{bmatrix} 1 & 3 & 3 & 0\\ 0 & 3 & 3 & 1\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 2 \end{bmatrix}.
\]

M1 has rank 2, M2 has rank 3, and M3 has rank 4 or full rank.

Example 12.8

Let us compute the inverse of the n × n matrix G_j as defined in (12.3):

\[
G_j^{-1} = \begin{bmatrix}
1 & & & & \\
& \ddots & & & \\
& & 1 & & \\
& & g_{j+1,j} & 1 & \\
& & \vdots & & \ddots \\
& & g_{n,j} & & & 1
\end{bmatrix}.
\]

That’s simple: just flip the signs of the entries below the diagonal. To make some geometric sense of this, you should realize that G_j is a shear, and so is G_j^{-1}, which “undoes” G_j.

Here is another interesting property of the inverse of a matrix. Suppose k ≠ 0 and kA is an invertible matrix; then

\[
(kA)^{-1} = \frac{1}{k}A^{-1}.
\]

And yet another: If two matrices, A and B, are invertible, then the product AB is invertible, too.

12.5 LU Decomposition

Gauss elimination has two major parts: transforming the system to upper triangular form with forward elimination and back substitution. The creation of the upper triangular matrix may be written in terms of matrix multiplications using Gauss matrices Gj. For now, assume that no pivoting is necessary. If we denote the final upper triangular matrix by U, then we have

\[
G_{n-1} \cdots G_1 A = U.
\qquad (12.9)
\]

It follows that

\[
A = G_1^{-1} \cdots G_{n-1}^{-1} U.
\]

The neat thing about the product \(G_1^{-1}\cdots G_{n-1}^{-1}\) is that it is a lower triangular matrix with the elements g_{i,j} below the diagonal and zeroes above the diagonal:

\[
G_1^{-1} \cdots G_{n-1}^{-1} =
\begin{bmatrix}
1 & & & \\
g_{2,1} & 1 & & \\
\vdots & & \ddots & \\
g_{n,1} & \ldots & g_{n,n-1} & 1
\end{bmatrix}.
\]

We denote this product by L (for lower triangular). Thus,

\[
A = LU,
\qquad (12.10)
\]

which is known as the LU decomposition of A. It is also called the triangular factorization of A. Every invertible matrix A has such a decomposition, although it may be necessary to employ pivoting.

Denote the elements of L by l_{i,j} (keeping in mind that l_{i,i} = 1) and those of U by u_{i,j}. A simple 3 × 3 example will help illustrate the idea.

In this scheme, we are given the a_{i,j} and we want the l_{i,j} and u_{i,j}. This is systematically achieved as follows.

Observe that elements of A below the diagonal may be rewritten as

\[
a_{i,j} = l_{i,1}u_{1,j} + \ldots + l_{i,j-1}u_{j-1,j} + l_{i,j}u_{j,j}; \qquad j < i.
\]

For the elements of A that are on or above the diagonal, we get

\[
a_{i,j} = l_{i,1}u_{1,j} + \ldots + l_{i,i-1}u_{i-1,j} + l_{i,i}u_{i,j}; \qquad j \geq i.
\]

This leads to

\[
l_{i,j} = \frac{1}{u_{j,j}}\left(a_{i,j} - l_{i,1}u_{1,j} - \ldots - l_{i,j-1}u_{j-1,j}\right); \qquad j < i,
\qquad (12.11)
\]

and

\[
u_{i,j} = a_{i,j} - l_{i,1}u_{1,j} - \ldots - l_{i,i-1}u_{i-1,j}; \qquad j \geq i.
\qquad (12.12)
\]

If A has a decomposition A = LU, then the system can be written as

\[
LUu = b.
\qquad (12.13)
\]

The matrix vector product Uu results in a vector; call this y. Reexamining (12.13), it becomes a two-step problem. First solve

\[
Ly = b,
\qquad (12.14)
\]

then solve

\[
Uu = y.
\qquad (12.15)
\]

If Uu = y, then LUu = Ly = b. The two systems in (12.14) and (12.15) are triangular and easy to solve. Forward substitution is applied to the matrix L. (See Exercise 21 and its solution for an algorithm.) Back substitution is applied to the matrix U. An algorithm is provided in Section 12.2.

A more direct method for forming L and U is achieved with (12.11) and (12.12), rather than through Gauss elimination. This then is the method of LU decomposition.

LU Decomposition

Given: A coefficient matrix A and a right-hand side b describing a linear system

Au=b.

Find: The unknowns u1,..., un.

Algorithm:

  • Initialize L as the identity matrix and U as the zero matrix.
  • Calculate the nonzero elements of L and U:
  • For k = 1, ..., n
    • u_{k,k} = a_{k,k} − l_{k,1}u_{1,k} − ... − l_{k,k-1}u_{k-1,k}
    • For i = k + 1, ..., n
      • l_{i,k} = (1/u_{k,k})[a_{i,k} − l_{i,1}u_{1,k} − ... − l_{i,k-1}u_{k-1,k}]
    • For j = k + 1, ..., n
      • u_{k,j} = a_{k,j} − l_{k,1}u_{1,j} − ... − l_{k,k-1}u_{k-1,j}
  • Using forward substitution solve Ly = b.
  • Using back substitution solve Uu = y.

The u_{k,k} term must not be zero; we had a similar situation with Gauss elimination. If it is zero, either pivoting is required or the matrix may be singular.

The construction of the LU decomposition takes advantage of the triangular structure of L and U combined with a particular computation order. The matrix L is being filled column by column and the matrix U is being filled row by row.

Example 12.9

Let’s use LU decomposition to solve the linear system Au = b, where

\[
A = \begin{bmatrix} 2 & 2 & 4\\ -1 & 2 & -3\\ 1 & 2 & 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}.
\]

The first step is to decompose A. Following the steps in the algorithm above, we calculate the matrix entries:

\[
\begin{aligned}
k=1:\quad & u_{1,1} = a_{1,1} = 2,\\
& l_{2,1} = a_{2,1}/u_{1,1} = -\tfrac{1}{2}, \qquad l_{3,1} = a_{3,1}/u_{1,1} = \tfrac{1}{2},\\
& u_{1,2} = a_{1,2} = 2, \qquad u_{1,3} = a_{1,3} = 4,\\[4pt]
k=2:\quad & u_{2,2} = a_{2,2} - l_{2,1}u_{1,2} = 2 + 1 = 3,\\
& l_{3,2} = \tfrac{1}{u_{2,2}}\left[a_{3,2} - l_{3,1}u_{1,2}\right] = \tfrac{1}{3}\left[2 - 1\right] = \tfrac{1}{3},\\
& u_{2,3} = a_{2,3} - l_{2,1}u_{1,3} = -3 + 2 = -1,\\[4pt]
k=3:\quad & u_{3,3} = a_{3,3} - l_{3,1}u_{1,3} - l_{3,2}u_{2,3} = 2 - 2 + \tfrac{1}{3} = \tfrac{1}{3}.
\end{aligned}
\]

Check that this produces valid entries for L and U:

\[
L = \begin{bmatrix} 1 & 0 & 0\\ -1/2 & 1 & 0\\ 1/2 & 1/3 & 1 \end{bmatrix}, \qquad
U = \begin{bmatrix} 2 & 2 & 4\\ 0 & 3 & -1\\ 0 & 0 & 1/3 \end{bmatrix},
\qquad\text{and indeed}\quad LU = A.
\]

Next, we solve Ly = b with forward substitution—solving for y1, then y2, and then y3—and find that

\[
y = \begin{bmatrix} 1\\ 3/2\\ 0 \end{bmatrix}.
\]

The last step is to solve Uu = y with back substitution—as we did in Gauss elimination,

\[
u = \begin{bmatrix} 0\\ 1/2\\ 0 \end{bmatrix}.
\]

It is simple to check that this solution is correct since clearly, the column vector a2 is a multiple of b, and that is reflected in u.
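
Here is a minimal NumPy sketch of the LU algorithm above (no pivoting; the function names are mine, not from the text): lu_decompose fills L column by column and U row by row in exactly the order described, and the two substitution routines solve (12.14) and (12.15). Applied to Example 12.9 it reproduces the solution.

    import numpy as np

    def lu_decompose(A):
        """Doolittle-style LU decomposition, A = LU, assuming no pivoting is needed."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        L, U = np.eye(n), np.zeros((n, n))
        for k in range(n):
            U[k, k] = A[k, k] - L[k, :k] @ U[:k, k]
            for i in range(k + 1, n):                  # fill column k of L
                L[i, k] = (A[i, k] - L[i, :k] @ U[:k, k]) / U[k, k]
            for j in range(k + 1, n):                  # fill row k of U
                U[k, j] = A[k, j] - L[k, :k] @ U[:k, j]
        return L, U

    def forward_substitute(L, b):
        y = np.zeros(len(b))
        for i in range(len(b)):
            y[i] = b[i] - L[i, :i] @ y[:i]             # uses l_{i,i} = 1
        return y

    def back_substitute(U, y):
        n = len(y)
        u = np.zeros(n)
        for i in range(n - 1, -1, -1):
            u[i] = (y[i] - U[i, i+1:] @ u[i+1:]) / U[i, i]
        return u

    A = [[2, 2, 4], [-1, 2, -3], [1, 2, 2]]            # system of Example 12.9
    b = [1, 1, 1]
    L, U = lu_decompose(A)
    print(back_substitute(U, forward_substitute(L, b)))   # [0, 0.5, 0]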

Suppose A is nonsingular, but in need of pivoting. Then a permutation matrix P is used to exchange (possibly multiple) rows so it is possible to create the LU decomposition. The system is now PAu = Pb and we find PA = LU.

Finally, the major benefit of the LU decomposition: speed. For cases in which we have to solve multiple linear systems with the same coefficient matrix, LU decomposition is a big timesaver. We perform it once, and then perform the forward and backward substitutions (12.14) and (12.15) for each right-hand side. This is significantly less work than performing a complete Gauss elimination every time! Finding the inverse of a matrix, as described in (12.8), is an example of a problem that requires solving multiple linear systems with the same coefficient matrix.
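
In practice one would typically rely on a library for this factor-once, solve-many pattern; for instance, SciPy (an assumption on my part, not referenced by the text) exposes exactly this workflow:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[2.0, 2.0, 4.0],
                  [-1.0, 2.0, -3.0],
                  [1.0, 2.0, 2.0]])

    lu, piv = lu_factor(A)                        # factor once (with pivoting)
    for b in ([1.0, 1.0, 1.0], [2.0, 0.0, 1.0]):  # two (arbitrary) right-hand sides
        print(lu_solve((lu, piv), np.array(b)))   # cheap solve per right-hand side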

12.6 Determinants

With the introduction of the scalar triple product, Section 8.5 provided a geometric derivation of 3 × 3 determinants; they measure volume. And then in Section 9.8 we learned more about determinants from the perspective of linear maps. Let’s revisit that approach for n × n determinants.

When we apply forward elimination to A, transforming it to upper triangular form U, we apply a sequence of shears and row exchanges. Shears do not change the volumes. As we learned in Section 9.8, a row exchange will change the sign of the determinant. Thus the column vectors of U span the same volume as did those of A, however the sign might change. This volume is given by the signed product of the diagonal entries of U and is called the determinant of A:

\[
\det A = (-1)^k\left(u_{1,1} \times \ldots \times u_{n,n}\right),
\qquad (12.16)
\]

where k is the number of row exchanges. In general, this is the best (and most stable) method for finding the determinant (but also see the method in Section 16.4).

Example 12.10

Let’s revisit Example 12.3 to illustrate how to calculate the determinant with the upper triangular form, and how row exchanges influence the sign of the determinant.

Use the technique of cofactor expansion, as defined by (9.14) to find the determinant of the given 3 × 3 matrix A:

\[
\det A = 2\begin{vmatrix} 1 & 2\\ 1 & 1 \end{vmatrix}
- 2\begin{vmatrix} 1 & 2\\ 2 & 1 \end{vmatrix} = 4.
\]

Now, apply (12.16) to the upper triangular form U (12.5) from the example, and notice that we did one row exchange, k = 1:

\[
\det A = (-1)^1\left[2 \times (-1) \times 2\right] = 4.
\]

So the shears of Gauss elimination have not changed the absolute value of the determinant, and by modifying the sign based on the number of row exchanges, we can determine det A from det U.
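
A small NumPy sketch of (12.16): eliminate with row pivoting, count the row exchanges, and take the signed product of the diagonal (the function name det_via_elimination is mine):

    import numpy as np

    def det_via_elimination(A):
        U = np.array(A, dtype=float)
        n = len(U)
        swaps = 0
        for j in range(n - 1):
            r = j + np.argmax(np.abs(U[j:, j]))
            if r != j:
                U[[j, r]] = U[[r, j]]
                swaps += 1                        # each exchange flips the sign
            if U[j, j] == 0.0:
                return 0.0                        # zero pivot: singular matrix
            for i in range(j + 1, n):
                U[i, j:] -= U[i, j] / U[j, j] * U[j, j:]
        return (-1) ** swaps * np.prod(np.diag(U))

    A = [[2, 2, 0], [1, 1, 2], [2, 1, 1]]         # matrix of Example 12.10
    print(det_via_elimination(A))                 # 4.0
    print(np.linalg.det(A))                       # agrees (up to round-off)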

The technique of cofactor expansion that was used for the 3 × 3 matrix in Example 12.10 may be generalized to n × n matrices. Choose any column or row of the matrix, for example entries a1,j as above, and then

\[
\det A = a_{1,1}C_{1,1} + a_{1,2}C_{1,2} + \ldots + a_{1,n}C_{1,n},
\]

where each cofactor is defined as

\[
C_{i,j} = (-1)^{i+j} M_{i,j},
\]

and the Mi,j are called the minors; each is the determinant of the matrix with the ith row and jth column removed. The Mi,j are (n − 1) × (n − 1) determinants, and they are computed by yet another cofactor expansion. This process is repeated until we have 2 × 2 determinants. This technique is also known as expansion by minors.

Example 12.11

Let’s look at repeated application of cofactor expansion to find the determinant. Suppose we are given the following matrix,

\[
A = \begin{bmatrix} 2 & 2 & 0 & 4\\ 0 & 1 & 1 & 3\\ 0 & 0 & -2 & 0\\ 0 & 0 & 0 & 5 \end{bmatrix}.
\]

We may choose any row or column from which to form the cofactors, so in this example, we will have less work to do if we choose the first column. The cofactor expansion is

\[
\det A = 2\begin{vmatrix} 1 & 1 & 3\\ 0 & -2 & 0\\ 0 & 0 & 5 \end{vmatrix}
= 2\,(1)\begin{vmatrix} -2 & 0\\ 0 & 5 \end{vmatrix}
= 2\,(1)(-10) = -20.
\]

Since the matrix is in upper triangular form, we could use (12.16) and immediately see that this is in fact the correct determinant.

Cofactor expansion is more a theoretical tool than a computational one. This method of calculating the determinant plays an important theoretical role in the analysis of linear systems, and there are advanced theorems involving cofactor expansion and the inverse of a matrix. Computationally, Gauss elimination and the calculation of det U is superior.

In our first encounter with solving linear systems via Cramer’s rule in Section 5.3, we learned that the solution to a linear system may be found by simply forming quotients of areas. Now with our knowledge of n × n determinants, let’s revisit Cramer’s rule. If Au = b is an n × n linear system such that det A ≠ 0, then the system has the following unique solution:

\[
u_1 = \frac{\det A_1}{\det A}, \quad
u_2 = \frac{\det A_2}{\det A}, \quad \ldots, \quad
u_n = \frac{\det A_n}{\det A},
\qquad (12.17)
\]

where A_i is the matrix obtained by replacing the entries in the ith column by b. Cramer’s rule is an important theoretical tool; however, use it only for 2 × 2 or 3 × 3 linear systems.

Example 12.12

Let’s solve the linear system from Example 12.3 using Cramer’s rule. Following (12.17), we have

\[
u_1 = \frac{\begin{vmatrix} 6 & 2 & 0\\ 9 & 1 & 2\\ 7 & 1 & 1 \end{vmatrix}}
           {\begin{vmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{vmatrix}}, \qquad
u_2 = \frac{\begin{vmatrix} 2 & 6 & 0\\ 1 & 9 & 2\\ 2 & 7 & 1 \end{vmatrix}}
           {\begin{vmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{vmatrix}}, \qquad
u_3 = \frac{\begin{vmatrix} 2 & 2 & 6\\ 1 & 1 & 9\\ 2 & 1 & 7 \end{vmatrix}}
           {\begin{vmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{vmatrix}}.
\]

We have computed the determinant of the coefficient matrix A in Example 12.10: det A = 4. With the application of cofactor expansion for each numerator, we find that

\[
u_1 = \tfrac{4}{4} = 1, \qquad
u_2 = \tfrac{8}{4} = 2, \qquad
u_3 = \tfrac{12}{4} = 3,
\]

which is identical to the solution found with Gauss elimination.
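
The same computation in a few lines of NumPy (cramer_solve is my own name; as noted above, this is only sensible for very small systems):

    import numpy as np

    def cramer_solve(A, b):
        A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
        det_A = np.linalg.det(A)
        u = np.empty(len(b))
        for i in range(len(b)):
            A_i = A.copy()
            A_i[:, i] = b                  # replace the ith column by b
            u[i] = np.linalg.det(A_i) / det_A
        return u

    A = [[2, 2, 0], [1, 1, 2], [2, 1, 1]]
    b = [6, 9, 7]
    print(cramer_solve(A, b))              # [1. 2. 3.]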

The determinant of a positive definite matrix is always positive, and therefore the matrix is always nonsingular. The upper-left submatrices of an n × n matrix A are

\[
A_1 = \left[a_{1,1}\right], \qquad
A_2 = \begin{bmatrix} a_{1,1} & a_{1,2}\\ a_{2,1} & a_{2,2} \end{bmatrix}, \qquad \ldots, \qquad
A_n = A.
\]

If A is positive definite, then the determinants of all A_i are positive. Rules for working with determinants are given in Section 9.8.

12.7 Least Squares

When presented with large amounts of data, we often look for methods to create a simpler view or synopsis of the data. For example, Figure 12.3 is a graph of AIG’s monthly average stock price over twelve years. We see a lot of activity in the price, but there is a clear declining trend. A mathematical tool to capture this, which works when the trend is not as clear as it is here, is linear least squares approximation. The line illustrated in Figure 12.3 is the “best fit” line or best approximating line.

Figure 12.3

Least squares: fitting a straight line to stock price data for AIG from 2000 to 2013.

Linear least squares approximation is also useful when analyzing experimental data, which can be “noisy,” either from the data capture or observation method or from round-off from computations that generated the data. We might want to make summary statements about the data, estimate values where data is missing, or predict future values.

As a concrete (simple) example, suppose our experimental data are temperature (Celsius) over time (seconds):

\[
\begin{bmatrix} \text{time}\\ \text{temperature} \end{bmatrix}:\quad
\begin{bmatrix} 0\\ 30 \end{bmatrix},\
\begin{bmatrix} 10\\ 25 \end{bmatrix},\
\begin{bmatrix} 20\\ 40 \end{bmatrix},\
\begin{bmatrix} 30\\ 40 \end{bmatrix},\
\begin{bmatrix} 40\\ 30 \end{bmatrix},\
\begin{bmatrix} 50\\ 5 \end{bmatrix},\
\begin{bmatrix} 60\\ 25 \end{bmatrix},
\]

which are plotted in Figure 12.4. We want to establish a simple linear relationship between the variables,

\[
\text{temperature} = a \times \text{time} + b,
\]

Figure 12.4

Least squares: a linear approximation to experimental data of time and temperature pairs.

Writing down all relationships between knowns and unknowns, we obtain linear equations of the form

\[
\begin{bmatrix}
0 & 1\\ 10 & 1\\ 20 & 1\\ 30 & 1\\ 40 & 1\\ 50 & 1\\ 60 & 1
\end{bmatrix}
\begin{bmatrix} a\\ b \end{bmatrix} =
\begin{bmatrix} 30\\ 25\\ 40\\ 40\\ 30\\ 5\\ 25 \end{bmatrix}.
\qquad (12.18)
\]

We write the system as

\[
Au = b, \qquad\text{where}\quad u = \begin{bmatrix} a\\ b \end{bmatrix}.
\]

This system of seven equations in two unknowns is overdetermined and in general it will not have solutions; it is inconsistent. After all, it is not very likely that b lives in the subspace V formed by the columns of A. (As an analogy, consider the likelihood of a randomly selected 3D vector living in the [e1, e2]-plane.) But there is a recipe for finding an approximate solution.

Denoting by b′ a vector in V, the system

\[
Au = b'
\qquad (12.19)
\]

is solvable (consistent), but it is still overdetermined since we have seven equations in two unknowns. Recall from Section 2.8 that we can write b as the sum of its orthogonal projection into V and the component of b orthogonal to V,

\[
b = b' + b^{\perp}.
\qquad (12.20)
\]

Also recall that b′ is the vector in V that is closest to b. Sketch 12.3 illustrates this idea in 3D.

Sketch 12.3

Least squares.

Since b^{\perp} is orthogonal to V, we can use matrix notation to formalize this relationship,

\[
a_1^{\mathrm{T}} b^{\perp} = 0 \quad\text{and}\quad a_2^{\mathrm{T}} b^{\perp} = 0,
\]

which is equivalent to

\[
A^{\mathrm{T}} b^{\perp} = 0.
\]

Based on (12.20), we can substitute b − b′ for b^{\perp},

\[
\begin{aligned}
A^{\mathrm{T}}(b - b') &= 0\\
A^{\mathrm{T}}(b - Au) &= 0\\
A^{\mathrm{T}}b - A^{\mathrm{T}}Au &= 0.
\end{aligned}
\]

Rearranging this last equation, we have the normal equations

\[
A^{\mathrm{T}}A\,u = A^{\mathrm{T}}b.
\qquad (12.21)
\]

This is a linear system with a square matrix A^{T}A! Even more, that matrix is symmetric. The solution to the new system (12.21), when it has one, is the one that minimizes the error

\[
\left\| Au - b \right\|^2,
\]

and for this reason, it is called the least squares solution of the original system. Recall that b′ is the vector of V closest to b, and since we solved (12.19), we have in effect minimized ‖b′ − b‖.

It seems pretty amazing that by simply multiplying both sides by AT, we have a “best” solution to the original problem!

Example 12.13

Returning to the system in (12.18), we form the normal equations,

\[
\begin{bmatrix} 9100 & 210\\ 210 & 7 \end{bmatrix}
\begin{bmatrix} a\\ b \end{bmatrix} =
\begin{bmatrix} 5200\\ 195 \end{bmatrix}.
\]

(Notice that the matrix is symmetric.)

The least squares solution is the solution of this linear system,

\[
\begin{bmatrix} a\\ b \end{bmatrix} =
\begin{bmatrix} -0.23\\ 34.8 \end{bmatrix},
\]

which corresponds to the line x_2 = −0.23x_1 + 34.8. Figure 12.4 illustrates this line with negative slope and e_2-intercept of 34.8.
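
The whole fit in a few lines of NumPy (illustrative only; as the text notes, explicitly forming A^T A can be numerically delicate, and Section 13.1 gives a better route):

    import numpy as np

    time = np.array([0.0, 10, 20, 30, 40, 50, 60])
    temp = np.array([30.0, 25, 40, 40, 30, 5, 25])

    # Overdetermined system (12.18): columns are [time, 1].
    A = np.column_stack([time, np.ones_like(time)])

    # Normal equations (12.21): A^T A u = A^T b.
    u = np.linalg.solve(A.T @ A, A.T @ temp)
    print(u)                                  # approximately [-0.232, 34.82]

    # For comparison, NumPy's least squares routine avoids forming A^T A.
    print(np.linalg.lstsq(A, temp, rcond=None)[0])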

Imagine a scenario where our data capture method failed due to some environmental condition. We might want to remove data points if they seem outside the norm. These are called outliers. Point six in Figure 12.4 looks to be an outlier. The least squares approximating line provides a means for determining that this point is something of an exception.

Linear least squares approximation can also serve as a method for data compression.

Numerical problems can creep into the normal equations of the linear system (12.21). This is particularly so when the n × m matrix A has many more equations than unknowns, n ≫ m. In Section 13.1, we will examine the Householder method for finding the least squares solution to the linear system Au = b directly, without forming the normal equations. Example 12.13 will be revisited in Example 13.3. And yet another look at the least squares solution is possible with the singular value decomposition of A in Section 16.6.

12.8 Application: Fitting Data to a Femoral Head

In prosthetic surgery, a common task is that of hip bone replacement. This involves removing an existing femoral head and replacing it by a transplant, consisting of a new head and a shaft for attaching it into the existing femur. The transplant is typically made from titanium or ceramic; the part that is critical for perfect fit and thus function is the spherical head as shown in Figure 12.5. Data points are collected from the existing femoral head by means of MRI or PET scans, then a spherical fit is obtained, and finally the transplant is manufactured. The fitting process is explained next.

Figure 12.5

Femur transplant: left, a titanium femoral head with shaft. Right, an example of a sphere fit. Black points are “in front,” gray points are occluded.

We are given a set of 3D vectors v1,..., vL that are of approximately equal length, ρ1,...,ρL. We would like to find a sphere (centered at the origin) with radius r that closely fits the vi.

If all vi were on that sphere, we would have

\[
\begin{aligned}
r &= \rho_1, \qquad\qquad &(12.22)\\
&\;\;\vdots &(12.23)\\
r &= \rho_L. &(12.24)
\end{aligned}
\]

This is a very overdetermined linear system—L equations in only one unknown, r!

In matrix form:

\[
\begin{bmatrix} 1\\ \vdots\\ 1 \end{bmatrix}
\left[\,r\,\right] =
\begin{bmatrix} \rho_1\\ \vdots\\ \rho_L \end{bmatrix}.
\]

Be sure to verify that the matrix dimensions work out!

Multiplying both sides by [1 ... 1] gives

\[
Lr = \rho_1 + \ldots + \rho_L,
\]

with the final result

\[
r = \frac{\rho_1 + \ldots + \rho_L}{L}.
\]

Thus the least squares solution is simply the average of the given radii—just as our intuition would have suggested in the first place. Things are not that simple when it comes to more unknowns; see Section 16.6.
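
A tiny NumPy sketch of this fit; the radius values below are invented purely for illustration and are not measured data from the text.

    import numpy as np

    # Hypothetical measured lengths rho_1, ..., rho_L of the data vectors v_i (mm).
    rho = np.array([24.8, 25.1, 25.3, 24.9, 25.0, 25.2])

    # Least squares fit of r: simply the average of the given radii.
    r = rho.mean()
    print(r)   # 25.05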

  • n × n linear system
  • coefficient matrix
  • consistent system
  • subspace
  • solvable system
  • unsolvable system
  • Gauss elimination
  • upper triangular matrix
  • forward elimination
  • back substitution
  • elementary row operation
  • permutation matrix
  • row echelon form
  • pivoting
  • Gauss matrix
  • multiplier
  • augmented matrix
  • singular matrix
  • matrix rank
  • full rank
  • rank deficient
  • homogeneous linear system
  • inverse matrix
  • LU decomposition
  • factorization
  • forward substitution
  • lower triangular matrix
  • determinant
  • cofactor expansion
  • expansion by minors
  • Cramer’s rule
  • overdetermined system
  • least squares solution
  • normal equations

12.9 Exercises

  1. Does the linear system

    \[
    \begin{bmatrix} 1 & 2 & 0\\ 0 & 0 & 0\\ 1 & 2 & 1 \end{bmatrix} u =
    \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}
    \]

    have a unique solution? Is it consistent?

  2. Does the linear system

    \[
    \begin{bmatrix} 1 & 1 & 5\\ 1 & 1 & 1\\ 1 & 2 & 7 \end{bmatrix} u =
    \begin{bmatrix} 3\\ 3\\ 3 \end{bmatrix}
    \]

    have a unique solution? Is it consistent?

  3. Examine the linear system in Example 12.1. What restriction on the ti is required to guarantee a unique solution?
  4. Solve the linear system Av = b where

    \[
    A = \begin{bmatrix} 1 & 0 & 1 & 2\\ 0 & 0 & 1 & 2\\ 2 & 0 & 0 & 1\\ 1 & 1 & 1 & 0 \end{bmatrix},
    \quad\text{and}\quad
    b = \begin{bmatrix} 1\\ 2\\ 1\\ 3 \end{bmatrix}.
    \]

    Show all the steps from the Gauss elimination algorithm.

  5. Solve the linear system Av = b where

    \[
    A = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1 \end{bmatrix},
    \quad\text{and}\quad
    b = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}.
    \]

    Show all the steps from the Gauss elimination algorithm.

  6. Solve the linear system Av = b where

    \[
    A = \begin{bmatrix} 4 & 2 & 1\\ 2 & 2 & 0\\ 4 & 2 & 3 \end{bmatrix},
    \quad\text{and}\quad
    b = \begin{bmatrix} 7\\ 2\\ 9 \end{bmatrix}.
    \]

  7. Transform the following linear system to row echelon form.

    \[
    \begin{bmatrix} 3 & 2 & 0\\ 3 & 1 & 2\\ 0 & 2 & 0 \end{bmatrix} u =
    \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}.
    \]

  8. What is the rank of the following matrix?

    \[
    \begin{bmatrix} 3 & 2 & 0 & 1\\ 0 & 0 & 0 & 1\\ 0 & 1 & 2 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}
    \]

  9. What is the permutation matrix that will exchange rows 3 and 4 in a 5 × 5 matrix?
  10. What is the permutation matrix that will exchange rows 2 and 4 in a 4 × 4 matrix?
  11. What is the matrix G as defined in (12.4) for Example 12.2?
  12. What is the matrix G as defined in (12.4) for Exercise 6?
  13. Solve the linear system

    \[
    \begin{bmatrix} 4 & 1 & 2\\ 2 & 1 & 1\\ 2 & 1 & 1 \end{bmatrix} u =
    \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
    \]

    with Gauss elimination with pivoting.

  14. Solve the linear system

    \[
    \begin{bmatrix} 3 & 6 & 1\\ 6 & 12 & 2\\ 9 & 18 & 3 \end{bmatrix} u =
    \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
    \]

    with Gauss elimination with pivoting.

  15. Find the inverse of the matrix from Exercise 5.
  16. Find the inverse of

    \[
    \begin{bmatrix} 3 & 2 & 1\\ 0 & 2 & 1\\ 0 & 2 & 1 \end{bmatrix}.
    \]

  17. Find the inverse of

    \[
    \begin{bmatrix} \cos\theta & 0 & \sin\theta\\ 0 & 1 & 0\\ -\sin\theta & 0 & \cos\theta \end{bmatrix}.
    \]

  18. Find the inverse of

    \[
    \begin{bmatrix}
    5 & 0 & 0 & 0 & 0\\
    0 & 4 & 0 & 0 & 0\\
    0 & 0 & 3 & 0 & 0\\
    0 & 0 & 0 & 2 & 0\\
    0 & 0 & 0 & 0 & 1
    \end{bmatrix}.
    \]

  19. Find the inverse of

    [30211432].

  20. Calculate the LU decomposition of the matrix

    \[
    A = \begin{bmatrix} 3 & 0 & 1\\ 1 & 2 & 0\\ 1 & 1 & 1 \end{bmatrix}.
    \]

  21. Write a forward substitution algorithm for solving the lower triangular system (12.14).
  22. Use the LU decomposition of A from Exercise 20 to solve the linear system Au = b, where

    \[
    b = \begin{bmatrix} 4\\ 0\\ 4 \end{bmatrix}.
    \]

  23. Calculate the determinant of

    \[
    A = \begin{bmatrix} 3 & 0 & 1\\ 1 & 2 & 0\\ 1 & 1 & 1 \end{bmatrix}.
    \]

  24. What is the rank of the matrix in Exercise 23?
  25. Calculate the determinant of the matrix

    \[
    A = \begin{bmatrix} 2 & 4 & 3 & 6\\ 1 & 0 & 0 & 0\\ 2 & 1 & 0 & 1\\ 1 & 1 & 2 & 0 \end{bmatrix}
    \]

    using expansion by minors. Show all steps.

  26. Apply Cramer’s rule to solve the following linear system:

    \[
    \begin{bmatrix} 3 & 0 & 1\\ 1 & 2 & 0\\ 1 & 1 & 1 \end{bmatrix} u =
    \begin{bmatrix} 8\\ 6\\ 6 \end{bmatrix}.
    \]

    Hint: Reuse your work from Exercise 23.

  27. Apply Cramer’s rule to solve the following linear system:

    \[
    \begin{bmatrix} 3 & 0 & 1\\ 0 & 2 & 0\\ 0 & 2 & 1 \end{bmatrix} u =
    \begin{bmatrix} 6\\ 4\\ 7 \end{bmatrix}.
    \]

  28. Set up and solve the linear system for solving the intersection of the three planes,

    \[
    x_1 + x_3 = 1, \qquad x_3 = 1, \qquad x_2 = 2.
    \]

  29. Find the intersection of the plane

    \[
    x(u_1, u_2) = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}
    + u_1 \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}
    - u_2 \begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix}
    \]

    and the line

    \[
    p(t) = \begin{bmatrix} 1\\ 1\\ 2 \end{bmatrix}
    + t \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}
    \]

    by setting up the problem as a linear system and solving it.

  30. Let five points be given by

    p1=[22],p2=[11],p3=[00],p4=[11],p5=[22].

    Find the linear least squares approximation.

  31. Let five points be given by

    p1=[00],p2=[11],p3=[22],p4=[11],p5=[22].

    Find the linear least squares approximation.

  32. Let two points be given by

    \[
    p_1 = \begin{bmatrix} 4\\ 1 \end{bmatrix}, \qquad
    p_2 = \begin{bmatrix} 4\\ 3 \end{bmatrix}.
    \]

    Find the linear least squares approximation.

¹Read this loosely as: an estimated number of n³ operations.
