Chapter 12

Gauss for Linear Systems

In Chapter 5, we studied linear systems of two equations in two unknowns. A whole chapter for such a humble task seems like a bit of overkill—its main purpose was really to lay the groundwork for this chapter.

Linear systems arise in virtually every area of science and engineering—some are as big as 1,000,000 equations in as many unknowns. Such huge systems require more sophisticated treatment than the methods introduced here, which will nonetheless allow you to solve systems with several thousand equations without a problem.

Figure 12.1 illustrates the use of linear systems in the field of data smoothing. The left triangulation looks somewhat “rough”; after setting up an appropriate linear system, we compute the “smoother” triangulation on the right, in which the triangles are closer to being equilateral.

Figure 12.1

Linear systems: the triangulation on the right was obtained from the left one by solving a linear system.

This chapter explains the basic ideas underlying linear systems. Readers eager for hands-on experience should get access to software such as Mathematica or MATLAB. Readers with advanced programming knowledge can download linear system solvers from the web. The most prominent collection of routines is LAPACK.

12.1 The Problem

A linear system is a set of equations like this:

\[
\begin{aligned}
3u_1 - 2u_2 - 10u_3 + u_4 &= 0\\
u_1 - u_3 &= 4\\
u_1 + u_2 - 2u_3 + 3u_4 &= 1\\
u_2 + 2u_4 &= 4.
\end{aligned}
\]

The unknowns are the numbers u_1, u_2, u_3, u_4. There are as many equations as there are unknowns, four in this example. We rewrite this 4 × 4 linear system in matrix form:

\[
\begin{bmatrix}
3 & -2 & -10 & 1\\
1 & 0 & -1 & 0\\
1 & 1 & -2 & 3\\
0 & 1 & 0 & 2
\end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3\\ u_4 \end{bmatrix} =
\begin{bmatrix} 0\\ 4\\ 1\\ 4 \end{bmatrix}.
\]

A general n × n linear system looks like this:

\[
\begin{aligned}
a_{1,1}u_1 + a_{1,2}u_2 + \ldots + a_{1,n}u_n &= b_1\\
a_{2,1}u_1 + a_{2,2}u_2 + \ldots + a_{2,n}u_n &= b_2\\
&\;\;\vdots\\
a_{n,1}u_1 + a_{n,2}u_2 + \ldots + a_{n,n}u_n &= b_n.
\end{aligned}
\]

In matrix form, it becomes

\[
\begin{bmatrix}
a_{1,1} & a_{1,2} & \ldots & a_{1,n}\\
a_{2,1} & a_{2,2} & \ldots & a_{2,n}\\
\vdots & & & \vdots\\
a_{n,1} & a_{n,2} & \ldots & a_{n,n}
\end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ \vdots\\ u_n \end{bmatrix} =
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix},
\qquad (12.1)
\]

or, written in terms of the columns of the coefficient matrix,

\[
\begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix} u = b,
\]

or even shorter

\[
Au = b.
\]

The coefficient matrix A has n rows and n columns. For example, the first row is

\[
a_{1,1},\ a_{1,2},\ \ldots,\ a_{1,n},
\]

and the second column is

\[
\begin{bmatrix} a_{1,2}\\ a_{2,2}\\ \vdots\\ a_{n,2} \end{bmatrix}.
\]

Equation (12.1) is a compact way of writing n equations for the n unknowns u1, ..., un. In the 2 × 2 case, such systems had nice geometric interpretations; in the general case, that interpretation needs n-dimensional linear spaces, and is not very intuitive. Still, the methods that we developed for the 2 × 2 case can be gainfully employed here!

Some underlying principles with a geometric interpretation are best explained for the example n = 3. We are given a vector b and we try to write it as a linear combination of vectors a1, a2, a3,

\[
\begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} u = b.
\]

If the ai are truly 3D, i.e., if they form a tetrahedron, then a unique solution may be found (see Sketch 12.1). But if the three ai all lie in a plane (i.e., if the volume formed by them is zero), then you cannot write b as a linear combination of them, unless it is itself in that 2D plane. In this case, you cannot expect uniqueness for your answer. Sketch 12.2 covers these cases. In general, a linear system is uniquely solvable if the ai have a nonzero n-dimensional volume. If they do not, they span a k-dimensional subspace (with k < n)—nonunique solutions exist only if b is itself in that subspace. A linear system is called consistent if at least one solution exists.

Sketch 12.1

A solvable 3 × 3 system.

Sketch 12.2

Top: no solution, bottom: nonunique solution.

For 2D and 3D, we encountered many problems that lent themselves to constructing the linear system in terms of a linear combination of column vectors, the a_i. However, in Section 5.11 we looked at how a linear system can be interpreted as a problem using equations built row by row. In n dimensions, this commonly occurs. An example follows.

Example 12.1

Suppose that at five time instances, say ti = 0, 0.25, 0.5, 0.75, 1 seconds, we have associated observation data, p(ti) = 0, 1, 0.5, 0.5, 0. We would like to fit a polynomial to these data so we can estimate values in between the observations. This is called polynomial interpolation. Five points require a degree four polynomial,

\[
p(t) = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4,
\]

which has five coefficients c_i. Immediately we see that we can write down five equations,

\[
p(t_i) = c_0 + c_1 t_i + c_2 t_i^2 + c_3 t_i^3 + c_4 t_i^4, \qquad i = 0, \ldots, 4,
\]

or in matrix form,

\[
\begin{bmatrix}
1 & t_0 & t_0^2 & t_0^3 & t_0^4\\
1 & t_1 & t_1^2 & t_1^3 & t_1^4\\
\vdots & & & & \vdots\\
1 & t_4 & t_4^2 & t_4^3 & t_4^4
\end{bmatrix}
\begin{bmatrix} c_0\\ c_1\\ \vdots\\ c_4 \end{bmatrix} =
\begin{bmatrix} p(t_0)\\ p(t_1)\\ \vdots\\ p(t_4) \end{bmatrix}.
\]

Figure 12.2 illustrates the result, \(p(t) = 12.667t - 50t^2 + 69.33t^3 - 32t^4\), for the given data.

Figure 12.2

Polynomial interpolation: a degree four polynomial fit to five data points.
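
To make Example 12.1 concrete, here is a small Python/NumPy sketch (NumPy and the variable names are my own choices, not part of the text) that builds the coefficient matrix from the five (t_i, p(t_i)) pairs above and solves for c_0, ..., c_4; it reproduces the polynomial quoted in Example 12.1 up to rounding.

    import numpy as np

    # Observation times and values from Example 12.1.
    t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    p = np.array([0.0, 1.0, 0.5, 0.5, 0.0])

    # Coefficient matrix: row i is [1, t_i, t_i^2, t_i^3, t_i^4].
    A = np.vander(t, N=5, increasing=True)

    # Solve A c = p for the polynomial coefficients c_0, ..., c_4.
    c = np.linalg.solve(A, p)
    print(c)   # approximately [0, 12.667, -50, 69.333, -32]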

12.2 The Solution via Gauss Elimination

The key to success in the 2 × 2 case was the application of a shear (forward elimination) so that the matrix A was transformed to upper triangular, meaning all entries below the diagonal are zero. Then it was possible to apply back substitution to solve for the unknowns. A shear was constructed to map the first column vector of the matrix onto the e1-axis. Revisiting an example from Chapter 5, we set

\[
\begin{bmatrix} 2 & 4\\ 1 & 6 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 4\\ 4 \end{bmatrix}.
\qquad (12.2)
\]

The shear used was

\[
S_1 = \begin{bmatrix} 1 & 0\\ -1/2 & 1 \end{bmatrix},
\]

which when applied to the system as

S1Au=S1b

produced the system

\[
\begin{bmatrix} 2 & 4\\ 0 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 4\\ 2 \end{bmatrix}.
\]

Algebraically, what this shear did was to change the rows of the system in the following manner:

\[
\mathrm{row}_1 \leftarrow \mathrm{row}_1
\quad\text{and}\quad
\mathrm{row}_2 \leftarrow \mathrm{row}_2 - \tfrac{1}{2}\,\mathrm{row}_1.
\]

Each of these constitutes an elementary row operation. Back substitution came next, with

\[
u_2 = \tfrac{1}{4}\times 2 = \tfrac{1}{2}
\]

and then

\[
u_1 = \tfrac{1}{2}\left(4 - 4u_2\right) = 1.
\]

The divisions in the back substitution equations are actually scalings, thus they could be rewritten in terms of a scale matrix:

\[
S_2 = \begin{bmatrix} 1/2 & 0\\ 0 & 1/4 \end{bmatrix},
\]

and then the system would be transformed via

S2S1Au=S2S1b.

The corresponding upper triangular matrix and system is

\[
\begin{bmatrix} 1 & 2\\ 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 2\\ 1/2 \end{bmatrix}.
\]

Check for yourself that we get the same result.
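
A quick NumPy sketch of this check (the matrix names below are mine, introduced only for illustration):

    import numpy as np

    A  = np.array([[2.0, 4.0], [1.0, 6.0]])
    b  = np.array([4.0, 4.0])
    S1 = np.array([[1.0, 0.0], [-0.5, 1.0]])     # shear: forward elimination
    S2 = np.array([[0.5, 0.0], [0.0, 0.25]])     # scaling

    print(S2 @ S1 @ A)            # [[1, 2], [0, 1]]
    print(S2 @ S1 @ b)            # [2, 0.5]
    print(np.linalg.solve(A, b))  # [1, 0.5], matching u1 = 1, u2 = 1/2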

Thus, we see that the geometric steps for solving a linear system have methodical algebraic interpretations. We will be following this algebraic approach for the rest of the chapter. For general linear systems, matrices such as S1 and S2 above are not actually constructed, because of the expense in speed and storage. Notice that the shear used to zero one element of the matrix changed the entries of only one row, so it is unnecessary to manipulate the other rows. This is an important observation for large systems.

In the general case, just as in the 2 × 2 case, pivoting will be used. Recall that for the 2 × 2 case this meant the equations were reordered such that the (pivot) matrix element a_{1,1} is the largest one in absolute value in the first column. A row exchange can be represented in terms of a permutation matrix. Suppose the 2 × 2 system in (12.2) was instead,

\[
\begin{bmatrix} 1 & 6\\ 2 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2 \end{bmatrix} =
\begin{bmatrix} 4\\ 4 \end{bmatrix},
\]

requiring pivoting as the first step. The permutation matrix that will exchange the two rows is

\[
P_1 = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix},
\]

which is the identity matrix with the rows (columns) exchanged. After this row exchange, the steps for solving P1Au = P1b are the same as for the system in (12.2): S2S1P1Au = S2S1P1b.

Example 12.2

Let’s step through the necessary row exchanges and shears for a 3 × 3 linear system. The goal is to get it in upper triangular form so we may use back substitution to solve for the unknowns. The system is

\[
\begin{bmatrix} 2 & 2 & 0\\ 4 & 0 & 2\\ 4 & -2 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} -4\\ 2\\ 0 \end{bmatrix}.
\]

The matrix element a_{1,1} is not the largest in the first column, so we choose the 4 in the second row to be the pivot element and we reorder the rows:

\[
\begin{bmatrix} 4 & 0 & 2\\ 2 & 2 & 0\\ 4 & -2 & 4 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 2\\ -4\\ 0 \end{bmatrix}.
\]

The permutation matrix that achieves this row exchange is

\[
P_1 = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}.
\]

(The subscript 1 indicates that this matrix is designed to achieve the appropriate exchange for the first column.)

To zero entries in the first column apply:

\[
\mathrm{row}_2 \leftarrow \mathrm{row}_2 - \tfrac{1}{2}\,\mathrm{row}_1, \qquad
\mathrm{row}_3 \leftarrow \mathrm{row}_3 - \mathrm{row}_1,
\]

and the system becomes

\[
\begin{bmatrix} 4 & 0 & 2\\ 0 & 2 & -1\\ 0 & -2 & 2 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 2\\ -5\\ -2 \end{bmatrix}.
\]

The shear matrix that achieves this is

\[
G_1 = \begin{bmatrix} 1 & 0 & 0\\ -1/2 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix}.
\]

Now the first column consists of only zeroes except for a_{1,1}, meaning that it is lined up with the e_1-axis.

Now work on the second column vector. First, check if pivoting is necessary; this means checking that a_{2,2} is the largest in absolute value among the entries of the second column on or below the diagonal. No pivoting is necessary here. (We could say that the permutation matrix P_2 = I.) To zero the last element in this column apply

\[
\mathrm{row}_3 \leftarrow \mathrm{row}_3 + \mathrm{row}_2,
\]

which produces

\[
\begin{bmatrix} 4 & 0 & 2\\ 0 & 2 & -1\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 2\\ -5\\ -7 \end{bmatrix}.
\]

The shear matrix that achieves this is

\[
G_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 1 \end{bmatrix}.
\]

By chance, the second column is aligned with e_2 because a_{1,2} = 0. If this extra zero had not appeared, then the last operation would have mapped this 3D vector into the [e_1, e_2]-plane.

Now that we have the matrix in upper triangular form, we are ready for back substitution:

\[
u_3 = \tfrac{1}{1}(-7), \qquad
u_2 = \tfrac{1}{2}\left(-5 + u_3\right), \qquad
u_1 = \tfrac{1}{4}\left(2 - 2u_3\right).
\]

This implicitly incorporates a scaling matrix. We obtain the solution

\[
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 4\\ -6\\ -7 \end{bmatrix}.
\]

It is usually a good idea to insert the solution into the original equations:

\[
\begin{bmatrix} 2 & 2 & 0\\ 4 & 0 & 2\\ 4 & -2 & 4 \end{bmatrix}
\begin{bmatrix} 4\\ -6\\ -7 \end{bmatrix} =
\begin{bmatrix} -4\\ 2\\ 0 \end{bmatrix}.
\]

It works!

The example above illustrates each of the elementary row operations that take place during Gauss elimination:

  • Pivoting results in the exchange of two rows.
  • Shears result in adding a multiple of one row to another.
  • Scaling results in multiplying a row by a scalar.

Gauss elimination for solving a linear system consists of two basic steps: forward elimination (pivoting and shears) and then back substitution (scaling). Here is the algorithm for solving a general n × n system of linear equations.

Gauss Elimination with Pivoting

Given: An n × n coefficient matrix A and an n × 1 right-hand side b describing a linear system

Au=b,

which is short for the more detailed (12.1).

Find: The unknowns u1,...,un.

Algorithm:

  • Initialize the n × n matrix G = I.
  • For j = 1, ..., n − 1 (j counts columns)
    • Pivoting step:
      • Find the element with the largest absolute value in column j from a_{j,j} to a_{n,j}; this is element a_{r,j}.
      • If r > j, exchange equations r and j.
      • If a_{j,j} = 0, the system is not solvable.
    • Forward elimination step for column j:
      • For i = j + 1, ..., n (elements below the diagonal of column j)
        • Construct the multiplier g_{i,j} = a_{i,j}/a_{j,j}
        • a_{i,j} = 0
        • For k = j + 1, ..., n (each element in row i after column j)
          • a_{i,k} = a_{i,k} − g_{i,j} a_{j,k}
        • b_i = b_i − g_{i,j} b_j
  • At this point, all elements below the diagonal have been set to zero. The matrix is now in upper triangular form.
  • Back substitution:
    • u_n = b_n/a_{n,n}
    • For j = n − 1, ..., 1
      • u_j = (1/a_{j,j})[b_j − a_{j,j+1}u_{j+1} − ... − a_{j,n}u_n].

In a programming environment, it can be convenient to form an augmented matrix, which is the matrix A augmented with the vector b. Here is the idea for a 3 × 3 linear system:

\[
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & b_1\\
a_{2,1} & a_{2,2} & a_{2,3} & b_2\\
a_{3,1} & a_{3,2} & a_{3,3} & b_3
\end{bmatrix}.
\]

Then the loop over k would run to n + 1, and there would be no need for the extra line updating the b_i element.
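
As a hands-on companion to the algorithm above, here is a minimal NumPy sketch of Gauss elimination with row pivoting followed by back substitution. It uses the augmented-matrix idea just described; the name gauss_solve and the simple tolerance handling are my own choices, not from the text.

    import numpy as np

    def gauss_solve(A, b, tol=1e-12):
        """Solve Au = b: forward elimination with row pivoting, then back substitution."""
        n = len(b)
        # Augmented matrix [A | b]; elimination then updates the right-hand side for free.
        M = np.hstack([np.asarray(A, dtype=float), np.asarray(b, dtype=float).reshape(n, 1)])
        for j in range(n - 1):
            # Pivoting: bring the largest |entry| of column j (rows j..n-1) to the diagonal.
            r = j + np.argmax(np.abs(M[j:, j]))
            if r != j:
                M[[j, r]] = M[[r, j]]
            if abs(M[j, j]) < tol:
                raise ValueError("system is not (uniquely) solvable")
            # Forward elimination for column j.
            for i in range(j + 1, n):
                g = M[i, j] / M[j, j]          # multiplier g_{i,j}
                M[i, j:] -= g * M[j, j:]       # row_i <- row_i - g * row_j (incl. rhs)
        # Back substitution.
        u = np.zeros(n)
        for j in range(n - 1, -1, -1):
            u[j] = (M[j, n] - M[j, j+1:n] @ u[j+1:]) / M[j, j]
        return u

    # The 3 x 3 system of Example 12.2:
    A = [[2, 2, 0], [4, 0, 2], [4, -2, 4]]
    b = [-4, 2, 0]
    print(gauss_solve(A, b))   # [ 4. -6. -7.]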

As demonstrated in Example 12.2, the operations in the elimination step above may be written in matrix form. If A is the current matrix, then at step j, we first check if a row exchange is necessary. This may be achieved with a permutation matrix, Pj. If no pivoting is necessary, Pj = I. To produce zeroes under aj,j the matrix product Gj A is formed, where

\[
G_j = \begin{bmatrix}
1 & & & & \\
& \ddots & & & \\
& & 1 & & \\
& & -g_{j+1,j} & 1 & \\
& & \vdots & & \ddots \\
& & -g_{n,j} & & & 1
\end{bmatrix}.
\qquad (12.3)
\]

The g_{i,j} are the multipliers; they enter G_j with a minus sign. The matrix G_j is called a Gauss matrix. All of its entries are zero except for the diagonal and the entries −g_{i,j} below the diagonal of the jth column. We could store all P_j and G_j in one matrix

\[
G = G_{n-1}P_{n-1} \cdots G_2 P_2 G_1 P_1.
\qquad (12.4)
\]

If no pivoting is required, then it is possible to store the gi,j in the zero elements of A rather than explicitly setting the ai,j element equal to zero. Regardless, it is more efficient with regard to speed and storage not to multiply A and b by the permutation and Gauss matrices because this would result in many unnecessary calculations.

To summarize, Gauss elimination with pivoting transforms the linear system Au = b into the system

GAu=Gb,

which has the same solution as the original system. The matrix GA is upper triangular, and it is referred to as U. The diagonal elements of U are the pivots.

Example 12.3

We look at another example, taken from [2]. Let the system be given by

\[
\begin{bmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 6\\ 9\\ 7 \end{bmatrix}.
\]

We start the algorithm with j = 1, and observe that no element in column 1 exceeds a1,1 in absolute value, so no pivoting is necessary at this step, thus P1 = I. Proceed with the elimination step for row 2 by constructing the multiplier

\[
g_{2,1} = a_{2,1}/a_{1,1} = 1/2.
\]

Change row 2 as follows:

\[
\mathrm{row}_2 \leftarrow \mathrm{row}_2 - \tfrac{1}{2}\,\mathrm{row}_1.
\]

Remember, this includes changing the element b_2. Similarly for row 3,

\[
g_{3,1} = a_{3,1}/a_{1,1} = 2/2 = 1,
\]

then

\[
\mathrm{row}_3 \leftarrow \mathrm{row}_3 - \mathrm{row}_1.
\]

Step j = 1 is complete and the linear system is now

\[
\begin{bmatrix} 2 & 2 & 0\\ 0 & 0 & 2\\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 6\\ 6\\ 1 \end{bmatrix}.
\]

The Gauss matrix for j = 1 is

\[
G_1 = \begin{bmatrix} 1 & 0 & 0\\ -1/2 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix}.
\]

Next is column 2, so j = 2. Observe that a_{2,2} = 0, whereas a_{3,2} = −1. We exchange equations 2 and 3 and the system becomes

\[
\begin{bmatrix} 2 & 2 & 0\\ 0 & -1 & 1\\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 6\\ 1\\ 6 \end{bmatrix}.
\qquad (12.5)
\]

The permutation matrix for this exchange is

\[
P_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}.
\]

If blindly following the algorithm above, we would proceed with the elimination for row 3 by forming the multiplier

\[
g_{3,2} = \frac{a_{3,2}}{a_{2,2}} = \frac{0}{-1} = 0.
\]

Then operate on the third row

\[
\mathrm{row}_3 \leftarrow \mathrm{row}_3 - 0 \times \mathrm{row}_2,
\]

which doesn’t change the row at all. (So we will record G_2 = I.) In a floating-point computation, however, g_{3,2} might not come out exactly zero due to round-off; without a special check for a (near-)zero multiplier, this unnecessary work takes place. Tolerances are very important here.

Apply back substitution by first solving for the last unknown:

\[
u_3 = \tfrac{6}{2} = 3.
\]

Start the back substitution loop with j = 2:

\[
u_2 = \tfrac{1}{-1}\left[1 - u_3\right] = 2,
\]

and finally

\[
u_1 = \tfrac{1}{2}\left[6 - 2u_2\right] = 1.
\]

It’s a good idea to check the solution:

\[
\begin{bmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} =
\begin{bmatrix} 6\\ 9\\ 7 \end{bmatrix}.
\]

The final matrix G = G_2 P_2 G_1 P_1 is

\[
G = \begin{bmatrix} 1 & 0 & 0\\ -1 & 0 & 1\\ -1/2 & 1 & 0 \end{bmatrix}.
\]

Check that GAu = Gb results in the linear system in (12.5).

Just before back substitution, we could scale to achieve ones along the diagonal of the matrix. Let’s do precisely that to the linear system in Example 12.3. Multiply both sides of (12.5) by

\[
\begin{bmatrix} 1/2 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1/2 \end{bmatrix}.
\]

This transforms the linear system to

\[
\begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 3\\ -1\\ 3 \end{bmatrix}.
\]

This upper triangular matrix with rank = n is said to be in row echelon form. If the matrix is rank deficient, rank < n, then the rows with all zeroes should be the last rows. Some definitions of row echelon form do not require ones along the diagonal, as we have here; it is more efficient to do the scaling as part of back substitution.

Gauss elimination requires O(n³) operations.¹ Thus this algorithm is suitable for a system with thousands of equations, but not for a system with millions of equations. When the system is very large, oftentimes many of the matrix elements are zero—a sparse linear system. Iterative methods, which are introduced in Section 13.6, are a better approach in this case.

12.3 Homogeneous Linear Systems

Let’s revisit the topic of Section 5.8, homogeneous linear systems, which take the form

Au=0.

The trivial solution u = 0 is always an option, but of little interest. How do we use Gauss elimination to find a nontrivial solution if one exists? (Once we have one nontrivial solution u, all multiples cu are solutions as well.) The answer: slightly modify the back substitution step. An example will make this clear.

Example 12.4

Start with the homogeneous system

\[
\begin{bmatrix} 1 & 2 & 3\\ 1 & 2 & 3\\ 1 & 2 & 3 \end{bmatrix} u =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.
\]

The matrix clearly has rank one. First perform forward elimination, arriving at

\[
\begin{bmatrix} 1 & 2 & 3\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} u =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.
\]

For each zero row of the transformed system, set the corresponding u_i, the free variables, to one: u_3 = 1 and u_2 = 1. Back substituting these into the first equation gives u_1 = −5 for the pivot variable. Thus a final solution is

\[
u = \begin{bmatrix} -5\\ 1\\ 1 \end{bmatrix},
\]

and all vectors cu are solutions as well.

Since the 3 × 3 matrix is rank one, it has a two-dimensional null space. The number of free variables is equal to the dimension of the null space. We can systematically construct two vectors u_1, u_2 that span the null space by setting one of the free variables to one and the other to zero, resulting in

\[
u_1 = \begin{bmatrix} -3\\ 0\\ 1 \end{bmatrix}
\quad\text{and}\quad
u_2 = \begin{bmatrix} -2\\ 1\\ 0 \end{bmatrix}.
\]

All linear combinations of elements of the null space are also in the null space; for example, u = 1u_1 + 1u_2.
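
For larger systems it is convenient to let the computer find a null space basis. Here is a small NumPy sketch (not from the text; it uses the singular value decomposition, a topic of Chapter 16) applied to the rank-one matrix of Example 12.4:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])

    # Rows of Vt whose singular values are (numerically) zero span the null space.
    U, s, Vt = np.linalg.svd(A)
    null_basis = Vt[s < 1e-12 * s.max()]      # here a 2 x 3 array: a 2D null space
    print(null_basis.shape[0])                # 2 free variables
    print(np.allclose(A @ null_basis.T, 0))   # True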

Column pivoting might be required to ready the matrix for back substitution.

Example 12.5

The homogeneous system

\[
\begin{bmatrix} 0 & 6 & 3\\ 0 & 0 & 2\\ 0 & 0 & 0 \end{bmatrix} u =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
\]

comes from an eigenvector problem similar to those in Section 7.3. (More on eigenvectors in higher dimensions in Chapter 15.)

The linear system in its existing form leads us to 0u_3 = 0 and 2u_3 = 0. To remedy this, we proceed with column exchanges:

\[
\begin{bmatrix} 6 & 3 & 0\\ 0 & 2 & 0\\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} u_2\\ u_3\\ u_1 \end{bmatrix} =
\begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix},
\]

where column 1 was exchanged with column 2 and then column 2 was exchanged with column 3. Each exchange requires that the associated unknowns are exchanged as well. Set the free variable u_1 = 1; then back substitution results in u_3 = 0 and u_2 = 0. All vectors

\[
u = c\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}
\]

satisfy the original homogeneous system.

12.4 Inverse Matrices

The inverse of a square matrix A is the matrix that “undoes” A’s action, i.e., the combined action of A and A^{-1} is the identity:

\[
AA^{-1} = I.
\qquad (12.6)
\]

We introduced the essentials of inverse matrices in Section 5.9 and reviewed properties of the inverse in Section 9.11. In this section, we introduce the inverse for n × n matrices and discuss inverses in the context of solving linear systems, Gauss elimination, and LU decomposition (covered in more detail in Section 12.5).

Example 12.6

The following scheme shows a matrix A multiplied by its inverse A−1. The matrix A is on the left, A−1 is on top, and the result of the multiplication, the identity, is on the lower right:

How do we find the inverse of a matrix? In much the same way as we did in the 2 × 2 case in Section 5.9, we write

\[
A\begin{bmatrix} \bar{a}_1 & \ldots & \bar{a}_n \end{bmatrix} =
\begin{bmatrix} e_1 & \ldots & e_n \end{bmatrix}.
\qquad (12.7)
\]

Here, the matrices are n × n, and the vectors ā_i as well as the e_i are vectors with n components. The vector e_i has all zero entries except for its ith component, which equals 1.

We may now interpret (12.7) as n linear systems:

\[
A\bar{a}_1 = e_1, \quad \ldots, \quad A\bar{a}_n = e_n.
\qquad (12.8)
\]

In Example 5.8, we applied shears and a scaling to transform A into the identity matrix, and at the same time, the right-hand side (e_1 and e_2) was transformed into A^{-1}. Now, with Gauss elimination as per Section 12.2, we apply forward elimination to A and to each of the e_i. Then with back substitution, we solve for each of the ā_i that form A^{-1}. However, as we will learn in Section 12.5, a more economical solution is found with LU decomposition. This method is tailored to solving multiple systems of equations that share the same matrix A, but have different right-hand sides.

Inverse matrices are primarily a theoretical concept. They suggest solving a linear system Av = b by computing A^{-1} and then setting v = A^{-1}b. Don’t do that! It is a very expensive way to solve a linear system; simple Gauss elimination or LU decomposition is much cheaper. (Explicitly forming the inverse requires forward elimination, n back substitutions, and then a matrix-vector multiplication. On the other hand, Gauss elimination requires forward elimination and just one back substitution.)
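
A short NumPy sketch of (12.8), computing A^{-1} column by column by solving n linear systems with the columns of the identity as right-hand sides (purely illustrative; in practice one would factor A once, as Section 12.5 explains):

    import numpy as np

    A = np.array([[2.0, 2.0, 0.0],
                  [1.0, 1.0, 2.0],
                  [2.0, 1.0, 1.0]])   # the matrix of Example 12.3

    n = A.shape[0]
    A_inv = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        A_inv[:, i] = np.linalg.solve(A, e_i)   # column i of the inverse solves A x = e_i

    print(np.allclose(A @ A_inv, np.eye(n)))    # True: A A^{-1} = I as in (12.6)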

The inverse of a matrix A exists only if the matrix is square (n × n) and the action of A does not reduce dimensionality, as in a projection. This means that all columns of A must be linearly independent. There is a simple way to see if a matrix A is invertible; just perform Gauss elimination for the first of the linear systems in (12.8). If you are able to transform A to upper triangular with all nonzero diagonal elements, then A is invertible. Otherwise, it is said to be singular. The term “nonzero” is to be taken with a grain of salt: real numbers are (almost) never zero, and tolerances must be employed.

Again we encounter the concept of matrix rank. An invertible matrix is said to have rank n or full rank. If a matrix reduces dimensionality by k, then it has rank n − k. The n × n identity matrix has rank n; the zero matrix has rank 0. An example of a matrix that does not have full rank is a projection. Review the structure of a 2D orthogonal projection in Section 4.8 and a 3D projection in Section 9.7 to confirm this statement about the rank.

Example 12.7

We apply forward elimination to three 4 × 4 matrices to achieve row echelon form:

\[
M_1 = \begin{bmatrix} 1 & 3 & 3 & 0\\ 0 & 3 & 3 & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},\quad
M_2 = \begin{bmatrix} 1 & 3 & 3 & 0\\ 0 & 3 & 3 & 1\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},\quad
M_3 = \begin{bmatrix} 1 & 3 & 3 & 0\\ 0 & 3 & 3 & 1\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 2 \end{bmatrix}.
\]

M1 has rank 2, M2 has rank 3, and M3 has rank 4 or full rank.

Example 12.8

Let us compute the inverse of the n × n matrix G_j as defined in (12.3):

\[
G_j^{-1} = \begin{bmatrix}
1 & & & & \\
& \ddots & & & \\
& & 1 & & \\
& & g_{j+1,j} & 1 & \\
& & \vdots & & \ddots \\
& & g_{n,j} & & & 1
\end{bmatrix}.
\]

That’s simple: just flip the signs of the entries below the diagonal. To make some geometric sense of this, you should realize that G_j is a shear, and so is G_j^{-1}, which “undoes” G_j.

Here is another interesting property of the inverse of a matrix. Suppose k ≠ 0 and kA is an invertible matrix; then

\[
(kA)^{-1} = \frac{1}{k}A^{-1}.
\]

And yet another: If two matrices, A and B, are invertible, then the product AB is invertible, too.

12.5 LU Decomposition

Gauss elimination has two major parts: transforming the system to upper triangular form with forward elimination and back substitution. The creation of the upper triangular matrix may be written in terms of matrix multiplications using Gauss matrices Gj. For now, assume that no pivoting is necessary. If we denote the final upper triangular matrix by U, then we have

\[
G_{n-1} \cdots G_1 A = U.
\qquad (12.9)
\]

It follows that

\[
A = G_1^{-1} \cdots G_{n-1}^{-1} U.
\]

The neat thing about the product \(G_1^{-1}\cdots G_{n-1}^{-1}\) is that it is a lower triangular matrix with the elements g_{i,j} below the diagonal and zeroes above the diagonal:

\[
G_1^{-1} \cdots G_{n-1}^{-1} =
\begin{bmatrix}
1 & & & \\
g_{2,1} & 1 & & \\
\vdots & & \ddots & \\
g_{n,1} & \ldots & g_{n,n-1} & 1
\end{bmatrix}.
\]

We denote this product by L (for lower triangular). Thus,

\[
A = LU,
\qquad (12.10)
\]

which is known as the LU decomposition of A. It is also called the triangular factorization of A. Every invertible matrix A has such a decomposition, although it may be necessary to employ pivoting.

Denote the elements of L by l_{i,j} (keeping in mind that l_{i,i} = 1) and those of U by u_{i,j}. A simple 3 × 3 example will help illustrate the idea.

In this scheme, we are given the a_{i,j} and we want the l_{i,j} and u_{i,j}. This is systematically achieved as follows.

Observe that elements of A below the diagonal may be rewritten as

\[
a_{i,j} = l_{i,1}u_{1,j} + \ldots + l_{i,j-1}u_{j-1,j} + l_{i,j}u_{j,j}; \qquad j < i.
\]

For the elements of A that are on or above the diagonal, we get

\[
a_{i,j} = l_{i,1}u_{1,j} + \ldots + l_{i,i-1}u_{i-1,j} + l_{i,i}u_{i,j}; \qquad j \geq i.
\]

This leads to

\[
l_{i,j} = \frac{1}{u_{j,j}}\left(a_{i,j} - l_{i,1}u_{1,j} - \ldots - l_{i,j-1}u_{j-1,j}\right); \qquad j < i,
\qquad (12.11)
\]

and

\[
u_{i,j} = a_{i,j} - l_{i,1}u_{1,j} - \ldots - l_{i,i-1}u_{i-1,j}; \qquad j \geq i.
\qquad (12.12)
\]

If A has a decomposition A = LU, then the system can be written as

\[
LUu = b.
\qquad (12.13)
\]

The matrix vector product Uu results in a vector; call this y. Reexamining (12.13), it becomes a two-step problem. First solve

\[
Ly = b,
\qquad (12.14)
\]

then solve

\[
Uu = y.
\qquad (12.15)
\]

If Uu = y, then LUu = Ly = b. The two systems in (12.14) and (12.15) are triangular and easy to solve. Forward substitution is applied to the matrix L. (See Exercise 21 and its solution for an algorithm.) Back substitution is applied to the matrix U. An algorithm is provided in Section 12.2.

A more direct method for forming L and U is achieved with (12.11) and (12.12), rather than through Gauss elimination. This then is the method of LU decomposition.

LU Decomposition

Given: A coefficient matrix A and a right-hand side b describing a linear system

Au=b.

Find: The unknowns u1,..., un.

Algorithm:

  • Initialize L as the identity matrix and U as the zero matrix.
  • Calculate the nonzero elements of L and U:
  • For k = 1, ..., n
    • u_{k,k} = a_{k,k} − l_{k,1}u_{1,k} − ... − l_{k,k-1}u_{k-1,k}
    • For i = k + 1, ..., n
      • l_{i,k} = (1/u_{k,k})[a_{i,k} − l_{i,1}u_{1,k} − ... − l_{i,k-1}u_{k-1,k}]
    • For j = k + 1, ..., n
      • u_{k,j} = a_{k,j} − l_{k,1}u_{1,j} − ... − l_{k,k-1}u_{k-1,j}
  • Using forward substitution solve Ly = b.
  • Using back substitution solve Uu = y.

The u_{k,k} term must not be zero; we had a similar situation with Gauss elimination. If it is zero, either pivoting is required or the matrix may be singular.

The construction of the LU decomposition takes advantage of the triangular structure of L and U combined with a particular computation order. The matrix L is being filled column by column and the matrix U is being filled row by row.

Example 12.9

Let’s use LU decomposition to solve the linear system Au = b, where

\[
A = \begin{bmatrix} 2 & 2 & 4\\ -1 & 2 & -3\\ 1 & 2 & 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}.
\]

The first step is to decompose A. Following the steps in the algorithm above, we calculate the matrix entries:

\[
\begin{aligned}
k=1:\quad & u_{1,1} = a_{1,1} = 2,\\
& l_{2,1} = a_{2,1}/u_{1,1} = -\tfrac{1}{2}, \qquad l_{3,1} = a_{3,1}/u_{1,1} = \tfrac{1}{2},\\
& u_{1,2} = a_{1,2} = 2, \qquad u_{1,3} = a_{1,3} = 4,\\[4pt]
k=2:\quad & u_{2,2} = a_{2,2} - l_{2,1}u_{1,2} = 2 + 1 = 3,\\
& l_{3,2} = \tfrac{1}{u_{2,2}}\left[a_{3,2} - l_{3,1}u_{1,2}\right] = \tfrac{1}{3}\left[2 - 1\right] = \tfrac{1}{3},\\
& u_{2,3} = a_{2,3} - l_{2,1}u_{1,3} = -3 + 2 = -1,\\[4pt]
k=3:\quad & u_{3,3} = a_{3,3} - l_{3,1}u_{1,3} - l_{3,2}u_{2,3} = 2 - 2 + \tfrac{1}{3} = \tfrac{1}{3}.
\end{aligned}
\]

Check that this produces valid entries for L and U:

\[
L = \begin{bmatrix} 1 & 0 & 0\\ -1/2 & 1 & 0\\ 1/2 & 1/3 & 1 \end{bmatrix}, \qquad
U = \begin{bmatrix} 2 & 2 & 4\\ 0 & 3 & -1\\ 0 & 0 & 1/3 \end{bmatrix},
\qquad\text{and indeed}\quad LU = A.
\]

Next, we solve Ly = b with forward substitution—solving for y1, then y2, and then y3—and find that

\[
y = \begin{bmatrix} 1\\ 3/2\\ 0 \end{bmatrix}.
\]

The last step is to solve Uu = y with back substitution—as we did in Gauss elimination,

\[
u = \begin{bmatrix} 0\\ 1/2\\ 0 \end{bmatrix}.
\]

It is simple to check that this solution is correct since clearly, the column vector a2 is a multiple of b, and that is reflected in u.
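
Here is a minimal NumPy sketch of the LU algorithm above (no pivoting; the function names are mine, not from the text): lu_decompose fills L column by column and U row by row in exactly the order described, and the two substitution routines solve (12.14) and (12.15). Applied to Example 12.9 it reproduces the solution.

    import numpy as np

    def lu_decompose(A):
        """Doolittle-style LU decomposition, A = LU, assuming no pivoting is needed."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        L, U = np.eye(n), np.zeros((n, n))
        for k in range(n):
            U[k, k] = A[k, k] - L[k, :k] @ U[:k, k]
            for i in range(k + 1, n):                  # fill column k of L
                L[i, k] = (A[i, k] - L[i, :k] @ U[:k, k]) / U[k, k]
            for j in range(k + 1, n):                  # fill row k of U
                U[k, j] = A[k, j] - L[k, :k] @ U[:k, j]
        return L, U

    def forward_substitute(L, b):
        y = np.zeros(len(b))
        for i in range(len(b)):
            y[i] = b[i] - L[i, :i] @ y[:i]             # uses l_{i,i} = 1
        return y

    def back_substitute(U, y):
        n = len(y)
        u = np.zeros(n)
        for i in range(n - 1, -1, -1):
            u[i] = (y[i] - U[i, i+1:] @ u[i+1:]) / U[i, i]
        return u

    A = [[2, 2, 4], [-1, 2, -3], [1, 2, 2]]            # system of Example 12.9
    b = [1, 1, 1]
    L, U = lu_decompose(A)
    print(back_substitute(U, forward_substitute(L, b)))   # [0, 0.5, 0]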

Suppose A is nonsingular, but in need of pivoting. Then a permutation matrix P is used to exchange (possibly multiple) rows so it is possible to create the LU decomposition. The system is now PAu = Pb and we find PA = LU.

Finally, the major benefit of the LU decomposition: speed. For cases in which we have to solve multiple linear systems with the same coefficient matrix, LU decomposition is a big timesaver. We perform it once, and then perform the forward and backward substitutions (12.14) and (12.15) for each right-hand side. This is significantly less work than performing a complete Gauss elimination every time! Finding the inverse of a matrix, as described in (12.8), is an example of a problem that requires solving multiple linear systems with the same coefficient matrix.
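
In practice one would typically rely on a library for this factor-once, solve-many pattern; for instance, SciPy (an assumption on my part, not referenced by the text) exposes exactly this workflow:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[2.0, 2.0, 4.0],
                  [-1.0, 2.0, -3.0],
                  [1.0, 2.0, 2.0]])

    lu, piv = lu_factor(A)                        # factor once (with pivoting)
    for b in ([1.0, 1.0, 1.0], [2.0, 0.0, 1.0]):  # two (arbitrary) right-hand sides
        print(lu_solve((lu, piv), np.array(b)))   # cheap solve per right-hand side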

12.6 Determinants

With the introduction of the scalar triple product, Section 8.5 provided a geometric derivation of 3 × 3 determinants; they measure volume. And then in Section 9.8 we learned more about determinants from the perspective of linear maps. Let’s revisit that approach for n × n determinants.

When we apply forward elimination to A, transforming it to upper triangular form U, we apply a sequence of shears and row exchanges. Shears do not change the volumes. As we learned in Section 9.8, a row exchange will change the sign of the determinant. Thus the column vectors of U span the same volume as did those of A, however the sign might change. This volume is given by the signed product of the diagonal entries of U and is called the determinant of A:

\[
\det A = (-1)^k\left(u_{1,1} \times \ldots \times u_{n,n}\right),
\qquad (12.16)
\]

where k is the number of row exchanges. In general, this is the best (and most stable) method for finding the determinant (but also see the method in Section 16.4).

Example 12.10

Let’s revisit Example 12.3 to illustrate how to calculate the determinant with the upper triangular form, and how row exchanges influence the sign of the determinant.

Use the technique of cofactor expansion, as defined by (9.14) to find the determinant of the given 3 × 3 matrix A:

\[
\det A = 2\begin{vmatrix} 1 & 2\\ 1 & 1 \end{vmatrix}
- 2\begin{vmatrix} 1 & 2\\ 2 & 1 \end{vmatrix} = 4.
\]

Now, apply (12.16) to the upper triangular form U (12.5) from the example, and notice that we did one row exchange, k = 1:

\[
\det A = (-1)^1\left[2 \times (-1) \times 2\right] = 4.
\]

So the shears of Gauss elimination have not changed the absolute value of the determinant, and by modifying the sign based on the number of row exchanges, we can determine det A from det U.
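
A small NumPy sketch of (12.16): eliminate with row pivoting, count the row exchanges, and take the signed product of the diagonal (the function name det_via_elimination is mine):

    import numpy as np

    def det_via_elimination(A):
        U = np.array(A, dtype=float)
        n = len(U)
        swaps = 0
        for j in range(n - 1):
            r = j + np.argmax(np.abs(U[j:, j]))
            if r != j:
                U[[j, r]] = U[[r, j]]
                swaps += 1                        # each exchange flips the sign
            if U[j, j] == 0.0:
                return 0.0                        # zero pivot: singular matrix
            for i in range(j + 1, n):
                U[i, j:] -= U[i, j] / U[j, j] * U[j, j:]
        return (-1) ** swaps * np.prod(np.diag(U))

    A = [[2, 2, 0], [1, 1, 2], [2, 1, 1]]         # matrix of Example 12.10
    print(det_via_elimination(A))                 # 4.0
    print(np.linalg.det(A))                       # agrees (up to round-off)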

The technique of cofactor expansion that was used for the 3 × 3 matrix in Example 12.10 may be generalized to n × n matrices. Choose any column or row of the matrix, for example entries a1,j as above, and then

\[
\det A = a_{1,1}C_{1,1} + a_{1,2}C_{1,2} + \ldots + a_{1,n}C_{1,n},
\]

where each cofactor is defined as

\[
C_{i,j} = (-1)^{i+j} M_{i,j},
\]

and the Mi,j are called the minors; each is the determinant of the matrix with the ith row and jth column removed. The Mi,j are (n − 1) × (n − 1) determinants, and they are computed by yet another cofactor expansion. This process is repeated until we have 2 × 2 determinants. This technique is also known as expansion by minors.

Example 12.11

Let’s look at repeated application of cofactor expansion to find the determinant. Suppose we are given the following matrix,

\[
A = \begin{bmatrix} 2 & 2 & 0 & 4\\ 0 & 1 & 1 & 3\\ 0 & 0 & -2 & 0\\ 0 & 0 & 0 & 5 \end{bmatrix}.
\]

We may choose any row or column from which to form the cofactors, so in this example, we will have less work to do if we choose the first column. The cofactor expansion is

\[
\det A = 2\begin{vmatrix} 1 & 1 & 3\\ 0 & -2 & 0\\ 0 & 0 & 5 \end{vmatrix}
= 2\,(1)\begin{vmatrix} -2 & 0\\ 0 & 5 \end{vmatrix}
= 2\,(1)(-10) = -20.
\]

Since the matrix is in upper triangular form, we could use (12.16) and immediately see that this is in fact the correct determinant.

Cofactor expansion is more a theoretical tool than a computational one. This method of calculating the determinant plays an important theoretical role in the analysis of linear systems, and there are advanced theorems involving cofactor expansion and the inverse of a matrix. Computationally, Gauss elimination and the calculation of det U is superior.

In our first encounter with solving linear systems via Cramer’s rule in Section 5.3, we learned that the solution to a linear system may be found by simply forming quotients of areas. Now with our knowledge of n × n determinants, let’s revisit Cramer’s rule. If Au = b is an n × n linear system such that det A ≠ 0, then the system has the following unique solution:

\[
u_1 = \frac{\det A_1}{\det A}, \quad
u_2 = \frac{\det A_2}{\det A}, \quad \ldots, \quad
u_n = \frac{\det A_n}{\det A},
\qquad (12.17)
\]

where A_i is the matrix obtained by replacing the entries in the ith column by b. Cramer’s rule is an important theoretical tool; however, use it only for 2 × 2 or 3 × 3 linear systems.

Example 12.12

Let’s solve the linear system from Example 12.3 using Cramer’s rule. Following (12.17), we have

\[
u_1 = \frac{\begin{vmatrix} 6 & 2 & 0\\ 9 & 1 & 2\\ 7 & 1 & 1 \end{vmatrix}}
           {\begin{vmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{vmatrix}}, \qquad
u_2 = \frac{\begin{vmatrix} 2 & 6 & 0\\ 1 & 9 & 2\\ 2 & 7 & 1 \end{vmatrix}}
           {\begin{vmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{vmatrix}}, \qquad
u_3 = \frac{\begin{vmatrix} 2 & 2 & 6\\ 1 & 1 & 9\\ 2 & 1 & 7 \end{vmatrix}}
           {\begin{vmatrix} 2 & 2 & 0\\ 1 & 1 & 2\\ 2 & 1 & 1 \end{vmatrix}}.
\]

We have computed the determinant of the coefficient matrix A in Example 12.10: det A = 4. With the application of cofactor expansion for each numerator, we find that

\[
u_1 = \tfrac{4}{4} = 1, \qquad
u_2 = \tfrac{8}{4} = 2, \qquad
u_3 = \tfrac{12}{4} = 3,
\]

which is identical to the solution found with Gauss elimination.
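
The same computation in a few lines of NumPy (cramer_solve is my own name; as noted above, this is only sensible for very small systems):

    import numpy as np

    def cramer_solve(A, b):
        A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
        det_A = np.linalg.det(A)
        u = np.empty(len(b))
        for i in range(len(b)):
            A_i = A.copy()
            A_i[:, i] = b                  # replace the ith column by b
            u[i] = np.linalg.det(A_i) / det_A
        return u

    A = [[2, 2, 0], [1, 1, 2], [2, 1, 1]]
    b = [6, 9, 7]
    print(cramer_solve(A, b))              # [1. 2. 3.]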

The determinant of a positive definite matrix is always positive, and therefore the matrix is always nonsingular. The upper-left submatrices of an n × n matrix A are

\[
A_1 = \left[a_{1,1}\right], \qquad
A_2 = \begin{bmatrix} a_{1,1} & a_{1,2}\\ a_{2,1} & a_{2,2} \end{bmatrix}, \qquad \ldots, \qquad
A_n = A.
\]

If A is positive definite, then the determinants of all A_i are positive. Rules for working with determinants are given in Section 9.8.

12.7 Least Squares

When presented with large amounts of data, we often look for methods to create a simpler view or synopsis of the data. For example, Figure 12.3 is a graph of AIG’s monthly average stock price over twelve years. We see a lot of activity in the price, but there is a clear declining trend. A mathematical tool to capture this, which works when the trend is not as clear as it is here, is linear least squares approximation. The line illustrated in Figure 12.3 is the “best fit” line or best approximating line.

Figure 12.3

Least squares: fitting a straight line to stock price data for AIG from 2000 to 2013.

Linear least squares approximation is also useful when analyzing experimental data, which can be “noisy,” either from the data capture or observation method or from round-off from computations that generated the data. We might want to make summary statements about the data, estimate values where data is missing, or predict future values.

As a concrete (simple) example, suppose our experimental data are temperature (Celsius) over time (seconds):

\[
\begin{bmatrix} \text{time}\\ \text{temperature} \end{bmatrix}:\quad
\begin{bmatrix} 0\\ 30 \end{bmatrix},\
\begin{bmatrix} 10\\ 25 \end{bmatrix},\
\begin{bmatrix} 20\\ 40 \end{bmatrix},\
\begin{bmatrix} 30\\ 40 \end{bmatrix},\
\begin{bmatrix} 40\\ 30 \end{bmatrix},\
\begin{bmatrix} 50\\ 5 \end{bmatrix},\
\begin{bmatrix} 60\\ 25 \end{bmatrix},
\]

which are plotted in Figure 12.4. We want to establish a simple linear relationship between the variables,

\[
\text{temperature} = a \times \text{time} + b,
\]

Figure 12.4

Least squares: a linear approximation to experimental data of time and temperature pairs.

Writing down all relationships between knowns and unknowns, we obtain linear equations of the form

\[
\begin{bmatrix}
0 & 1\\ 10 & 1\\ 20 & 1\\ 30 & 1\\ 40 & 1\\ 50 & 1\\ 60 & 1
\end{bmatrix}
\begin{bmatrix} a\\ b \end{bmatrix} =
\begin{bmatrix} 30\\ 25\\ 40\\ 40\\ 30\\ 5\\ 25 \end{bmatrix}.
\qquad (12.18)
\]

We write the system as

\[
Au = b, \qquad\text{where}\quad u = \begin{bmatrix} a\\ b \end{bmatrix}.
\]

This system of seven equations in two unknowns is overdetermined and in general it will not have solutions; it is inconsistent. After all, it is not very likely that b lives in the subspace V formed by the columns of A. (As an analogy, consider the likelihood of a randomly selected 3D vector living in the [e1, e2]-plane.) But there is a recipe for finding an approximate solution.

Denoting by b′ a vector in V, the system

\[
Au = b'
\qquad (12.19)
\]

is solvable (consistent), but it is still overdetermined since we have seven equations in two unknowns. Recall from Section 2.8 that we can write b as the sum of its orthogonal projection into V and the component of b orthogonal to V,

\[
b = b' + b^{\perp}.
\qquad (12.20)
\]

Also recall that b′ is the vector in V that is closest to b. Sketch 12.3 illustrates this idea in 3D.

Sketch 12.3

Least squares.

Since b^{\perp} is orthogonal to V, we can use matrix notation to formalize this relationship,

\[
a_1^{\mathrm{T}} b^{\perp} = 0 \quad\text{and}\quad a_2^{\mathrm{T}} b^{\perp} = 0,
\]

which is equivalent to

\[
A^{\mathrm{T}} b^{\perp} = 0.
\]

Based on (12.20), we can substitute b − b′ for b^{\perp},

\[
\begin{aligned}
A^{\mathrm{T}}(b - b') &= 0\\
A^{\mathrm{T}}(b - Au) &= 0\\
A^{\mathrm{T}}b - A^{\mathrm{T}}Au &= 0.
\end{aligned}
\]

Rearranging this last equation, we have the normal equations

\[
A^{\mathrm{T}}A\,u = A^{\mathrm{T}}b.
\qquad (12.21)
\]

This is a linear system with a square matrix A^{T}A! Even more, that matrix is symmetric. The solution to the new system (12.21), when it has one, is the one that minimizes the error

\[
\left\| Au - b \right\|^2,
\]

and for this reason, it is called the least squares solution of the original system. Recall that b′ is the vector of V closest to b, and since we solved (12.19), we have in effect minimized ‖b′ − b‖.

It seems pretty amazing that by simply multiplying both sides by AT, we have a “best” solution to the original problem!

Example 12.13

Returning to the system in (12.18), we form the normal equations,

\[
\begin{bmatrix} 9100 & 210\\ 210 & 7 \end{bmatrix}
\begin{bmatrix} a\\ b \end{bmatrix} =
\begin{bmatrix} 5200\\ 195 \end{bmatrix}.
\]

(Notice that the matrix is symmetric.)

The least squares solution is the solution of this linear system,

\[
\begin{bmatrix} a\\ b \end{bmatrix} =
\begin{bmatrix} -0.23\\ 34.8 \end{bmatrix},
\]

which corresponds to the line x_2 = −0.23x_1 + 34.8. Figure 12.4 illustrates this line with negative slope and e_2-intercept of 34.8.
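
The whole fit in a few lines of NumPy (illustrative only; as the text notes, explicitly forming A^T A can be numerically delicate, and Section 13.1 gives a better route):

    import numpy as np

    time = np.array([0.0, 10, 20, 30, 40, 50, 60])
    temp = np.array([30.0, 25, 40, 40, 30, 5, 25])

    # Overdetermined system (12.18): columns are [time, 1].
    A = np.column_stack([time, np.ones_like(time)])

    # Normal equations (12.21): A^T A u = A^T b.
    u = np.linalg.solve(A.T @ A, A.T @ temp)
    print(u)                                  # approximately [-0.232, 34.82]

    # For comparison, NumPy's least squares routine avoids forming A^T A.
    print(np.linalg.lstsq(A, temp, rcond=None)[0])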

Imagine a scenario where our data capture method failed due to some environmental condition. We might want to remove data points if they seem outside the norm. These are called outliers. Point six in Figure 12.4 looks to be an outlier. The least squares approximating line provides a means for determining that this point is something of an exception.

Linear least squares approximation can also serve as a method for data compression.

Numerical problems can creep into the normal equations of the linear system (12.21). This is particularly so when the n × m matrix A has many more equations than unknowns, n ≫ m. In Section 13.1, we will examine the Householder method for finding the least squares solution to the linear system Au = b directly, without forming the normal equations. Example 12.13 will be revisited in Example 13.3. And yet another look at the least squares solution is possible with the singular value decomposition of A in Section 16.6.

12.8 Application: Fitting Data to a Femoral Head

In prosthetic surgery, a common task is that of hip bone replacement. This involves removing an existing femoral head and replacing it by a transplant, consisting of a new head and a shaft for attaching it into the existing femur. The transplant is typically made from titanium or ceramic; the part that is critical for perfect fit and thus function is the spherical head as shown in Figure 12.5. Data points are collected from the existing femoral head by means of MRI or PET scans, then a spherical fit is obtained, and finally the transplant is manufactured. The fitting process is explained next.

Figure 12.5

Femur transplant: left, a titanium femoral head with shaft. Right, an example of a sphere fit. Black points are “in front,” gray points are occluded.

We are given a set of 3D vectors v1,..., vL that are of approximately equal length, ρ1,...,ρL. We would like to find a sphere (centered at the origin) with radius r that closely fits the vi.

If all vi were on that sphere, we would have

\[
\begin{aligned}
r &= \rho_1, \qquad\qquad &(12.22)\\
&\;\;\vdots &(12.23)\\
r &= \rho_L. &(12.24)
\end{aligned}
\]

This is a very overdetermined linear system—L equations in only one unknown, r!

In matrix form:

\[
\begin{bmatrix} 1\\ \vdots\\ 1 \end{bmatrix}
\left[\,r\,\right] =
\begin{bmatrix} \rho_1\\ \vdots\\ \rho_L \end{bmatrix}.
\]

Be sure to verify that the matrix dimensions work out!

Multiplying both sides by [1 ... 1] gives

\[
Lr = \rho_1 + \ldots + \rho_L,
\]

with the final result

\[
r = \frac{\rho_1 + \ldots + \rho_L}{L}.
\]

Thus the least squares solution is simply the average of the given radii—just as our intuition would have suggested in the first place. Things are not that simple when it comes to more unknowns; see Section 16.6.
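
A tiny NumPy sketch of this fit; the radius values below are invented purely for illustration and are not measured data from the text.

    import numpy as np

    # Hypothetical measured lengths rho_1, ..., rho_L of the data vectors v_i (mm).
    rho = np.array([24.8, 25.1, 25.3, 24.9, 25.0, 25.2])

    # Least squares fit of r: simply the average of the given radii.
    r = rho.mean()
    print(r)   # 25.05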

  • n × n linear system
  • coefficient matrix
  • consistent system
  • subspace
  • solvable system
  • unsolvable system
  • Gauss elimination
  • upper triangular matrix
  • forward elimination
  • back substitution
  • elementary row operation
  • permutation matrix
  • row echelon form
  • pivoting
  • Gauss matrix
  • multiplier
  • augmented matrix
  • singular matrix
  • matrix rank
  • full rank
  • rank deficient
  • homogeneous linear system
  • inverse matrix
  • LU decomposition
  • factorization
  • forward substitution
  • lower triangular matrix
  • determinant
  • cofactor expansion
  • expansion by minors
  • Cramer’s rule
  • overdetermined system
  • least squares solution
  • normal equations

12.9 Exercises

  1. Does the linear system

    \[
    \begin{bmatrix} 1 & 2 & 0\\ 0 & 0 & 0\\ 1 & 2 & 1 \end{bmatrix} u =
    \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}
    \]

    have a unique solution? Is it consistent?

  2. Does the linear system

    \[
    \begin{bmatrix} 1 & 1 & 5\\ 1 & 1 & 1\\ 1 & 2 & 7 \end{bmatrix} u =
    \begin{bmatrix} 3\\ 3\\ 3 \end{bmatrix}
    \]

    have a unique solution? Is it consistent?

  3. Examine the linear system in Example 12.1. What restriction on the ti is required to guarantee a unique solution?
  4. Solve the linear system Av = b where

    \[
    A = \begin{bmatrix} 1 & 0 & 1 & 2\\ 0 & 0 & 1 & 2\\ 2 & 0 & 0 & 1\\ 1 & 1 & 1 & 0 \end{bmatrix},
    \quad\text{and}\quad
    b = \begin{bmatrix} 1\\ 2\\ 1\\ 3 \end{bmatrix}.
    \]

    Show all the steps from the Gauss elimination algorithm.

  5. Solve the linear system Av = b where

    \[
    A = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1 \end{bmatrix},
    \quad\text{and}\quad
    b = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}.
    \]

    Show all the steps from the Gauss elimination algorithm.

  6. Solve the linear system Av = b where

    \[
    A = \begin{bmatrix} 4 & 2 & 1\\ 2 & 2 & 0\\ 4 & 2 & 3 \end{bmatrix},
    \quad\text{and}\quad
    b = \begin{bmatrix} 7\\ 2\\ 9 \end{bmatrix}.
    \]

  7. Transform the following linear system to row echelon form.

    \[
    \begin{bmatrix} 3 & 2 & 0\\ 3 & 1 & 2\\ 0 & 2 & 0 \end{bmatrix} u =
    \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}.
    \]

  8. What is the rank of the following matrix?

    \[
    \begin{bmatrix} 3 & 2 & 0 & 1\\ 0 & 0 & 0 & 1\\ 0 & 1 & 2 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}
    \]

  9. What is the permutation matrix that will exchange rows 3 and 4 in a 5 × 5 matrix?
  10. What is the permutation matrix that will exchange rows 2 and 4 in a 4 × 4 matrix?
  11. What is the matrix G as defined in (12.4) for Example 12.2?
  12. What is the matrix G as defined in (12.4) for Exercise 6?
  13. Solve the linear system

    \[
    \begin{bmatrix} 4 & 1 & 2\\ 2 & 1 & 1\\ 2 & 1 & 1 \end{bmatrix} u =
    \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
    \]

    with Gauss elimination with pivoting.

  14. Solve the linear system

    \[
    \begin{bmatrix} 3 & 6 & 1\\ 6 & 12 & 2\\ 9 & 18 & 3 \end{bmatrix} u =
    \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
    \]

    with Gauss elimination with pivoting.

  15. Find the inverse of the matrix from Exercise 5.
  16. Find the inverse of

    \[
    \begin{bmatrix} 3 & 2 & 1\\ 0 & 2 & 1\\ 0 & 2 & 1 \end{bmatrix}.
    \]

  17. Find the inverse of

    \[
    \begin{bmatrix} \cos\theta & 0 & \sin\theta\\ 0 & 1 & 0\\ -\sin\theta & 0 & \cos\theta \end{bmatrix}.
    \]

  18. Find the inverse of

    \[
    \begin{bmatrix}
    5 & 0 & 0 & 0 & 0\\
    0 & 4 & 0 & 0 & 0\\
    0 & 0 & 3 & 0 & 0\\
    0 & 0 & 0 & 2 & 0\\
    0 & 0 & 0 & 0 & 1
    \end{bmatrix}.
    \]

  19. Find the inverse of

    [30211432].

  20. Calculate the LU decomposition of the matrix

    \[
    A = \begin{bmatrix} 3 & 0 & 1\\ 1 & 2 & 0\\ 1 & 1 & 1 \end{bmatrix}.
    \]

  21. Write a forward substitution algorithm for solving the lower triangular system (12.14).
  22. Use the LU decomposition of A from Exercise 20 to solve the linear system Au = b, where

    \[
    b = \begin{bmatrix} 4\\ 0\\ 4 \end{bmatrix}.
    \]

  23. Calculate the determinant of

    \[
    A = \begin{bmatrix} 3 & 0 & 1\\ 1 & 2 & 0\\ 1 & 1 & 1 \end{bmatrix}.
    \]

  24. What is the rank of the matrix in Exercise 23?
  25. Calculate the determinant of the matrix

    \[
    A = \begin{bmatrix} 2 & 4 & 3 & 6\\ 1 & 0 & 0 & 0\\ 2 & 1 & 0 & 1\\ 1 & 1 & 2 & 0 \end{bmatrix}
    \]

    using expansion by minors. Show all steps.

  26. Apply Cramer’s rule to solve the following linear system:

    \[
    \begin{bmatrix} 3 & 0 & 1\\ 1 & 2 & 0\\ 1 & 1 & 1 \end{bmatrix} u =
    \begin{bmatrix} 8\\ 6\\ 6 \end{bmatrix}.
    \]

    Hint: Reuse your work from Exercise 23.

  27. Apply Cramer’s rule to solve the following linear system:

    \[
    \begin{bmatrix} 3 & 0 & 1\\ 0 & 2 & 0\\ 0 & 2 & 1 \end{bmatrix} u =
    \begin{bmatrix} 6\\ 4\\ 7 \end{bmatrix}.
    \]

  28. Set up and solve the linear system for solving the intersection of the three planes,

    \[
    x_1 + x_3 = 1, \qquad x_3 = 1, \qquad x_2 = 2.
    \]

  29. Find the intersection of the plane

    \[
    x(u_1, u_2) = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}
    + u_1 \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}
    - u_2 \begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix}
    \]

    and the line

    \[
    p(t) = \begin{bmatrix} 1\\ 1\\ 2 \end{bmatrix}
    + t \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}
    \]

    by setting up the problem as a linear system and solving it.

  30. Let five points be given by

    p1=[22],p2=[11],p3=[00],p4=[11],p5=[22].

    Find the linear least squares approximation.

  31. Let five points be given by

    p1=[00],p2=[11],p3=[22],p4=[11],p5=[22].

    Find the linear least squares approximation.

  32. Let two points be given by

    \[
    p_1 = \begin{bmatrix} 4\\ 1 \end{bmatrix}, \qquad
    p_2 = \begin{bmatrix} 4\\ 3 \end{bmatrix}.
    \]

    Find the linear least squares approximation.

¹Read this loosely as: an estimated number of n³ operations.
