Just about anybody can solve two equations in two unknowns by somehow manipulating the equations. In this chapter, we will develop a systematic way for finding the solution, simply by checking the underlying geometry. This approach will later enable us to solve much larger systems of equations in Chapters 12 and 13. Figure 5.1 illustrates many instances of intersecting two lines: a problem that can be formulated as a 2 × 2 linear system.
In our standard [e1, e2]-coordinate system, suppose we are given two vectors a1 and a2. In Section 4.1, we showed how these vectors define a skew target box with its lower-left corner at the origin. As illustrated in Sketch 5.1, suppose we also are given a vector b with respect to the [e1, e2]-system. Now the question arises, what are the components of b with respect to the [a1, a2]-system? In other words, we want to find a vector u with components u1 and u2, satisfying
Before we proceed further, let’s look at an example. Following Sketch 5.1, let
Upon examining the sketch, we see that
In the [a1, a2]-system, b has components (1, 1/2). In the [e1, e2]-system, it has components (4, 4).
What we have here are really two equations in the two unknowns u1 and u2 , which we see by expanding the vector equations into
And as we saw in Example 5.1, these two equations in two unknowns have the solution u1 = 1 and u2 = 1/2, as is seen by inserting these values for u1 and u2 into the equations.
Being able to solve two simultaneous sets of equations allows us to switch back and forth between different coordinate systems. The rest of this chapter is dedicated to a detailed discussion of how to solve these equations.
The two equations in (5.2) are also called a linear system. It can be written more compactly if we use matrix notation:
In general, a 2 × 2 linear system looks like this:
Equation (5.4) is shorthand notation for the equations
We sometimes write it even shorter, using a matrix A:
where
Both u and b represent vectors, not points! (See Sketch 5.1 for an illustration.) The vector u is called the solution of the linear system.
While the savings of this notation are not completely obvious in the 2 × 2 case, the shorthand saves a lot of work in more complicated cases with more equations and unknowns.
The columns of the matrix A correspond to the vectors a1 and a2. We could then rewrite our linear system as (5.1). Geometrically, we are trying to express the given vector b as a linear combination of the given vectors a1 and a2; we need to determine the factors u1 and u2. If we are able to find at least one solution, then the linear system is called consistent, otherwise it is called inconsistent. Three possibilities for our solution space exist.
Sketch 5.2 offers a direct solution to our linear system. By inspecting the areas of the parallelograms in the sketch, we see that
An easy way to see how these ratios of areas correspond to u1 and u2 is to shear the parallelograms formed by b, a2 and b, a1 onto the a1 and a2 axes, respectively. (Shears preserve areas.) The area of a parallelogram is given by the determinant of the two vectors spanning it. Recall from Section 4.9 that this is a signed area. This method of solving for the solution of a linear system is called Cramer’s rule.
Applying Cramer’s rule to the linear system in (5.3), we get
Examining the determinant in the numerator, notice that b replaces a1 in the solution for u1 and then b replaces a2 in the solution for u2.
Notice that if the area spanned by a1 and a2 is zero, that is, the vectors are multiples of each other, then Cramer’s rule will not result in a solution. (See Section 5.6, Section 5.7 and Section 5.8 for more information on this situation.)
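To make the computation concrete, here is a small Python sketch of Cramer's rule for 2 × 2 systems. The function names and the sample coefficients below are our own illustrations, not taken from the text:

```python
def det2(a, b, c, d):
    """Signed area of the parallelogram spanned by columns (a, c) and (b, d),
    i.e., the determinant of the matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cramer_2x2(a11, a12, a21, a22, b1, b2):
    """Solve the system  a11*u1 + a12*u2 = b1,  a21*u1 + a22*u2 = b2
    by Cramer's rule.  Raises ValueError when the columns are linearly
    dependent (zero area), the case in which Cramer's rule fails."""
    d = det2(a11, a12, a21, a22)
    if d == 0:
        raise ValueError("columns are linearly dependent; no unique solution")
    # b replaces column a1 in the numerator for u1, and a2 for u2.
    u1 = det2(b1, a12, b2, a22) / d
    u2 = det2(a11, b1, a21, b2) / d
    return u1, u2
```

For example, `cramer_2x2(2, 4, 1, 6, 10, 13)` returns `(1.0, 2.0)`; the `ValueError` branch corresponds exactly to the zero-area case just described.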
Cramer’s rule is primarily of theoretical importance. For larger systems, Cramer’s rule is both expensive and numerically unstable. Hence, we now study a more effective method.
Let’s consider a special 2 × 2 linear system:
This situation is shown in Sketch 5.3. This matrix is called upper triangular because all elements below the diagonal are zero; the potentially nonzero elements form a triangle on and above the diagonal.
We can solve this system without much work. Examining Equation (5.8), we see it is possible to solve for
With u2 in hand, we can solve the first equation from (5.5) for
This technique of solving the equations from the bottom up is called back substitution.
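A minimal Python sketch of back substitution for the 2 × 2 upper triangular case (the function name and sample numbers are illustrative, not from the text):

```python
def back_substitute(a11, a12, a22, b1, b2):
    """Solve the upper triangular system
           a11*u1 + a12*u2 = b1
                    a22*u2 = b2
    from the bottom up.  A zero diagonal element (pivot) makes the
    divisions fail, mirroring the failure discussed in the text."""
    u2 = b2 / a22                 # second equation involves only u2
    u1 = (b1 - a12 * u2) / a11    # substitute u2 into the first equation
    return u1, u2
```

For instance, `back_substitute(2, 1, 4, 5, 8)` gives `(1.5, 2.0)`.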
Notice that the process of back substitution requires divisions. Therefore, if either of the diagonal elements, a1,1 or a2,2, equals zero, then the algorithm will fail. This type of failure indicates that the columns of A are not linearly independent. (See Section 5.6, Section 5.7 and Section 5.8 for more information on this situation.) Because of the central role that the diagonal elements play in Gauss elimination, they are called pivots.
In general, we will not be so lucky to encounter an upper triangular system as in (5.8). But any linear system in which A is nonsingular may be transformed to this simple form, as we shall see by reexamining the system in (5.3). We write it as
This situation is shown in Sketch 5.4. Clearly, a1 is not on the e1-axis as we would like, but we can apply a stepwise procedure so that it will become just that. This systematic, stepwise procedure is called forward elimination. The process of forward elimination followed by back substitution is called Gauss elimination.1
Recall one key fact from Chapter 4: linear maps do not change linear combinations. That means if we apply the same linear map to all vectors in our system, then the factors u1 and u2 won’t change. If the map is given by a matrix S, then
In order to get a1 to line up with the e1-axis, we will employ a shear parallel to the e2-axis, such that
That shear (see Section 4.7) is given by the matrix
We apply S1 to all vectors involved in our system:
The effect of this map is shown in Sketch 5.5.
Our transformed system now reads
Now we can employ back substitution to find
For 2 × 2 linear systems there is only one matrix entry to zero in the forward elimination procedure. We will restate the procedure in a more algorithmic way in Chapter 12 when there is more work to do.
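Still, the 2 × 2 version is short enough to sketch now. The following Python function (name and test values are our own) performs the single elimination step, the shear that zeroes a2,1, followed by back substitution; it assumes a nonzero pivot a1,1, since pivoting is only introduced in the next section:

```python
def gauss_2x2(a11, a12, a21, a22, b1, b2):
    """Gauss elimination for a 2x2 system: forward elimination followed
    by back substitution.  No pivoting; assumes a11 != 0."""
    # Forward elimination: subtract (a21/a11) times equation 1 from
    # equation 2, which zeroes the entry below the first pivot.
    ratio = a21 / a11
    a22 -= ratio * a12
    b2 -= ratio * b1
    # Back substitution on the resulting upper triangular system.
    u2 = b2 / a22
    u1 = (b1 - a12 * u2) / a11
    return u1, u2
```

For example, `gauss_2x2(2, 4, 1, 6, 10, 13)` returns `(1.0, 2.0)`.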
We will look at one more example of forward elimination and back substitution. Let a linear system be given by
The shear that takes a1 to the e1-axis is given by
and it transforms the system to
Draw your own sketch to understand the geometry.
Using back substitution, the solution is now easily found as u1 = 8/10 and u2 = 2/10.
Consider the system
illustrated in Sketch 5.6.
Our standard approach, shearing a1 onto the e1-axis, will not work here; there is no shear that takes
onto the e1-axis. However, there is no problem if we simply exchange the two equations! Then we have
and thus u1 = u2 = 1. So we cannot blindly apply a shear to a1; we must first check that one exists. If it does not—i.e., if a1,1 = 0—exchange the equations.
As a rule of thumb, if a method fails because some number equals zero, then it will work poorly if that number is small. It is thus advisable to exchange the two equations anytime we have |a1,1| < |a2,1|. The absolute value is used here since we are interested in the magnitude of the involved numbers, not their sign. The process of exchanging equations (rows) so that the pivot is the largest in absolute value is called row pivoting or partial pivoting, and it is used to improve numerical stability. In Section 5.8, a special type of linear system is introduced that sometimes needs another type of pivoting. However, since row pivoting is the most common, we’ll refer to it simply as “pivoting.”
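Here is how the pivoting test fits into the 2 × 2 elimination sketch, in Python; the function name and the sample values (chosen so that a1,1 = 0, as in the example above) are our own:

```python
def gauss_pivot_2x2(a11, a12, a21, a22, b1, b2):
    """Gauss elimination with row pivoting for a 2x2 system: exchange the
    two equations whenever |a11| < |a21|, so the larger pivot is used."""
    if abs(a11) < abs(a21):
        # Row exchange: swap the two equations, coefficients and right-hand side.
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    # Forward elimination followed by back substitution, as before.
    ratio = a21 / a11
    a22 -= ratio * a12
    b2 -= ratio * b1
    u2 = b2 / a22
    u1 = (b1 - a12 * u2) / a11
    return u1, u2
```

With a zero in the a1,1 position, `gauss_pivot_2x2(0, 1, 1, 1, 1, 2)` succeeds and returns `(1.0, 1.0)`, where the unpivoted version would divide by zero.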
Let’s study an example taken from [11]:
If we shear a1 onto the e1-axis, thus applying one forward elimination step, the new system reads
Performing back substitution, we find the solution is
which we will call the “true” solution. Note the magnitude of changes in a2 and b relative to a1. This is the type of behavior that causes numerical problems. It can often be dealt with by using a larger number of digits.
Suppose we have a machine that stores only three digits, although it calculates with six digits. Due to round-off, the system above would be stored as
which would result in a “round-off” solution of
which is not very close to the true solution ut, as
Pivoting is a tool to dampen the effects of round-off. Now employ pivoting by exchanging the rows, yielding the system
Shear a1 onto the e1-axis, and the new system reads
which results in the “pivoting solution”
Notice that the vectors of the linear systems are all within the same range. Even with the three-digit machine, this system will allow us to compute a result that is closer to the true solution because the effects of round-off have been minimized. Now the error is
Numerical strategies are the primary topic of numerical analysis, but they cannot be ignored in the study of linear algebra. Because this is an important real-world topic, we will revisit it. In Section 12.2 we will present Gauss elimination with pivoting integrated into the algorithm. In Section 13.4 we will introduce the condition number of a matrix, which is a measure for closeness to being singular. Chapters 13 and 16 will introduce other methods for solving linear systems that are better to use when numerical issues are a concern.
Consider the situation shown in Sketch 5.7. The two vectors a1 and a2 are multiples of each other. In other words, they are linearly dependent.
The corresponding linear system is
It is obvious from the sketch that we have a problem here, but let’s just blindly apply forward elimination; apply a shear such that a1 is mapped to the e1-axis. The resulting system is
But the last equation reads 0 = −1, and now we really are in trouble! This means that our system is inconsistent, and therefore does not have a solution.
It is possible, however, to find an approximate solution. This is done in the context of least squares methods; see Section 12.7.
Consider the system
shown in Sketch 5.8.
We shear a1 onto the e1-axis, and obtain
Now the last equation reads 0 = 0—true, but a bit trivial! In reality, our system is just one equation written down twice in slightly different forms. This is also clear from the sketch: b may be written as a multiple of either a1 or a2, thus the system is underdetermined. This type of system is consistent because at least one solution exists. We can find a solution by setting u2 = 1, and then back substitution results in u1 = 1.
A system of the form
i.e., one where the right-hand side consists of the zero vector, is called homogeneous. One obvious solution is the zero vector itself; this is called the trivial solution and is usually of little interest. If it has a solution u that is not the zero vector, then clearly all multiples cu are also solutions: we multiply both sides of the equations by a common factor c. In other words, the system has an infinite number of solutions.
Not all homogeneous systems do have a nontrivial solution, however. Equation (5.9) may be read as follows: What vector u, when mapped by A, has the zero vector as its image? The only 2 × 2 maps capable of achieving this have rank 1. They are characterized by the fact that their two columns a1 and a2 are parallel, or linearly dependent. If the system has only the trivial solution, then A is invertible.
An example, illustrated in Sketch 5.9, should help. Let our homogeneous system be
Clearly, a2 = 2a1; the matrix A maps all vectors onto the line defined by a1 and the origin. In this example, any vector u that is perpendicular to a1 will be projected to the zero vector:
After one step of forward elimination, we have
Any u2 solves the last equation. So let’s pick u2 = 1. Back substitution then gives u1 = −2, therefore
is a solution to the system; so is any multiple of it. Also check that a1 · u = 0, so they are in fact perpendicular.
All vectors u that satisfy a homogeneous system make up the kernel or null space of the matrix.
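As an illustrative Python sketch, the following function returns one nontrivial kernel vector of a singular 2 × 2 matrix by taking a vector perpendicular to a nonzero row. The function name is our own, and the test matrix is simply one with a2 = 2a1, consistent with the example above:

```python
def kernel_vector(a11, a12, a21, a22):
    """Return one nontrivial vector in the kernel of a 2x2 matrix whose
    determinant is zero, or None when only the trivial solution exists."""
    if a11 * a22 - a12 * a21 != 0:
        return None                # invertible: kernel is the zero vector only
    if (a11, a12) != (0, 0):
        return (-a12, a11)         # perpendicular to row 1 (and to row 2, since det = 0)
    if (a21, a22) != (0, 0):
        return (-a22, a21)         # row 1 is zero; use row 2 instead
    return (1, 0)                  # zero matrix: every vector is in the kernel
```

For the matrix with columns a1 = (1, 2) and a2 = (2, 4), `kernel_vector(1, 2, 2, 4)` returns `(-2, 1)`; any multiple of this vector is also in the kernel.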
We now consider an example of a homogeneous system that has only the trivial solution:
The two columns of A are linearly independent; therefore, A does not reduce dimensionality. Then it cannot map any nonzero vector u to the zero vector!
This is clear after one step of forward elimination,
and back substitution results in
In general, we may state that a homogeneous system has nontrivial solutions (and therefore, infinitely many solutions) only if the columns of the matrix are linearly dependent.
In some situations, row pivoting will not prepare the linear system for back substitution, necessitating column pivoting. When columns are exchanged, the corresponding unknowns must be exchanged as well.
This next linear system might seem like a silly one to pose; however, systems of this type do arise in Section 7.3 in the context of finding eigenvectors:
In order to apply back substitution to this system, column pivoting is necessary, thus the system becomes
Now we set u1 = 1 and proceed with back substitution to find that u2 = 0. All vectors of the form
are solutions.
In this section, we will see how to undo a linear map. Reconsider the linear system
The matrix A maps u to b. Now that we know u, what is the matrix B that maps b back to u,
Defining B—the inverse map—is the purpose of this section.
In solving the original linear system, we applied shears to the column vectors of A and to b. After the first shear, we had
This demonstrated how shears can be used to zero elements of the matrix. Let’s return to the example linear system in (5.3). After applying S1 the system became
Let’s use another shear to zero the upper right element. Geometrically, this corresponds to constructing a shear that will map the new a2 to the e2-axis. It is given by the matrix
Applying it to all vectors gives the new system
After the second shear, our linear system has been changed to
Next, apply a nonuniform scaling S3 in the e1 and e2 directions that will map the latest a1 and a2 onto the vectors e1 and e2. For our current example,
The new system becomes
which corresponds to
This is a very special system. First of all, to solve for u is now trivial because A has been transformed into the unit matrix or identity matrix I,
This process of transforming A until it becomes the identity is theoretically equivalent to the back substitution process of Section 5.4. However, back substitution uses fewer operations and thus is the method of choice for solving linear systems.
Yet we have now found the matrix B in (5.10)! The two shears and scaling transformed A into the identity matrix I:
thus, the solution of the system is
This leads to the inverse matrix A−1 of a matrix A:
The matrix A−1 undoes the effect of the matrix A: the vector u was mapped to b by A, and b is mapped back to u by A−1. Thus, we can now write (5.13) as
If this transformation result can be achieved, then A is called invertible. At the end of this section and in Sections 5.6 and 5.7, we discuss cases in which A−1 does not exist.
If we combine (5.12) and (5.14), we immediately get
This makes intuitive sense, since the actions of a map and its inverse should cancel out, i.e., not change anything—that is what I does! Figures 5.2 and 5.3 illustrate this. Then by the definition of the inverse,
If A−1 exists, then it is unique.
The inverse of the identity is the identity
The inverse of a scaling is given by:
Multiply this out to convince yourself.
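Doing that multiplication in code: a short Python check, with arbitrarily chosen scale factors, that a scaling and the scaling by the reciprocal factors multiply to the identity:

```python
def mat_mult(A, B):
    """Product of two 2x2 matrices given as nested lists [[.., ..], [.., ..]]."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

s1, s2 = 4.0, 0.5                         # arbitrary nonzero scale factors
S     = [[s1, 0.0], [0.0, s2]]            # scaling matrix
S_inv = [[1.0/s1, 0.0], [0.0, 1.0/s2]]    # inverse scaling: reciprocal factors
assert mat_mult(S, S_inv) == [[1.0, 0.0], [0.0, 1.0]]
```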
Figure 5.2 shows the effects of a matrix and its inverse for the scaling
Figure 5.3 shows the effects of a matrix and its inverse for the shear
We consider the inverse of a rotation as follows: if Rα rotates by α degrees counterclockwise, then R−α rotates by α degrees clockwise, or
as we can see from the definition of a rotation matrix (4.16).
The rotation matrix is an example of an orthogonal matrix. An orthogonal matrix A is characterized by the fact that
The column vectors a1 and a2 of an orthogonal matrix satisfy ||a1|| = 1, ||a2|| = 1 and a1 · a2 = 0. In words, the column vectors are orthonormal. The row vectors are orthonormal as well. Those transformations that are described by orthogonal matrices are called rigid body motions. The determinant of an orthogonal matrix is ±1.
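These properties are easy to verify numerically. The following Python snippet checks, for an arbitrarily chosen angle, that the transpose of a rotation matrix acts as its inverse and that the determinant is +1:

```python
from math import cos, sin, radians, isclose

def rotation(alpha_deg):
    """Counterclockwise rotation by alpha degrees (cf. the rotation
    matrix definition in (4.16))."""
    a = radians(alpha_deg)
    return [[cos(a), -sin(a)],
            [sin(a),  cos(a)]]

R  = rotation(30.0)                              # arbitrary test angle
Rt = [[R[0][0], R[1][0]], [R[0][1], R[1][1]]]    # transpose of R
# For an orthogonal matrix, R^T plays the role of R^{-1}: R^T R = I.
P = [[sum(Rt[i][k] * R[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
assert all(isclose(P[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
           for i in range(2) for j in range(2))
# The determinant of a rotation is +1.
assert isclose(R[0][0] * R[1][1] - R[0][1] * R[1][0], 1.0)
```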
We add without proof two fairly obvious identities that involve the inverse:
which should be obvious from Figures 5.2 and 5.3, and
Figure 5.4 illustrates this for
Given a matrix A, how do we compute its inverse? Let us start with
If we denote the two (unknown) columns of A−1 by ā1 and ā2, and those of I by e1 and e2, then (5.18) may be written as
This is really short for two linear systems
Both systems have the same matrix A and can thus be solved simultaneously. All we have to do is to apply the familiar shears and scale—those that transform A to I—to both e1 and e2.
Let’s revisit Example 5.3 with
Our two simultaneous systems are:
The first shear takes this to
The second shear yields
Finally the scaling produces
Thus the inverse matrix
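For 2 × 2 matrices the two simultaneous systems collapse into the well-known closed form for the inverse, which the shears-and-scale procedure produces column by column. A Python sketch, with an illustrative (not the book's) test matrix:

```python
def inverse_2x2(A):
    """Inverse of a 2x2 matrix given as nested lists [[a11, a12], [a21, a22]],
    via the closed form  A^{-1} = (1/det) [[a22, -a12], [-a21, a11]].
    Returns None when the matrix is singular (zero determinant)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None
    return [[ a22 / det, -a12 / det],
            [-a21 / det,  a11 / det]]
```

For example, `inverse_2x2([[2, 4], [1, 6]])` yields `[[0.75, -0.5], [-0.125, 0.25]]`, and multiplying this by the original matrix gives I; a matrix with linearly dependent columns returns `None`.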
It can be the case that a matrix A does not have an inverse. For example, the matrix
is not invertible because the columns are linearly dependent (and therefore the determinant is zero). A noninvertible matrix is also referred to as singular. If we try to compute the inverse by setting up two simultaneous systems,
then the first shear produces
At this point it is clear that we will not be able to construct linear maps to achieve the identity matrix on the left side of the equation. Examples of singular matrices were introduced in Section 5.6, Section 5.7 and Section 5.8.
Matrices map vectors to vectors. If we know the result of such a map, namely that two vectors v1 and v2 were mapped to v1′ and v2′, can we find the matrix that did it?
Suppose some matrix A was responsible for the map. We would then have the two equations
Combining them, we can write
or, even shorter,
To define A, we simply find V −1, then
Of course v1 and v2 must be linearly independent for V−1 to exist. If the vi and the vi′ are each linearly independent, then A represents a change of basis.
Let’s find the linear map A that maps the basis V formed by vectors
to the basis V′ formed by vectors
as illustrated in Sketch 5.10.
First, we find V−1 following the steps in Example 5.8, resulting in
The change of basis linear map is
Check that each vi is mapped to vi′. If we have any vector v in the V basis, this map will return the coordinates of its corresponding vector v′ in V′. Sketch 5.10 illustrates that v = [0 1]T is mapped to v′ = [0 −1]T.
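The construction A = V′V−1 is easy to sketch in Python for 2 × 2 bases. The function name and the sample basis vectors below are our own, not those of the example:

```python
def change_of_basis(v1, v2, w1, w2):
    """Matrix A with A v1 = w1 and A v2 = w2, built as A = [w1 w2] V^{-1},
    where V has columns v1 and v2 (which must be linearly independent)."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    vi = [[ v2[1] / det, -v2[0] / det],   # V^{-1} by the 2x2 closed form
          [-v1[1] / det,  v1[0] / det]]
    w = [[w1[0], w2[0]], [w1[1], w2[1]]]  # matrix with columns w1, w2
    # Return the product w * vi, written out entry by entry.
    return [[w[0][0] * vi[0][0] + w[0][1] * vi[1][0],
             w[0][0] * vi[0][1] + w[0][1] * vi[1][1]],
            [w[1][0] * vi[0][0] + w[1][1] * vi[1][0],
             w[1][0] * vi[0][1] + w[1][1] * vi[1][1]]]
```

For instance, mapping the basis (2, 0), (1, 1) to (0, 2), (1, 1) gives the matrix [[0, 1], [1, 0]], and one can check directly that it sends each vi to the corresponding wi.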
In Section 6.5 we’ll revisit this topic with an application.
Let’s take a moment to recognize a dual view of linear systems. A coordinate system or linear combination approach (5.1) represents what we might call the “column view.” If instead we focus on the row equations (5.5–5.6) we take a “row view.” A great example of this can be found by revisiting two line intersection scenarios in Examples 3.9 and 3.10. In the former, we are intersecting two lines in parametric form, and the problem statement takes the column view by asking what linear combination of the column vectors results in the right-hand side. In the latter, we are intersecting two lines in implicit form, and the problem statement takes the row view by asking what u1 and u2 satisfy both line equations. Depending on the problem at hand, we can choose the view that best suits our given information.
We took a column view in our approach to presenting 2 × 2 linear systems, but equally valid would be a row view. Let’s look at the key examples from this chapter as if they were posed as implicit line intersection problems. Figure 5.5 illustrates linear systems from Example 5.1 (unique solution), the example in Section 5.6 (inconsistent linear system), and the example in Section 5.7 (underdetermined system). Importantly, the column and row views of the systems result in the same classification of the solution sets.
Figure 5.6 illustrates two types of homogeneous systems from examples in Section 5.8. Since the right-hand side of each line equation is zero, the lines will pass through the origin. This guarantees the trivial solution for both intersection problems. The system with nontrivial solutions is depicted on the right as two identical lines.
Using the matrix form, write down the linear system to express
in terms of the local coordinate system defined by the origin,
Is the following linear system consistent? Why?
Use Cramer’s rule to solve the system
Use Gauss elimination to solve the system
Use Gauss elimination with pivoting to solve the system
Does the following homogeneous system have a nontrivial solution?
What is the kernel of the matrix
What is the null space of the matrix
What is the inverse of the matrix
What is the inverse of the matrix
What is the inverse of
Define the matrix A that maps
Define the matrix A that maps
1 Gauss elimination and forward elimination are often used interchangeably.