Linear Equations and Curve Fitting

Linear algebra has important applications to the common scientific problem of representing empirical data by means of equations or functions of specified types. We give here only a brief introduction to this extensive subject.
Typically, we begin with a collection of given data points $(x_0, y_0), (x_1, y_1), \dots, (x_n, y_n)$ that are to be represented by a specific type of function $y = f(x)$. For instance, y might be the volume of a sample of gas when its temperature is x. Thus the given data points are the results of experiment or measurement, and we want to determine the curve $y = f(x)$ in the xy-plane so that it passes through each of these points; see Figure 3.7.1. We then speak of “fitting” the curve to the data points.
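We will confine our attention largely to polynomial curves. A polynomial of degree n is a function of the form

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \tag{1}$$

where the coefficients $a_0, a_1, \dots, a_n$ are constants. The data point $(x_i, y_i)$ lies on the curve $y = f(x)$ provided that $f(x_i) = y_i$. The condition that this be so for each $i = 0, 1, \dots, n$ yields the $n + 1$ equations

$$\begin{aligned}
a_0 + a_1 x_0 + a_2 x_0^2 + \cdots + a_n x_0^n &= y_0 \\
a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_n x_1^n &= y_1 \\
&\;\;\vdots \\
a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_n x_n^n &= y_n.
\end{aligned} \tag{2}$$

Because the numbers $x_i$ and $y_i$ are given, this is a system of $n + 1$ linear equations in the $n + 1$ unknowns $a_0, a_1, \dots, a_n$ (the coefficients that determine the polynomial in (1)).

The $(n + 1) \times (n + 1)$ coefficient matrix of the system in (2) is the Vandermonde matrix

$$A = \begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{bmatrix}, \tag{3}$$

whose determinant is discussed in Problems 61–63 of Section 3.6. It follows from Eq. (25) there that, if the x-coordinates $x_0, x_1, \dots, x_n$ are distinct, then the matrix A is nonsingular. Hence Theorem 7 in Section 3.5 implies that the system in (2) has a unique solution for the coefficients $a_0, a_1, \dots, a_n$ in (1). Thus there is a unique polynomial of the form (1), of degree at most n, that fits the $n + 1$ given data points. We call it an interpolating polynomial, and say that it interpolates the given points.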
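The construction above can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `interpolating_coefficients` and the sample points are hypothetical and not taken from the text. It builds the Vandermonde matrix of the system in (2) and solves for the coefficients.

```python
import numpy as np

def interpolating_coefficients(xs, ys):
    """Return a_0, ..., a_n of the polynomial through the points (xs[i], ys[i])."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if len(set(xs.tolist())) != len(xs):
        # Repeated x-coordinates make the Vandermonde matrix singular.
        raise ValueError("x-coordinates must be distinct")
    # Row i is [1, x_i, x_i**2, ..., x_i**n], the Vandermonde matrix.
    V = np.vander(xs, increasing=True)
    return np.linalg.solve(V, ys)

# Illustrative data (not from the text): three points on y = 1 + x**2.
coeffs = interpolating_coefficients([0.0, 1.0, 2.0], [1.0, 2.0, 5.0])
print(coeffs)
```

For large n or widely spread x-values the Vandermonde matrix becomes ill-conditioned, so this direct solve is best regarded as a sketch for small problems like those in this section.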
Example 1. Find a cubic polynomial of the form

$$y = A + Bx + Cx^2 + Dx^3$$

that interpolates four given data points, the last of which is (3, 16).
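In a particular problem, it generally is simpler to use distinct capital letters rather than subscripted symbols to denote the coefficients. Here we want to find the values of A, B, C, and D so that the cubic takes the prescribed value at each of the four data points; for instance, the point (3, 16) requires that $y(3) = 16$, that is, $A + 3B + 9C + 27D = 16$. These conditions yield four linear equations in A, B, C, and D.
We readily reduce this system to echelon form, and back substitution then yields the values of A, B, C, and D that determine the desired cubic polynomial.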
The graph of this cubic is shown in Fig. 3.7.2, along with the four original data points.
As a concrete example of interpolation, we consider the growth of the world’s human population. The table in Fig. 3.7.3 shows the total world population (in billions) at 5-year intervals. Actual populations are shown for the years 1975–2010. The figures listed for the years 2015–2040 are the world populations that were predicted by the United Nations on the basis of detailed demographic analysis of population trends during the 20th century on a country-by-country basis throughout the world. Each entry of the final column of this table gives the average annual percentage growth rate during the preceding 5-year period. For instance, $(4.440/4.062)^{1/5} \approx 1.0180$ for the growth during the 5-year period 1975–1980, so the average annual growth during this period is about 1.8%.
Year | World Population (billions) | Percent Growth |
---|---|---|
1975 | 4.062 | |
1980 | 4.440 | 1.80% |
1985 | 4.853 | 1.79% |
1990 | 5.310 | 1.82% |
1995 | 5.735 | 1.55% |
2000 | 6.127 | 1.33% |
2005 | 6.520 | 1.25% |
2010 | 6.930 | 1.23% |
2015 | 7.349 | 1.18% |
2020 | 7.758 | 1.09% |
2025 | 8.142 | 0.97% |
2030 | 8.501 | 0.87% |
2035 | 8.839 | 0.78% |
2040 | 9.157 | 0.71% |
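The growth-rate column can be recomputed from the population column. The sketch below assumes the rate is the compound (geometric) average over each 5-year period, an assumption that reproduces the tabulated percentages; the names `populations` and `average_annual_growth` are illustrative.

```python
# World population in billions, copied from the table above.
populations = {
    1975: 4.062, 1980: 4.440, 1985: 4.853, 1990: 5.310, 1995: 5.735,
    2000: 6.127, 2005: 6.520, 2010: 6.930, 2015: 7.349, 2020: 7.758,
    2025: 8.142, 2030: 8.501, 2035: 8.839, 2040: 9.157,
}

def average_annual_growth(p_start, p_end, years=5):
    """Compound average annual growth rate over a period of the given length."""
    return (p_end / p_start) ** (1.0 / years) - 1.0

years = sorted(populations)
# Growth (in percent) during the 5-year period ending in each listed year.
growth_percent = {
    later: 100.0 * average_annual_growth(populations[earlier], populations[later])
    for earlier, later in zip(years, years[1:])
}

for year, rate in growth_percent.items():
    print(f"{year}: {rate:.2f}%")
```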
We see that the world population grew at an annual rate of about 1.8% during the 1980s, but the rate of growth has slowed since then, and it is expected to slow even more during the coming decades of the 21st century. In particular, the growth of the world population at the present time in history is not natural or exponential in character—that characterization would imply a constant percentage rate of growth. We explore the possibility of interpolating world population data with polynomial models that might be usable to predict future populations. It seems natural to expect better results with higher-degree interpolating polynomials. Let’s see whether this is so.
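Example 2. First, we fit a linear polynomial $P_1(t) = a + bt$ (with $t = 0$ in 1900) to the 1995 and 2005 world population values. We need only solve the equations

$$\begin{aligned}
a + 95b &= 5.735 \\
a + 105b &= 6.520
\end{aligned}$$

for $a = -1.7225$ and $b = 0.0785$. Thus our linear interpolating polynomial is

$$P_1(t) = -1.7225 + 0.0785t.$$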
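Example 3. Now let’s fit a quadratic polynomial $P_2(t) = a + bt + ct^2$ (with $t = 0$ in 1900) to the 1995, 2000, and 2005 world population values. With the three data points (95, 5.735), (100, 6.127), and (105, 6.520), the system in (2) yields the equations

$$\begin{aligned}
a + 95b + 95^2 c &= 5.735 \\
a + 100b + 100^2 c &= 6.127 \\
a + 105b + 105^2 c &= 6.520
\end{aligned}$$

having the calculator solution $a = -1.523$, $b = 0.0745$, $c = 0.00002$ (Fig. 3.7.4). Thus our quadratic interpolating polynomial is

$$P_2(t) = -1.523 + 0.0745t + 0.00002t^2.$$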
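In place of a calculator, the 3 × 3 system for the quadratic fit can be solved with a few lines of NumPy. This is a minimal sketch that uses the 1995, 2000, and 2005 table values with t measured in years after 1900; the helper name `quadratic_model` is hypothetical.

```python
import numpy as np

# Data points (t, population in billions), with t = 0 in 1900.
t = np.array([95.0, 100.0, 105.0])
p = np.array([5.735, 6.127, 6.520])

# Coefficient matrix with rows [1, t_i, t_i**2], as in system (2).
M = np.column_stack([np.ones_like(t), t, t**2])
a, b, c = np.linalg.solve(M, p)

def quadratic_model(year):
    """Evaluate the interpolating quadratic at a calendar year."""
    s = year - 1900
    return a + b * s + c * s**2

print(a, b, c)
print(quadratic_model(2030))
```

Evaluating `quadratic_model` at 1995, 2000, and 2005 should return the three table values up to rounding error, which is a quick check that the system was set up correctly.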
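Example 4. Next we fit a cubic polynomial $P_3(t) = a + bt + ct^2 + dt^3$ (with $t = 0$ in 1900) to the 1995, 2000, 2005, and 2010 world population values. With the four data points (95, 5.735), (100, 6.127), (105, 6.520), and (110, 6.930), the system in (2) yields the four equations

$$\begin{aligned}
a + 95b + 95^2 c + 95^3 d &= 5.735 \\
a + 100b + 100^2 c + 100^3 d &= 6.127 \\
a + 105b + 105^2 c + 105^3 d &= 6.520 \\
a + 110b + 110^2 c + 110^3 d &= 6.930.
\end{aligned}$$

As in the 3.5 Application, a calculator or computer yields the solution

$$a = -22.803, \quad b \approx 0.713967, \quad c = -0.00638, \quad d \approx 0.0000213333.$$

Thus our cubic interpolating polynomial is

$$P_3(t) = -22.803 + 0.713967t - 0.00638t^2 + 0.0000213333t^3.$$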
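Example 5. In order to fit a fourth-degree population model of the form

$$P_4(t) = a + bt + ct^2 + dt^3 + et^4$$

to the 1990-1995-2000-2005-2010 world population data, we need to solve the linear system

$$\begin{aligned}
a + 90b + 90^2 c + 90^3 d + 90^4 e &= 5.310 \\
a + 95b + 95^2 c + 95^3 d + 95^4 e &= 5.735 \\
a + 100b + 100^2 c + 100^3 d + 100^4 e &= 6.127 \\
a + 105b + 105^2 c + 105^3 d + 105^4 e &= 6.520 \\
a + 110b + 110^2 c + 110^3 d + 110^4 e &= 6.930
\end{aligned}$$

to find the values of the coefficients a, b, c, d, and e. The result is

$$P_4(t) = -154.473 + 5.86767t - 0.08195t^2 + 0.000513333t^3 - 0.0000012t^4.$$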
The table in Fig. 3.7.5 compares our linear, quadratic, cubic, and quartic predictions with the “correct” United Nations prediction for the year 2030. Each “error” in the third column of this table is the amount by which the corresponding prediction undershoots (positive error) or overshoots (negative error) the U.N. prediction. We see that the quadratic prediction is not only better than the linear prediction but also markedly better than the cubic prediction. The quartic prediction is an improvement over the cubic, yet still not as good as the quadratic. Thus there is at best an uncertain relationship between the degree of the polynomial model and the accuracy of its predictions.
Model | Year 2030 Prediction (billions) | Error (billions) |
---|---|---|
Linear | 8.482 | 0.019 |
Quadratic | 8.500 | 0.001 |
Cubic | 9.060 | −0.559 |
Quartic | 8.430 | 0.071 |
United Nations | 8.501 | |
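The four year-2030 predictions can be reproduced with one short routine. The sketch below is an illustration rather than the text's calculator procedure: it shifts t by the mean of the nodes before solving, a numerical convenience that leaves the (unique) interpolating polynomial unchanged while keeping the Vandermonde system well conditioned. The names `predict`, `fits`, and `errors` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# World population (billions) keyed by t = years after 1900, from the table of Fig. 3.7.3.
data = {90: 5.310, 95: 5.735, 100: 6.127, 105: 6.520, 110: 6.930}
un_2030 = 8.501  # United Nations figure for 2030

def predict(nodes, t_eval):
    """Value at t_eval of the polynomial interpolating the data at the given nodes."""
    nodes = np.asarray(nodes, dtype=float)
    values = np.array([data[int(t)] for t in nodes])
    center = nodes.mean()
    # Solve the Vandermonde system in the shifted variable t - center.
    V = np.vander(nodes - center, increasing=True)
    coeffs = np.linalg.solve(V, values)
    return float(P.polyval(t_eval - center, coeffs))

fits = {
    "linear": [95, 105],
    "quadratic": [95, 100, 105],
    "cubic": [95, 100, 105, 110],
    "quartic": [90, 95, 100, 105, 110],
}
predictions = {name: predict(nodes, 130) for name, nodes in fits.items()}
# Positive error: the model undershoots the U.N. figure; negative: it overshoots.
errors = {name: un_2030 - value for name, value in predictions.items()}

for name in fits:
    print(f"{name:9s} {predictions[name]:.3f} {errors[name]:+.3f}")
```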
Figure 3.7.6 shows the U.N. world population data points for the years 1975 through 2040, together with the plots of the quadratic, cubic, and quartic population functions of Examples 3, 4, and 5. (The plot of the linear population function of Example 2 is virtually indistinguishable from that of the quadratic function for the values of t shown in the figure.) It looks as though the more work we do to find a polynomial fitting selected data points, the less we get for our effort. It is certainly true in this figure that—outside the interval from 1990 to 2010—the higher the degree of the polynomial, the worse it appears to fit the given data points. The issue here is the difference between
interpolating data points within the interval of given points being fitted, and
extrapolating data points outside this interval.
All four of our polynomials appear to do a good job of interpolating but, somewhat paradoxically, the higher the degree, the worse the apparent accuracy of extrapolation. The highly questionable accuracy of data extrapolation outside the interval of interpolation has significant implications. For instance, consider a news report that when a certain alleged carcinogen was fed to mice in sufficient amounts to kill an elephant, the mice developed cancer. It is then argued that moderate amounts of this carcinogen may cause cancer in humans; or that if 1 part per billion of this carcinogen in the environment kills 1 person, then 1 part per million (a thousand times as much) will kill 1000 people. Such arguments are common, but they may well be cases of extrapolation beyond the range of accuracy. The bottom line is that interpolation is fairly safe—though hardly fail-safe—but extrapolation is risky.
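The contrast between the fitted interval and the years outside it can be checked directly against the table of Fig. 3.7.3. The following sketch uses NumPy's `Polynomial.fit`, a least-squares routine (not the method of the text) that reduces to interpolation when the degree is one less than the number of points. It fits the quartic to the 1990–2010 values and measures its deviation from the tabulated figures elsewhere; the variable names are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

years = np.arange(1975, 2045, 5)
population = np.array([4.062, 4.440, 4.853, 5.310, 5.735, 6.127, 6.520,
                       6.930, 7.349, 7.758, 8.142, 8.501, 8.839, 9.157])

fit_years = np.array([1990, 1995, 2000, 2005, 2010])
mask = np.isin(years, fit_years)

# Degree 4 through 5 points: the least-squares fit is the interpolating quartic.
quartic = Polynomial.fit(years[mask], population[mask], deg=4)

deviation = np.abs(quartic(years) - population)
# The table has no years strictly between the fitted nodes, so the "inside"
# deviation is only checked at the nodes themselves, where it is essentially zero.
max_inside = deviation[mask].max()
max_outside = deviation[~mask].max()

print(max_inside, max_outside)
```

Here `max_outside` measures how far the quartic drifts from the tabulated values once it leaves the interval from 1990 to 2010.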
In contrast with population prediction, there are interesting situations where curve fitting is exact. For instance, the fact that “two points determine a line” in the plane means that, when we fit the linear function $y = a + bx$ to a given pair of points, we get precisely the one and only straight line in the plane that passes through these points. Similarly, “three points determine a circle,” meaning that there is one and only one circle in the plane that passes through three given noncollinear points. In order to find this particular circle, we recall that the equation of a circle with center (h, k) and radius r is

$$(x - h)^2 + (y - k)^2 = r^2.$$

Simplification gives

$$x^2 + y^2 - 2hx - 2ky + h^2 + k^2 - r^2 = 0,$$

that is,

$$x^2 + y^2 + Ax + By + C = 0 \tag{9}$$

(where $A = -2h$, $B = -2k$, and $C = h^2 + k^2 - r^2$) as the general equation of a circle in the plane.
Example 6. Find the equation of the circle that is determined by the points $P(-1, 5)$, $Q(5, -3)$, and R(6, 4).
Substitution of the xy-coordinates of each of the three points P, Q, and R into (9) gives the three equations

$$\begin{aligned}
-A + 5B + C &= -26 \\
5A - 3B + C &= -34 \\
6A + 4B + C &= -52.
\end{aligned}$$

Reduction of the corresponding augmented coefficient matrix to reduced row-echelon form (Fig. 3.7.8) yields $A = -4$, $B = -2$, and $C = -20$. Thus the equation of the desired circle is

$$x^2 + y^2 - 4x - 2y - 20 = 0.$$

To find its center and radius, we complete the squares in x and y and get

$$(x - 2)^2 + (y - 1)^2 = 25.$$

Thus the circle has center (2, 1) and radius 5 (Fig. 3.7.9).
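The same computation can be sketched in code. The example below assumes the general circle equation is written as x² + y² + Ax + By + C = 0 and takes P(−1, 5) and Q(5, −3) as the other two points; these coordinates are an assumption here, chosen to be consistent with the center (2, 1) and radius 5 found in the example. The function name `circle_through` is hypothetical.

```python
import numpy as np

def circle_through(points):
    """Coefficients, center, and radius of the circle through three points.

    The circle is written as x^2 + y^2 + A x + B y + C = 0.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Each point gives one linear equation: A x + B y + C = -(x^2 + y^2).
    M = np.column_stack([x, y, np.ones(3)])
    # np.linalg.solve raises LinAlgError if the points are collinear.
    A, B, C = np.linalg.solve(M, -(x**2 + y**2))
    h, k = -A / 2.0, -B / 2.0          # center, from A = -2h and B = -2k
    r = np.sqrt(h**2 + k**2 - C)       # radius, from C = h^2 + k^2 - r^2
    return (A, B, C), (h, k), r

coefficients, center, radius = circle_through([(-1, 5), (5, -3), (6, 4)])
print(coefficients, center, radius)
```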
Three appropriate points in the plane also determine a central conic with equation of the form

$$Ax^2 + Bxy + Cy^2 = 1. \tag{10}$$
This is a rotated conic section—an ellipse, parabola, or hyperbola—centered at the origin of the xy-coordinate system. Figure 3.7.10 shows a typical rotated ellipse in the plane.
Example 7. Find the equation of the central conic that passes through the same three points $P(-1, 5)$, $Q(5, -3)$, and R(6, 4) of Example 6.
Substitution of the xy-coordinates of each of the three points P, Q, and R into (10) gives the linear system of three equations

$$\begin{aligned}
A - 5B + 25C &= 1 \\
25A - 15B + 9C &= 1 \\
36A + 24B + 16C &= 1
\end{aligned}$$

in the three unknowns A, B, and C. Reduction of the corresponding augmented coefficient matrix to reduced row-echelon form (Fig. 3.7.11) yields the values

$$A = \frac{277}{14212}, \quad B = -\frac{172}{14212}, \quad C = \frac{523}{14212}.$$

If we substitute these coefficient values in (10) and multiply the result by the common denominator 14212, we get the desired equation

$$277x^2 - 172xy + 523y^2 = 14212$$

of our central conic. The computer plot in Fig. 3.7.12 verifies that this rotated ellipse does indeed pass through all three points P, Q, and R.
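A numerical version of this example is sketched below, assuming the central conic is written as Ax² + Bxy + Cy² = 1 and again taking P(−1, 5) and Q(5, −3) as assumed coordinates for the first two points. Scaling the solution by 14212 recovers integer coefficients, and the sign of B² − 4AC identifies the type of conic. The function name `central_conic_through` is hypothetical.

```python
import numpy as np

def central_conic_through(points):
    """Coefficients (A, B, C) of A x^2 + B x y + C y^2 = 1 through three points."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Each point gives one linear equation in the unknowns A, B, C.
    M = np.column_stack([x**2, x * y, y**2])
    return np.linalg.solve(M, np.ones(3))

A, B, C = central_conic_through([(-1, 5), (5, -3), (6, 4)])

# Clearing the common denominator mentioned in the text gives integer coefficients.
scaled = 14212 * np.array([A, B, C])

# For a nondegenerate conic, a negative discriminant indicates an ellipse.
discriminant = B**2 - 4 * A * C

print(scaled, discriminant)
```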
In each of Problems 1–10, data points are given. Find the nth degree polynomial that fits these points.
(1, 1) and (3, 7)
and
(0, 3), (1, 1), and
and (2, 16)
(1, 3), (2, 3), and (3, 5)
and (5, 5)
and
and (2, 3)
and (2, 26)
and
Three points are given in each of Problems 11–14. Find the equation of the circle determined by these points, as well as its center and radius.
and (7, 5)
and
and
(0, 0), (10, 0), and
In Problems 15–18, find an equation of the central ellipse that passes through the three given points.
(0, 5), (5, 0), and (5, 5)
(0, 5), (5, 0), and (10, 10)
(0, 1), (1, 0), and (10, 10)
(0, 4), (3, 0), and (5, 5)
Find a curve of the form that passes through the points (1, 5) and (2, 4).
Find a curve of the form that passes through the points (1, 2), (2, 20), and (4, 41).
A sphere in space with center (h, k, l) and radius r has equation

$$(x - h)^2 + (y - k)^2 + (z - l)^2 = r^2.$$
Four given points in space suffice to determine the values of h, k, l, and r. In Problems 21 and 22, find the center and radius of the sphere that passes through the four given points P, Q, R, and S. Hint: Substitute each given triple of coordinates into the sphere equation above to obtain four equations that h, k, l, and r must satisfy. To solve these equations, first subtract the first one from each of the other three. How many unknowns are left in the three equations that result?
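The subtraction step in the hint can be sketched as follows. Subtracting the first sphere equation from the others cancels the terms h² + k² + l² − r², leaving three linear equations in h, k, and l alone. The four sample points below are hypothetical (they are not the points of Problems 21 and 22), as is the function name `sphere_through`.

```python
import numpy as np

def sphere_through(points):
    """Center and radius of the sphere through four non-coplanar points."""
    P = np.asarray(points, dtype=float)
    # Subtracting the first equation from each of the others gives
    #   2 (P_i - P_0) . (h, k, l) = |P_i|^2 - |P_0|^2,   i = 1, 2, 3.
    M = 2.0 * (P[1:] - P[0])
    rhs = np.sum(P[1:] ** 2, axis=1) - np.sum(P[0] ** 2)
    center = np.linalg.solve(M, rhs)
    # The radius is the distance from the center to any of the four points.
    radius = np.linalg.norm(P[0] - center)
    return center, radius

# Hypothetical points lying on the sphere with center (1, 2, 3) and radius 3.
center, radius = sphere_through([(4, 2, 3), (1, 5, 3), (1, 2, 6), (-2, 2, 3)])
print(center, radius)
```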
Problems 23–34 are intended as calculator or computer problems and are based on the U.S. census data in the table of Fig. 3.7.13, listed by national region in millions for the census years 1950–1990. See www.census.gov/population/censusdata/table-16.pdf for further details.
In Problems 23–26, fit a quadratic function to the 1970, 1980, and 1990 population values for the indicated region.
The Northeast
The Midwest
The South
The West
27–30. The same as Problems 23–26, except fit a cubic polynomial to the 1960, 1970, 1980, and 1990 population data for the indicated region.
31–34. The same as Problems 23–26, except fit a quartic polynomial to the 1950, 1960, 1970, 1980, and 1990 population data for the indicated region.
Problems 35 through 40 illustrate the use of determinants in fitting polynomial curves to data points.
Region | 1950 | 1960 | 1970 | 1980 | 1990 |
---|---|---|---|---|---|
Northeast | 39.478 | 44.678 | 49.061 | 49.137 | 50.809 |
Midwest | 44.461 | 51.619 | 56.590 | 58.867 | 59.669 |
South | 47.197 | 54.973 | 62.813 | 75.367 | 85.446 |
West | 20.190 | 28.053 | 34.838 | 43.171 | 52.786 |
U.S. | 151.326 | 179.323 | 203.302 | 226.542 | 248.710 |
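Explain why the determinant equation

$$\begin{vmatrix}
y & x^2 & x & 1 \\
y_1 & x_1^2 & x_1 & 1 \\
y_2 & x_2^2 & x_2 & 1 \\
y_3 & x_3^2 & x_3 & 1
\end{vmatrix} = 0$$

fits a quadratic polynomial of the form $y = Ax^2 + Bx + C$ to the three given points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$.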
Expand the determinant in Problem 35 to find a parabola that interpolates the points (1, 3), (2, 3), and (3, 7).
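Expanding such a determinant along its first row can be sketched numerically. The example below assumes the first row of the determinant in Problem 35 is [y, x², x, 1] (the layout is an assumption, since any column order describes the same curve) and uses hypothetical sample points rather than those of Problem 36. The function name `parabola_from_determinant` is illustrative.

```python
import numpy as np

def parabola_from_determinant(points):
    """Coefficients (A, B, C) of y = A x^2 + B x + C via cofactor expansion."""
    # Numeric rows [y_i, x_i^2, x_i, 1] that sit below the symbolic first row.
    rows = np.array([[y, x**2, x, 1.0] for x, y in points])
    # Cofactor of the j-th entry of the first row [y, x^2, x, 1].
    cof = [(-1) ** j * np.linalg.det(np.delete(rows, j, axis=1)) for j in range(4)]
    # The expansion reads cof[0]*y + cof[1]*x^2 + cof[2]*x + cof[3] = 0.
    if np.isclose(cof[0], 0.0):
        raise ValueError("x-coordinates must be distinct")
    return tuple(-c / cof[0] for c in cof[1:])

# Hypothetical points on y = x^2 + 1.
A, B, C = parabola_from_determinant([(0, 1), (1, 2), (2, 5)])
print(A, B, C)
```

The coefficient of y is itself a 3 × 3 Vandermonde-type determinant, which is why distinct x-coordinates guarantee that the equation can be solved for y.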
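Explain why the determinant equation

$$\begin{vmatrix}
x^2 + y^2 & x & y & 1 \\
x_1^2 + y_1^2 & x_1 & y_1 & 1 \\
x_2^2 + y_2^2 & x_2 & y_2 & 1 \\
x_3^2 + y_3^2 & x_3 & y_3 & 1
\end{vmatrix} = 0$$

fits a circle of the form $x^2 + y^2 + Ax + By + C = 0$ to the three given points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$.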
Expand the determinant in Problem 37 to find the equation of a circle passing through the three points and Then find its center and radius.
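Explain why the determinant equation

$$\begin{vmatrix}
x^2 & xy & y^2 & 1 \\
x_1^2 & x_1 y_1 & y_1^2 & 1 \\
x_2^2 & x_2 y_2 & y_2^2 & 1 \\
x_3^2 & x_3 y_3 & y_3^2 & 1
\end{vmatrix} = 0$$

fits a central conic equation of the form $Ax^2 + Bxy + Cy^2 = 1$ to the three given points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$.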
Expand the determinant in Problem 39 to find the equation of the ellipse passing through the three points (0, 4), (3, 0), and (5, 5).