Lesson 3
Least-squares Estimation: Batch Processing

Summary

The main purpose of this lesson is the derivation of the classical batch formula of (weighted) least squares. The term batch means that all measurements are collected together and processed simultaneously. A second purpose of this lesson is to demonstrate that least-squares estimates may change in numerical value under changes of scale. One way around this difficulty is to use normalized data.

Least-squares estimates require no assumptions about the nature of the generic linear model. Consequently, the formula for the least-squares estimator (LSE) is easy to derive. We will learn in Lesson 8 that the price paid for ease in derivation is difficulty in performance evaluation.

The supplementary material at the end of this lesson contrasts least squares, total least squares, and constrained total least squares. The latter techniques are frequently more powerful than least squares, especially in signal-processing applications.

When you complete this lesson you will be able to (1) derive and use the classical batch formulas of (weighted) least squares (these are theoretical formulas that should not be programmed as is for digital computation; numerically well-behaved linear algebra programs are commercially available for computational purposes); (2) explain the sensitivity of (weighted) least squares to scale changes; and (3) explain the difference between least squares, total least squares, and constrained total least squares.

Introduction

The method of least squares dates back to Carl Friedrich Gauss around 1795 and is the cornerstone of most estimation theory, both classical and modern. It was invented by Gauss at a time when he was interested in predicting the motion of planets and comets using telescopic measurements. The motions of these bodies can be completely characterized by six parameters. The estimation problem that Gauss considered was one of inferring the values of these parameters from the measurement data.

We shall study least-squares estimation from two points of view: the classical batch-processing approach, in which all the measurements are processed together at one time, and the more modern recursive processing approach, in which measurements are processed only a few (or even one) at a time. The recursive approach has been motivated by today’s high-speed digital computers; however, as we shall see, the recursive algorithms are outgrowths of the batch algorithms.

The starting point for the method of least squares is the linear model

(3-1)

Z(k) = H(k)θ + V(k)

where Z(k) = col (z(k), z(k − 1),…, z(k − N + 1)), z(k) = h′(k)θ + v(k), and the estimation model for Z(k) is

(3-2)

Ẑ(k) = H(k)θ̂(k)

We denote the (weighted) least-squares estimator of θ as θ̂_WLS(k) [θ̂_LS(k)]. In this lesson and the next two we shall determine explicit structures for this estimator.

Gauss: A Short Biography

Carl Friedrich Gauss was born on April 30, 1777, in Braunschweig, Germany. Although his father wanted him to go into a trade, his mother had the wisdom to recognize Gauss's genius, which manifested itself at a very early age, and saw to it that he was properly schooled.

E.T. Bell (1937), in his famous essay on Gauss, refers to him as the “Prince of Mathematicians.” Courant (1969) states that “Gauss was one of the three most famous mathematicians in history, in company with Archimedes and Newton.”

Gauss invented the method of least squares at the age of 18. This work was the beginning of a lifelong interest in the theory of observation. He attended the University of Göttingen from October 1795 to September 1798. Some say that these three years were his most prolific. He received his doctor’s degree in absentia from the University of Helmstedt in 1799. His doctoral dissertation was the first proof of the fundamental theorem of algebra.

In 1801 he published his first masterpiece, Arithmetical Researches, which revolutionized all of arithmetic and established number theory as an organic branch of mathematics. In 1809 he published his second masterpiece, Theory of the Motion of Heavenly Bodies Revolving Round the Sun in Conic Sections, in which he predicted the orbit of Ceres. E.T. Bell felt that Gauss's excursions into astronomical works were a waste of 20 years, during which time he could have been doing more pure mathematics. This is an interesting point of view (not shared by Courant), for what we take as very important to us in estimation theory, least squares and its applications, has been viewed by some mathematicians as a diversion.

In 1812 Gauss published another great work on the hypergeometric series from which developed many applications to differential equations in the nineteenth century. Gauss invented the electric telegraph (working with Wilhelm Weber) in 1833. He made major contributions in geodesy, the theories of surfaces, conformal mapping, mathematical physics (particularly electromagnetism, terrestrial magnetism, and the theory of attraction according to Newtonian law), analysis situs, and the geometry associated with functions of a complex variable.

Gauss was basically a loner. He published his results only when they were absolutely polished, which made his publications extremely difficult to understand, since so many of the details had been stripped away. He kept a diary throughout his lifetime in which he briefly recorded all of his “gems.” This diary did not become known until many years after his death. It established his precedence for results associated with the names of many other famous mathematicians (e.g., he is now credited with being one of the founders of non-Euclidean geometry). For discussions of an interesting feud between Gauss and Legendre over priority to the method of least squares, see Sorenson (1970).

Gauss died at 77 on February 23, 1855. As Bell says, "He lives everywhere in mathematics."

Number of Measurements

Suppose that θ contains n parameters and Z(k) contains N measurements. If N < n, we have fewer measurements than unknowns and (3-1) is an underdetermined system of equations that does not lead to unique values for θ1, θ2,…, θn. If N = n, we have exactly as many measurements as unknowns, and as long as the n measurements are linearly independent, so that H⁻¹(k) exists, we can solve (3-1) for θ as

(3-3)

θ = H⁻¹(k)[Z(k) − V(k)]

Because we cannot measure V(k), it is usually neglected in the calculation of (3-3). For small amounts of noise this may not be a bad thing to do, but for even moderate amounts of noise this will be quite bad. Finally, if N > n, we have more measurements than unknowns, so (3-1) is an overdetermined system of equations. The extra measurements can be used to offset the effects of the noise; i.e., they let us “filter” the data. Only this last case is of real interest to us. Some discussions on the underdetermined case are given in Lesson 4.
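To make this point concrete, here is a small numerical sketch (not from the text; the model sizes, noise level, seed, and NumPy usage are illustrative assumptions) that compares the exactly determined case, with the noise simply ignored, against an overdetermined fit:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([2.0, -1.0])           # n = 2 unknown parameters

# Square case: N = n = 2 noisy measurements, V(k) neglected as in (3-3)
H_sq = rng.standard_normal((2, 2))
z_sq = H_sq @ theta_true + 0.3 * rng.standard_normal(2)
theta_exact = np.linalg.solve(H_sq, z_sq)    # typically noticeably off

# Overdetermined case: N = 50 >> n, the extra measurements "filter" the noise
H_ov = rng.standard_normal((50, 2))
z_ov = H_ov @ theta_true + 0.3 * rng.standard_normal(50)
theta_ls, *_ = np.linalg.lstsq(H_ov, z_ov, rcond=None)

print(theta_exact)
print(theta_ls)                              # usually much closer to theta_true
```

Running this for several noise realizations typically shows the square-system solution wandering much farther from θ than the overdetermined least-squares fit.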

Objective Function and Problem Statement

A direct approach for obtaining Image is to choose it so as to minimize the sum of the squared errors between its components and the respective components of θ, i.e., to minimize

J[θ̂(k)] = [θ̂(k) − θ]′[θ̂(k) − θ] = [θ̂₁(k) − θ₁]² + [θ̂₂(k) − θ₂]² + … + [θ̂ₙ(k) − θₙ]²

The solution to this minimization problem is θ̂(k) = θ, which, of course, is a useless result, because, if we knew θ ahead of time, we would not need to estimate it.

A less direct approach for obtaining θ̂(k) is based on minimizing the objective function

(3-4)

J[θ̂(k)] = Z̃′(k)W(k)Z̃(k)

where

(3-5)

Z̃(k) = Z(k) − Ẑ(k) = Z(k) − H(k)θ̂(k)

and weighting matrix W(k) must be symmetric and positive definite, for reasons explained later.

No general rules exist for how to choose W(k). The most common choice is a diagonal matrix such as

Image

When |μ| < 1, so that 1/μ > 1, recent errors (and associated measurements) are weighted more heavily than past ones. Such a choice for W(k) provides the weighted least-squares estimator with an "aging" or "forgetting" factor. When |μ| > 1, recent errors are weighted less heavily than past ones. Finally, if μ = 1, so that W(k) = I, then all errors are weighted by the same amount. When W(k) = I, θ̂_WLS(k) reduces to θ̂_LS(k), whereas for other choices of W(k) the two estimators generally differ. Note also that if W(k) = cI, where c is a constant, then θ̂_WLS(k) = θ̂_LS(k) (see Problem 3-2).
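The exact diagonal form of W(k) displayed above is not reproduced here, but one common construction consistent with this discussion (an assumption on our part) weights the error on the measurement j steps in the past by μ^j. A minimal NumPy sketch:

```python
import numpy as np

def forgetting_weights(N, mu):
    """Diagonal weighting for an error vector ordered newest ... oldest,
    matching Z(k) = col(z(k), z(k-1), ..., z(k-N+1)).

    The current error gets weight 1 and the error j steps in the past gets
    mu**j, so for 0 < mu < 1 recent errors are weighted more heavily.
    """
    return np.diag(mu ** np.arange(N))

W = forgetting_weights(5, 0.9)
print(np.diag(W))   # [1.     0.9    0.81   0.729  0.6561]
```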

Our objective is to determine the θ̂_WLS(k) that minimizes J[θ̂_WLS(k)].

Derivation of Estimator

To begin, we express (3-4) as an explicit function of θ̂_WLS(k), using (3-2):

(3-6)

J[θ̂_WLS(k)] = [Z(k) − H(k)θ̂_WLS(k)]′W(k)[Z(k) − H(k)θ̂_WLS(k)]

Next, we take the vector derivative of J[θ̂_WLS(k)] with respect to θ̂_WLS(k); but, before doing this, recall from vector calculus that:

If m and b are two n × 1 nonzero vectors, and A is an n × n symmetric matrix, then

(3-7)

∂(b′m)/∂m = ∂(m′b)/∂m = b

and

(3-8)

∂(m′Am)/∂m = 2Am

Using these formulas, we find that

(3-9)

dJ[θ̂_WLS(k)]/dθ̂_WLS(k) = 2H′(k)W(k)H(k)θ̂_WLS(k) − 2H′(k)W(k)Z(k)

Setting dJ[θ̂_WLS(k)]/dθ̂_WLS(k) = 0, we obtain the following formula for θ̂_WLS(k):

(3-10)

θ̂_WLS(k) = [H′(k)W(k)H(k)]⁻¹H′(k)W(k)Z(k)

Note, also, that

(3-11)

θ̂_LS(k) = [H′(k)H(k)]⁻¹H′(k)Z(k)

By substituting (3-10) into (3-6), we obtain the minimum value of J[θ̂_WLS(k)]:

(3-12)

J[θ̂_WLS(k)] = Z′(k)W(k)Z(k) − Z′(k)W(k)H(k)[H′(k)W(k)H(k)]⁻¹H′(k)W(k)Z(k)

Comments

1. Matrix H’(k)W(k)H(k) must be nonsingular for its inverse to exist. Matrix H’(k)W(k)H(k) is said to be nonsingular if it has an inverse satisfying

[H′(k)W(k)H(k)][H′(k)W(k)H(k)]⁻¹ = [H′(k)W(k)H(k)]⁻¹[H′(k)W(k)H(k)] = I

If W(k) is positive definite, then it can be written uniquely as W(k) = L’(k)L(k), where L(k) is a lower triangular matrix with positive diagonal elements. Consequently, we can express H’(k)W(k)H(k) as

H′(k)W(k)H(k) = H′(k)L′(k)L(k)H(k) = [L(k)H(k)]′[L(k)H(k)] = A′(k)A(k), where A(k) = L(k)H(k)

If A(k) has linearly independent columns, i.e., is of maximum rank, then A′(k)A(k) is nonsingular. Finally, rank [L(k)H(k)] = rank [H(k)], because L(k) is nonsingular. Consequently, if H(k) is of maximum rank, then A′(k)A(k) is nonsingular, so [A′(k)A(k)]⁻¹ = [H′(k)W(k)H(k)]⁻¹ exists. The two conditions that have fallen out of this analysis are that W(k) must be positive definite and H(k) must be of maximum rank.
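This factorization also suggests a practical check: weighted least squares is ordinary least squares applied to "filtered" data (see Problem 3-3). The following sketch is illustrative only; note that NumPy's Cholesky routine returns W(k) = C C′ with C lower triangular, a slightly different convention from W(k) = L′(k)L(k) above, but it serves the same purpose:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 20, 3
H = rng.standard_normal((N, n))
Z = rng.standard_normal(N)
B = rng.standard_normal((N, N))
W = B @ B.T + N * np.eye(N)                  # symmetric, positive definite

# Weighted least squares via the normal equations (3-13)
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ Z)

# Equivalent ordinary LS on "filtered" data: W = C C', so J = ||C'(Z - H theta)||^2
C = np.linalg.cholesky(W)                    # lower triangular, W = C @ C.T
theta_ls, *_ = np.linalg.lstsq(C.T @ H, C.T @ Z, rcond=None)

print(np.allclose(theta_wls, theta_ls))      # True (up to round-off)
```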

2. How do we know that θ̂_WLS(k) minimizes J[θ̂_WLS(k)]? We compute d²J[θ̂_WLS(k)]/dθ̂²_WLS(k) and see whether it is positive definite [which is the vector calculus analog of the scalar calculus requirement that, for θ̂ to minimize J(θ̂), dJ(θ̂)/dθ̂ must be zero and d²J(θ̂)/dθ̂² must be positive]. Doing this, we see that

d²J[θ̂_WLS(k)]/dθ̂²_WLS(k) = 2H′(k)W(k)H(k) > 0

because H′(k)W(k)H(k) is invertible.

3. Estimator θ̂_WLS(k) processes the measurements Z(k) linearly; thus, it is referred to as a linear estimator. It processes the data contained in H(k) in a very complicated and nonlinear manner.

4. When (3-9) is set equal to zero, we obtain the following system of normal equations:

(3-13)

[H′(k)W(k)H(k)]θ̂_WLS(k) = H′(k)W(k)Z(k)

This is a system of n linear equations in the n components of θ̂_WLS(k).

In practice, we do not compute θ̂_WLS(k) using (3-10), because computing the inverse of H′(k)W(k)H(k) is fraught with numerical difficulties. Instead, the normal equations are solved using stable algorithms from numerical linear algebra. Golub and Van Loan (1989) have an excellent chapter, entitled "Orthogonalization and Least Squares Methods," devoted to numerically sound ways for computing θ̂_WLS(k) from (3-13) (see, also, Stewart, 1973; Bierman, 1977; and Dongarra et al., 1979). They state that "One tactic for solution [of (3-13)] is to convert the original least squares problem into an equivalent, easy-to-solve problem using orthogonal transformations. Algorithms of this type based on Householder and Givens transformations… compute the factorization H′(k)W(k)H(k) = Q(k)R(k), where Q(k) is orthogonal and R(k) is upper triangular."

In Lesson 4 we describe how (3-10) can be computed using the very powerful singular-value decomposition (SVD) method. SVD can be used for both the overdetermined and underdetermined situations.

Based on this discussion, we must view (3-10) as a useful "theoretical" formula and not as a useful computational formula.
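As an illustration of this distinction (a sketch with W(k) = I and invented random data, not a recommendation of any particular library routine), the explicit inverse in (3-10) can be contrasted with a QR-based solution and with NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 100, 4
H = rng.standard_normal((N, n))
theta_true = rng.standard_normal(n)
Z = H @ theta_true + 0.1 * rng.standard_normal(N)

# "Theoretical" formula (3-10) with W = I: explicit inverse (avoid in practice)
theta_inv = np.linalg.inv(H.T @ H) @ H.T @ Z

# Orthogonal-transformation route: H = Q R, then solve R theta = Q' Z
Q, R = np.linalg.qr(H)                       # reduced QR; R is n x n upper triangular
theta_qr = np.linalg.solve(R, Q.T @ Z)

# Library routine (SVD based) for comparison
theta_lstsq, *_ = np.linalg.lstsq(H, Z, rcond=None)

print(np.allclose(theta_inv, theta_qr), np.allclose(theta_qr, theta_lstsq))
```

For well-conditioned data all three agree; the payoff of the orthogonal-transformation route shows up when H(k) is ill conditioned.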

5. Using the fact that Ẑ(k) = H(k)θ̂_WLS(k), equation (3-13) can also be reexpressed as

(3-14)

[Z(k) − Ẑ(k)]′W(k)H(k) = 0′

which can be viewed as an orthogonality condition between Z(k) − Ẑ(k) and W(k)H(k). Orthogonality conditions play an important role in estimation theory. We shall see many more examples of such conditions throughout this book. For a very lucid discussion on the least-squares orthogonality principle, see Therrien (1992, pp. 525-528). See, also, Problems 3-12 and 3-13.
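A quick numerical check of this orthogonality condition, using randomly generated data (purely illustrative), looks like this:

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((30, 3))
Z = rng.standard_normal(30)
W = np.diag(rng.uniform(0.5, 2.0, 30))       # symmetric, positive definite

theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ Z)
Z_tilde = Z - H @ theta_wls                  # residual Z(k) - H(k) theta_hat

# Orthogonality: the residual is orthogonal to every column of W(k)H(k)
print(np.allclose(Z_tilde @ (W @ H), 0.0))   # True (up to round-off)
```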

6. Estimates obtained from (3-10) will be random! This is because Z(k) is random, and in some applications even H(k) is random. It is therefore instructive to view (3-10) as a complicated transformation of vectors or matrices of random variables into the vector of random variables θ̂_WLS(k). In later lessons, when we examine the properties of θ̂_WLS(k), these will be statistical properties because of the random nature of θ̂_WLS(k).

7. The assumption that θ is deterministic was never made during our derivation of θ̂_WLS(k); hence, (3-10) and (3-11) also apply to the estimation of random parameters. We return to this important point in Lesson 13. If θ is random, then a performance analysis of θ̂_WLS(k) is much more difficult than when θ is deterministic. See Lesson 8 for some performance analyses of θ̂_WLS(k).

EXAMPLE 3-1 (Mendel, 1973, pp. 86–87)

Suppose we wish to calibrate an instrument by making a series of uncorrelated measurements on a constant quantity. Denoting the constant quantity as θ, our measurement equation becomes

(3-15)

z(k) = θ + v(k)

where k = 1, 2,…, N. Collecting these N measurements, we have

(3-16)

Z(N) = Hθ + V(N)

Clearly, H = col (1, 1,…, 1); hence,

(3-17)

θ̂_LS(N) = [H′H]⁻¹H′Z(N) = (1/N)[z(1) + z(2) + … + z(N)]

which is the sample mean of the N measurements. We see, therefore, that the sample mean is a least-squares estimator.
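A short numerical confirmation of this example (the readings below are invented for illustration) is:

```python
import numpy as np

z = np.array([9.7, 10.3, 10.1, 9.9, 10.0])   # N noisy readings of a constant
H = np.ones((len(z), 1))                     # H = col(1, 1, ..., 1)

theta_ls, *_ = np.linalg.lstsq(H, z, rcond=None)
print(theta_ls[0], z.mean())                 # both give the sample mean, 10.0
```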

EXAMPLE 3-2 (Mendel, 1973)

Figure 3-1 depicts simplified third-order pitch-plane dynamics for a typical, high-performance, aerodynamically controlled aerospace vehicle. Cross-coupling and body-bending effects are neglected. Normal acceleration control is considered with feedback on normal acceleration and angle-of-attack rate. Stefani (1967) shows that if the system gains are chosen as

Figure 3-1 Pitch-plane dynamics and nomenclature: Ni, input normal acceleration along the negative Z axis; KNi, gain on Ni; δ, control-surface deflection; Mδ, control-surface effectiveness; Image, rigid-body acceleration; α, angle of attack; Mα, aerodynamic moment effectiveness; Kα, control gain on α; Zα, normal acceleration force coefficient; μ, axial velocity; Nα, system-achieved normal acceleration along the negative Z axis; KNa, control gain on Nα (reprinted from Mendel, 1973, p. 33, by courtesy of Marcel Dekker, Inc.).

Image

(3-18)

Image

(3-19)

Image

and

(3-20)

Image

then

(3-21)

Image

Stefani assumes Zα 1845/μ is relatively small and chooses C1 = 1400 and C2 = 14,000. The closed-loop response resembles that of a second-order system with a bandwidth of 2 Hz and a damping ratio of 0.6 that responds to a step command of input acceleration with zero steady-state error.

In general, Mα, Mδ, and Zα are dynamic parameters and all vary through a large range of values. Also, Mα may be positive (unstable vehicle) or negative (stable vehicle). System response must remain the same for all values of Mα, Mδ, and Zα; thus, it is necessary to estimate these parameters so that KNi, Kα, and KNa can be adapted to keep C1 and C2 invariant at their designed values. For present purposes we shall assume that Mα, Mδ, and Zα are frozen at specific values.

From Figure 3-1,

(3-22)

Image

and

(3-23)

Image

Our attention is directed at the estimation of Mα and Mδ in (3-22). We leave it as an exercise for the reader to explore the estimation of Zα in (3-23).

Our approach will be to estimate Mα and Mδ from the equation

(3-24)

Image

where Image denotes the measured value of Image that is corrupted by measurement noise Image. We shall assume (somewhat unrealistically) that α(k) and δ(k) can both be measured perfectly. The concatenated measurement equation for N measurements is

(3-25)

Image

Hence, the least-squares estimates of Mα and Mδ are

(3-26)

Image

Fixed and Expanding Memory Estimators

Estimator θ̂_WLS(k) uses the measurements z(k − N + 1), z(k − N + 2),…, z(k). When N is fixed ahead of time, θ̂_WLS(k) uses a fixed window of measurements, a window of length N, and θ̂_WLS(k) is then referred to as a fixed-memory estimator. The batch weighted least-squares estimator, obtained in this lesson, has a fixed memory.

A second approach for choosing N is to set it equal to k; then θ̂_WLS(k) uses the measurements z(1), z(2),…, z(k). In this case, θ̂_WLS(k) uses an expanding window of measurements, a window of length k, and θ̂_WLS(k) is then referred to as an expanding-memory estimator. The recursive weighted least-squares estimator, obtained in Lesson 5, has an expanding memory.
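Using the sample-mean result of Example 3-1, the difference between the two memory structures can be sketched as follows (the constant, noise level, and window length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
z = 5.0 + 0.5 * rng.standard_normal(200)     # noisy readings of a constant

N = 20
# Fixed memory: LSE of the constant over a sliding window of length N
fixed = [z[k - N + 1:k + 1].mean() for k in range(N - 1, len(z))]
# Expanding memory: LSE of the constant over all measurements up to time k
expanding = [z[:k + 1].mean() for k in range(len(z))]

print(fixed[-1], expanding[-1])              # both estimate the constant, about 5.0
```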

Scale Changes and Normalization of Data

Least-squares (LS) estimates may not be invariant under changes of scale. One way to circumvent this difficulty is to use normalized data.

As long as the elements of Z(k), the z(k − j)'s (j = 0, 1,…, N − 1), are scalars, there is no problem with changes in scale. For example, if measurements of velocity are in miles per hour or are scaled to feet per second, we obtain the same weighted least-squares or unweighted least-squares estimates. Reasons for this are explored in Problems 3-7 and 3-8. If, on the other hand, the elements of Z(k), the z(k − j)'s (j = 0, 1,…, N − 1), are vectors (see the section in Lesson 5 entitled "Generalization to Vector Measurements"), then scaling of measurements can be a serious problem, as we now demonstrate.

Assume that observers A and B are observing a process; but observer A reads the measurements in one set of units and B in another. Let M be a diagonal matrix of scale factors relating A to B; ZA(k) and ZB(k) denote the total measurement vectors of A and B, respectively. Then

(3-27)

ZB(k) = MZA(k)

which means that

(3-28)

HB(k) = MHA(k) and VB(k) = MVA(k)

Let θ̂_WLS,A(k) and θ̂_WLS,B(k) denote the WLSEs associated with observers A and B, respectively. These two estimates agree, i.e., θ̂_WLS,B(k) = θ̂_WLS,A(k), when the two weighting matrices are related as

(3-29)

WB(k) = M⁻¹WA(k)M⁻¹

It seems a bit peculiar, though, to have different weighting matrices for the two WLSEs. In fact, if we begin with θ̂_LS,A(k), then it is impossible to obtain θ̂_LS,B(k) such that θ̂_LS,B(k) = θ̂_LS,A(k). The reason for this is simple. To obtain θ̂_LS,A(k), we set WA(k) = I, in which case (3-29) reduces to WB(k) = (M⁻¹)²I.

Next, let NA and NB denote diagonal normalization matrices for ZA(k) and ZB(k), respectively. We shall assume that our data are always normalized to the same set of numbers, i.e., that

(3-30)

NAZA(k) = NBZB(k)

Observe that

(3-31)

Image

and

(3-32)

Image

From (3-30), (3-31), and (3-32), we see that

(3-33)

Image

We now find that

(3-34)

Image

and

(3-35)

Image

Substituting (3-33) into (3-35), we then find

(3-36)

Image

Comparing (3-36) and (3-34), we conclude that θ̂_WLS,B(k) = θ̂_WLS,A(k) if WB(k) = WA(k). This is precisely the result we were looking for. It means that, under proper normalization, θ̂_WLS,B(k) = θ̂_WLS,A(k) and, as a special case, θ̂_LS,B(k) = θ̂_LS,A(k).

One way to normalize data is to divide all values by the maximum value of the data. Another way is to work with percentage values, since a percentage value is a ratio of numerator to denominator quantities that have the same units; hence, percentage value is unitless.
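The scale-change difficulty is easy to reproduce numerically. In the sketch below (the data, channel split, and scale factor are invented for illustration), the rows of Z(k) and H(k) associated with one "channel" of a vector measurement are rescaled, and the resulting least-squares estimates differ:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true = np.array([1.0, 2.0])

# Two-channel (vector) measurements stacked into Z: rows 0-4 are channel 1,
# rows 5-9 are channel 2
H = rng.standard_normal((10, 2))
Z = H @ theta_true + 0.2 * rng.standard_normal(10)

# Observer B reads channel 2 in different units (scale factor 100 on rows 5-9)
m = np.ones(10)
m[5:] = 100.0                                # diagonal of the scaling matrix M
ZB, HB = m * Z, m[:, None] * H

theta_A, *_ = np.linalg.lstsq(H, Z, rcond=None)
theta_B, *_ = np.linalg.lstsq(HB, ZB, rcond=None)
print(theta_A, theta_B)                      # the two LSEs generally differ
```

If instead every row is multiplied by the same scalar (the scalar-measurement case), the two estimates coincide, which is the point made at the start of this section.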

Computation

See Lesson 4 for how to compute θ̂_LS(k) or θ̂_WLS(k) using the singular-value decomposition and the pseudoinverse.

Total least squares estimates, which are described in the Supplementary Material at the end of this lesson, can be computed using the following M-file in Hi-Spec:

tls: Total least squares solution to an overdetermined system of linear equations.

Supplementary Material

Least Squares, Total Least Squares, and Constrained Total Least Squares

Consider the overdetermined linear system of equations

(3-37)

Ax = b

which must be solved for x. Numerical linear algebra specialists (e.g., Stewart, 1973, and Golub and Van Loan, 1989) provide the following very interesting interpretation for the least-squares solution to (3-37). Suppose there are “errors” associated with the numbers that are entered into the vector b, i.e., b → b + Δ b, in which case (3-37) actually is

(3-38)

Ax = b + Δb

This equation can now be expressed as our generic linear model, as

(3-39)

b = Ax − Δb

Here b, A, x, and −Δb play the roles of Z(k), H(k), θ, and V(k), respectively. In our derivation of the least-squares estimator, we minimized Z̃′(k)Z̃(k), which, in the notation of (3-39), means that we minimized [b − Ax]′[b − Ax] = Δb′Δb; i.e., the least-squares solution of (3-38) finds the x that minimizes the errors in the vector b.

Many signal-processing problems lead to a linear system of equations like (3-37). In these problems the elements of b must first be estimated directly from the data; e.g., they could be autocorrelations, cross-correlations, higher-order statistics, etc. As such, the estimated statistics are themselves in error; hence, in these problems there usually is an error associated with b. Unfortunately, in these same problems there usually are also errors associated with the numbers entered into matrix A (see Problem 2-8). These elements may also be autocorrelations or higher-order statistics; hence, in many signal-processing problems (3-37) actually becomes

(3-40)

(A + ΔA)x = b + Δb

Solving for x using least squares ignores the errors in A.

Equation (3-40) can be reexpressed as

(3-41)

[(A + ΔA)|(b + Δb)] col (x, −1) = 0

Golub and Van Loan (1980, 1989) determined a solution x of (3-41) that minimizes the Frobenius norm of (ΔA|Δb). [Note that the Frobenius norm of the real L × P matrix M is defined by

‖M‖_F = [Σᵢ Σⱼ m²ᵢⱼ]^(1/2), where i = 1, 2,…, L and j = 1, 2,…, P.]

They called this the "total least squares (TLS)" solution, x_TLS. Their solution involves the singular-value decomposition of (A|b). Van Huffel and Vandewalle (1991) provide a comprehensive treatise on TLS.
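For reference, here is a minimal sketch of the standard SVD-based TLS construction. It is our own illustrative implementation under the usual genericity assumptions (in particular, that the last component of the chosen singular vector is nonzero), not the authors' algorithm, and the test data are invented:

```python
import numpy as np

def tls(A, b):
    """Basic total least squares solution of A x ~ b via the SVD.

    Appends b to A, takes the right singular vector belonging to the
    smallest singular value, and rescales it so its last entry is -1.
    """
    n = A.shape[1]
    C = np.column_stack((A, b))
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                                # right singular vector for smallest sigma
    return -v[:n] / v[n]

rng = np.random.default_rng(5)
x_true = np.array([1.0, -2.0])
A_true = rng.standard_normal((200, 2))
A_obs = A_true + 0.1 * rng.standard_normal(A_true.shape)   # errors in A
b_obs = A_true @ x_true + 0.1 * rng.standard_normal(200)   # errors in b

x_ls, *_ = np.linalg.lstsq(A_obs, b_obs, rcond=None)
print(x_ls, tls(A_obs, b_obs))
```

With errors present in both A and b, the TLS estimate often (though not on every realization) lands closer to the underlying x than the LS estimate does.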

One major assumption made in TLS is that the errors in A are independent, as are the errors in b; i.e., the elements ΔAij of ΔA are totally independent, as are the elements Δbi of Δb. Unfortunately, this is not the situation in most signal-processing problems, where matrix A has a specific structure (e.g., Toeplitz, block Toeplitz, Hankel, block Hankel, and circulant). Additionally, b may contain elements that also appear in A. Consequently, the errors ΔA and Δb are not usually independent in signal-processing applications. Yet TLS is widely used in such applications and often (if not always) gives much better results than does least squares. Apparently, doing something to account for errors in both A and b is better than only accounting for errors in b.

Abatzoglou, Mendel, and Harada (1991) developed a variation of TLS that is appropriate for the just-described situation in which elements of ΔA and Δb are dependent. The dependency can be represented as linear constraints among the elements of ΔA and Δb. They determine a solution x to (3-41) that in essence minimizes a norm of (ΔA|Δb) subject to linear constraints between the elements of ΔA and Δb. They called this the "constrained total least squares (CTLS)" solution, x_CTLS. Their solution requires mathematical programming; hence, it is computationally more intensive than the TLS solution, but examples demonstrate that it outperforms TLS.

When the signal-to-noise ratio is high, it is frequently true that elements of ΔA and Δb are very small, in which case x_LS, x_TLS, and x_CTLS give essentially the same results. When the signal-to-noise ratio is low, so that elements of ΔA and Δb are large, there will be a significant payoff when using TLS or CTLS.

Summary Questions

1. The method of least squares is credited to:

(a) Lagrange

(b) Rayleigh

(c) Gauss

2. A weighted least-squares estimator reduces to a least-squares estimator when:

(a) past measurements are weighted more heavily than present measurements

(b) past measurements are weighted the same as present measurements

(c) past measurements are weighted less heavily than present measurements

3. The normal equations:

(a) should be programmed for solution using a matrix inversion routine

(b) should be solved using Gaussian elimination

(c) should be solved using stable algorithms from numerical linear algebra that use orthogonal transformations

4. When N is set equal to k, then θ̂_WLS(k) is known as a:

(a) fixed-memory estimator

(b) expanding-memory estimator

(c) recursive estimator

5. Least-squares estimates may not be invariant under scale change. One way to circumvent this difficulty is to use:

(a) normalized data

(b) squared data

(c) redundant data

6. Let NA and NB denote symmetric normalization matrices for ZA(k) and ZB(k), respectively. The condition that our data are always normalized to the same set of numbers is:

(a) NA ZA(k) = NB ZB(k)

(b) ZA (k)NB = NA ZB (k)

(c) ZA(k)NA = NB ZB(k)

7. Weighted least-squares estimates are:

(a) deterministic

(b) random

(c) a mixture of deterministic and random

8. Gauss was worried that his discovery of the method of least squares would be credited to:

(a) Laplace

(b) Legendre

(c) Lagrange

9. Consider the equation Ax = b. Which of the following statements are true?

(a) least-squares is associated with accounting for the errors in A, that is, ΔA

(b) least-squares is associated with accounting for the errors in b, that is, Δb

(c) TLS accounts for errors in both A and b, but it assumes all elements of ΔA are independent, as are all elements of Δb

(d) TLS accounts for errors in both A and b, but it assumes some elements of ΔA are dependent, as are some elements of Δb

(e) CTLS accounts for errors in both A and b, and it assumes elements of ΔA are nonlinearly related, as are some elements of Δb

(f) CTLS accounts for errors in both A and b, and it assumes elements of ΔA are linearly related, as are some elements of Δb

(g) TLS solution XTLS can be obtained using singular-value decomposition

Problems

3-1. Derive the formula for θ̂_WLS(k) by completing the square on the right-hand side of the expression for J[θ̂_WLS(k)] in (3-6). In the calculus approach to deriving θ̂_WLS(k), described in the text, invertibility of H′(k)W(k)H(k) is needed in order to solve (3-9) for θ̂_WLS(k). Is this condition still needed in this problem's algebraic derivation? If so, how does it occur in the algebraic derivation of θ̂_WLS(k)? What is the main advantage of the algebraic derivation over the calculus derivation?

3-2. In the text we stated that θ̂_LS(k) is obtained when the weighting matrix is chosen as W(k) = I. More generally, θ̂_LS(k) is obtained when the weighting matrix is chosen as W(k) = cI, where c is a constant. Verify the truth of this statement. What happens to J[θ̂_WLS(k)] in (3-12), and is it important?

3-3. (Prof. G. B. Giannakis) Show that weighted least squares can be interpreted as least squares applied to filtered versions of Z(k) and H(k). [Hint: Decompose W(k) as W(k) = L′(k)D(k)L(k), where L(k) is lower triangular and D(k) is diagonal.] What is the “filter”? What happens if W(k) is itself diagonal?

3-4. Here we explore the estimation of Zα in (3-23). Assume that N noisy measurements of Nα(k) are available, i.e., Image. What is the formula for the least-squares estimator of Zα?

3-5. Here we explore the simultaneous estimation of Mα, Mδ, and Zα in (3-22) and (3-23). Assume that N noisy measurements of Image are available, i.e.,Image Determine the least-squares estimator of Mα, Mδ, and Zα. Is this estimator different from Image and Image obtained just from Image measurements and Image obtained just from Image measurements?

3-6. In a curve-fitting problem we wish to fit a given set of data z(l), z(2),…, z(N) by the approximating function (see, also, Example 2-3)

ẑ(k) = θ1φ1(k) + θ2φ2(k) + … + θnφn(k)

where φj (k) (j = 1, 2,…, n) are a set of prespecified basis functions.

(a) Obtain a formula for Image that is valid for any set of basis functions.

(b) The simplest approximating function to a set of data is the straight line. In this case, Image, which is known as the least-squares or regression line. Obtain closed-form formulas for Image.

3-7. Suppose z(k) = θ1 + θ2k, where z(1) = 3 miles per hour and z(2) = 7 miles per hour. Determine the least-squares estimates of θ1 and θ2 based on these two measurements. Next, redo these calculations by scaling z(1) and z(2) to the units of feet per second. Are the least-squares estimates obtained from these two calculations the same? Use the results developed in the section entitled "Scale Changes and Normalization of Data" to explain what has happened here.

3-8. (a) Under what conditions on scaling matrix M is scale invariance preserved for a least-squares estimator?

(b) If our original model is nonlinear in the measurements [e.g., z(k) = θz2 (k - 1) + v(k)], can anything be done to obtain invariant WLSEs under scaling?

3-9. (Tony Hung-yao Wu, Spring 1992) Sometimes we need to fit data that are exponential in nature. This requires the approximating function to be of the form y = be^(ax). An easy way to fit this model to data is to work with the logarithm of the model, i.e., ln y = ln b + ax. Obtain least-squares estimates for a and b for the following data:

i      x(i)     y(i)
1      1.00     5.10
2      1.25     5.79
3      1.50     6.53
4      1.75     7.45
5      2.00     8.46

3-10. (Ryuji Maeda, Spring 1992) In an experiment to determine Planck's constant h using the photoelectric effect, the following linear relationship is obtained between retarding potential V and frequency of the incident light ν:

V = (h/e)(ν − ν0) + nV

where e is the charge of an electron (assume its value is known), nV is the measurement noise associated with measuring V, and hν0 is the work function.

(a) The following measurements are made: V1,…, VN and ν1,…, νN. Derive the formula for ĥ_LS.

(b) Suppose the following data have been obtained from an experiment in which sodium is used as the "target": ν0 = 4.4 × 10^14 Hz and e = 1.6 × 10^-19.

ν × 10^-14 (Hz)     V (V)
6.9                 1.1
9.0                 1.8
12.1                3.1

Find the numerical value of ĥ_LS for these data.

3-11. (Liang-Jin Lin, Spring 1992) Suppose that Y is modeled as a quadratic function of X, i.e.,

Y(X) = aX2 + bX + c

and the measurements of Y, denoted Z, are subject to measurement errors.

(a) Given a set of measurements {X(i), Z(i)}, i = 1, 2,…, N, determine a formula for col (â, b̂, ĉ).

(b) Calculate col (â, b̂, ĉ) for the following data. (Note: The actual values of parameters a, b, and c used to generate these data are 0.5, 1.5, and 3, respectively.)

Image

3-12. Let H(k) be expressed as H(k) = (h1(k)|h2(k)|… |hn(k)).

(a) Show that, for least-squares estimates,

Image

(b) Explain why Ẑ(k) lies in the subspace defined by the n vectors h1(k), h2(k),…, hn(k).

(c) Does Z(k) lie in Image’s subspace?

(d) Provide a three-dimensional diagram to clarify parts (b) and (c) for n = 2.

(e) Explain why Ẑ(k) is the orthogonal projection of Z(k) onto the subspace spanned by the hi(k)'s. Use your diagram from part (d) to do this.

(f) How do the results from parts (a)-(e) assist your understanding of least-squares estimation?

3-13. The derivation of the least-squares estimator makes extensive use of the approximation of Z(k) as Ẑ(k) = H(k)θ̂_LS(k).

(a) Show that Ẑ(k) can be expressed as Ẑ(k) = PH(k)(k)Z(k), where the matrix PH(k)(k) is called a projection matrix. Why is PH(k)(k) called a projection matrix? (Hint: See Problem 3-12.)

(b) Prove that PH(k)(k) is an idempotent matrix.

(c) Show that Z̃(k) = Z(k) − Ẑ(k) = [I − PH(k)(k)]Z(k). Matrix I − PH(k)(k) is associated with the complementary (orthogonal) subspace to H(k) (see Problem 3-12). Prove that I − PH(k)(k) is also idempotent.

(d) Prove that Image

(e) How do the results from parts (a) - (d) further assist your understanding of least-squares estimation?

3-14. Consider the generic model in (2-1) subject to the restrictions (constraints) Cθ = r, where C is a J × n matrix of known constants and is of rank J, and r is a vector of known constants. This is known as the restricted least-squares problem (Fomby et al., 1984). Let θ̂r(k) denote the restricted least-squares estimator of θ. Show that

θ̂r(k) = θ̂_LS(k) + [H′(k)H(k)]⁻¹C′{C[H′(k)H(k)]⁻¹C′}⁻¹[r − Cθ̂_LS(k)]

(Hint: Use the method of Lagrange multipliers.)
