In Chapter 13, we derived the ‘best’ affine unbiased estimator of β in the linear regression model (y, Xβ, σ2V) under various assumptions about the ranks of X and V. In this chapter, we discuss some other topics relating to the linear model.
Sections 14.2–14.7 are devoted to constructing the ‘best’ quadratic estimator of σ2. The multivariate analog is discussed in Section 14.8. The estimator

σ̂2 = y′(In − X(X′X)−1X′)y/(n − k),  (1)

where n is the number of observations and k the rank of X, known as the least‐squares (LS) estimator of σ2, is the best quadratic unbiased estimator in the model (y, Xβ, σ2I). But if var(y) ≠ σ2In, then σ̂2 in (1) will, in general, be biased. Bounds for this bias which do not depend on X are obtained in Sections 14.9 and 14.10.
The statistical analysis of the disturbances ɛ = y − Xβ is taken up in Sections 14.11–14.14, where predictors that are best linear unbiased with scalar variance matrix (BLUS) and best linear unbiased with fixed variance matrix (BLUF) are derived.
Finally, we show how matrix differential calculus can be useful in sensitivity analysis. In particular, we study the sensitivities of the posterior moments of β in a Bayesian framework.
2 BEST QUADRATIC UNBIASED ESTIMATION OF σ2
Let (y, Xβ, σ2V) be the linear regression model. In the previous chapter, we considered the estimation of β as a linear function of the observation vector y. Since the variance σ2 is a quadratic concept, we now consider the estimation of σ2 as a quadratic function of y, that is, a function of the form
σ̂2 = y′Ay,  (2)
where A is nonstochastic and symmetric. Any estimator satisfying (2) is called a quadratic estimator.
If, in addition, the matrix A is positive (semi)definite and AV ≠ 0, and if y is a continuous random vector, then Pr(y′Ay > 0) = 1, and we say that the estimator is quadratic and positive (almost surely).
An unbiased estimator of σ2 is an estimator, say σ̂2 = y′Ay, such that

E(y′Ay) = σ2 for all β and all σ2 > 0.  (3)
In (3), it is implicitly assumed that β and σ2 are not restricted (for example, by Rβ = r) apart from the requirement that σ2 is positive.
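To make the unbiasedness condition concrete, here is a small numerical sketch (illustrative dimensions, my own construction) checking that the least‐squares choice A = (In − X(X′X)−1X′)/(n − k) satisfies it: when var(y) = σ2In, E(y′Ay) = σ2 tr A + β′X′AXβ, so unbiasedness for every β and σ2 requires tr A = 1 and X′AX = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # residual-maker, MX = 0
A = M / (n - k)                                     # least-squares choice of A

# E(y'Ay) = sigma2 * tr(A) + beta'X'AX beta  when var(y) = sigma2 * I.
# Unbiasedness for every beta and sigma2 requires tr(A) = 1 and X'AX = 0.
assert np.isclose(np.trace(A), 1.0)
assert np.allclose(X.T @ A @ X, 0)
```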
We now propose the following definition.
3 THE BEST QUADRATIC AND POSITIVE UNBIASED ESTIMATOR OF σ2
Our first result is the following well‐known theorem.
4 THE BEST QUADRATIC UNBIASED ESTIMATOR OF σ2
The estimator obtained in the preceding section is, in fact, the best in a wider class of estimators: the class of quadratic unbiased estimators. In other words, the constraint that the estimator be positive is not binding. We thus obtain the following generalization of Theorem 14.1.
5 BEST QUADRATIC INVARIANT ESTIMATION OF σ2
Unbiasedness, though a useful property for linear estimators in linear models, is somewhat suspect for nonlinear estimators. Another, perhaps more useful, criterion is invariance. In the context of the linear regression model
y = Xβ + ɛ,  (21)

let us consider, instead of β, a translation β − β0. Then (21) is equivalent to

y − Xβ0 = X(β − β0) + ɛ,

and we say that the estimator y′Ay is invariant under translation of β if

(y − Xβ0)′A(y − Xβ0) = y′Ay for all β0.

This is the case if and only if

AX = 0.  (22)
We can obtain (22) in another, though closely related, way if we assume that the disturbance vector ɛ is normally distributed, ɛ ∼ N(0, σ2V), V positive definite. Then, by Theorem 12.12,

E(y′Ay) = σ2 tr AV + β′X′AXβ

and

var(y′Ay) = 2σ4 tr(AVAV) + 4σ2β′X′AVAXβ,

so that, under normality, the distribution of y′Ay is independent of β if and only if AX = 0.
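A quick numerical check of the invariance condition AX = 0, taking A to be the residual‐maker matrix (which satisfies it), with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 2
X = rng.standard_normal((n, k))
A = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # AX = 0 by construction

beta, beta0 = rng.standard_normal(k), rng.standard_normal(k)
y = X @ beta + rng.standard_normal(n)

# AX = 0 makes y'Ay invariant under the translation beta -> beta - beta0:
z = y - X @ beta0
assert np.allclose(A @ X, 0)             # the invariance condition
assert np.isclose(z @ A @ z, y @ A @ y)  # quadratic form unchanged
```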
If the estimator is biased, we replace the minimum variance criterion by the minimum mean squared error criterion. Thus, we obtain Definition 14.2.
6 THE BEST QUADRATIC AND POSITIVE INVARIANT ESTIMATOR OF σ2
7 THE BEST QUADRATIC INVARIANT ESTIMATOR OF σ2

A generalization of Theorem 14.3 is obtained by dropping the requirement that the quadratic estimator of σ2 be positive. In this wider class of estimators, we find that the estimator of Theorem 14.3 is again the best (smallest mean squared error), thus showing that the requirement of positiveness is not binding.
Comparing Theorems 14.2 and 14.4, we see that the best quadratic invariant estimator has a larger bias (it underestimates σ2) but a smaller variance than the best quadratic unbiased estimator, and altogether a smaller mean squared error.
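Under normality, the best quadratic invariant estimator works out to y′My/(n − k + 2), against y′My/(n − k) for the best quadratic unbiased one (the divisor n − k + 2 is the standard result; the theorem statements themselves are not reproduced in this excerpt). A Monte Carlo sketch of the trade‐off just described:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2, reps = 25, 4, 1.0, 200_000
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Since MX = 0, y'My = eps'M eps, so it suffices to simulate eps ~ N(0, I).
eps = rng.standard_normal((reps, n)) * np.sqrt(sigma2)
s = ((eps @ M) * eps).sum(axis=1)

unbiased = s / (n - k)        # best quadratic unbiased estimator
invariant = s / (n - k + 2)   # best quadratic invariant estimator (normality)

mse = lambda est: np.mean((est - sigma2) ** 2)
assert invariant.mean() < unbiased.mean()   # downward bias
assert mse(invariant) < mse(unbiased)       # but smaller mean squared error
```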
8 BEST QUADRATIC UNBIASED ESTIMATION: MULTIVARIATE NORMAL CASE
9 BOUNDS FOR THE BIAS OF THE LEAST‐SQUARES ESTIMATOR OF σ2, I

In the model (y, Xβ, σ2In), the estimator

σ̂2 = y′(In − X(X′X)−1X′)y/(n − k)  (42)

is the best quadratic unbiased estimator of σ2, also known as the least‐squares (LS) estimator of σ2. If V ≠ In, then (42) is no longer an unbiased estimator of σ2, because, in general,

E σ̂2 = σ2 tr(MV)/(n − k) ≠ σ2, where M = In − X(X′X)−1X′.

If both V and X are known, we can calculate the relative bias

(E σ̂2 − σ2)/σ2 = tr(MV)/(n − k) − 1

exactly. Here, we are concerned with the case where V is known (at least in structure, say first‐order autocorrelation), while X is not known. Of course we cannot calculate the exact relative bias in this case. We can, however, find a lower and an upper bound for the relative bias of σ̂2 over all possible values of X.
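As an illustration (my own construction, not the chapter's theorems), here is the exact relative bias tr(MV)/(n − k) − 1 for a first‐order autocorrelation V, together with eigenvalue bounds that hold for every X of rank k: since M is an orthogonal projector of rank n − k, a Ky Fan‐type inequality places tr(MV) between the sums of the n − k smallest and the n − k largest eigenvalues of V.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, rho = 30, 3, 0.6
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) structure V_ij = rho^|i-j|

X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rel_bias = np.trace(M @ V) / (n - k) - 1         # exact relative bias for this X

# Bounds independent of X: tr(MV) lies between the sums of the n-k smallest
# and n-k largest eigenvalues of V (Ky Fan inequality for projectors).
eig = np.linalg.eigvalsh(V)                      # ascending order
lo = eig[: n - k].sum() / (n - k) - 1
hi = eig[-(n - k):].sum() / (n - k) - 1
assert lo <= rel_bias <= hi
print(lo, rel_bias, hi)
```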
10 BOUNDS FOR THE BIAS OF THE LEAST‐SQUARES ESTIMATOR OF σ2, II
Suppose now that X is not completely unknown. In particular, suppose that the regression contains a constant term, so that X contains a column of ones. Surely this additional information must lead to a tighter interval for the relative bias of σ̂2. Theorem 14.7 shows that this is indeed the case. Somewhat surprisingly, perhaps, only the upper bound of the relative bias is affected, not the lower bound.
11 THE PREDICTION OF DISTURBANCES
Let us write the linear regression model (y, Xβ, σ2In) as

y = Xβ + ɛ, Eɛ = 0, Eɛɛ′ = σ2In.
We have seen how the unknown parameters β and σ2 can be optimally estimated by linear or quadratic functions of y. We now turn our attention to the ‘estimation’ of the disturbance vector ɛ. Since ɛ (unlike β) is a random vector, it cannot, strictly speaking, be estimated. Furthermore, ɛ (unlike y) is unobservable.
If we try to find an observable random vector, say e, which approximates the unobservable ɛ as closely as possible in some sense, it is appealing to minimize

E(e − ɛ)′(e − ɛ)  (43)

subject to the constraints

e = Ay (linearity)  (44)

and

E(e − ɛ) = 0 for all β (unbiasedness).  (45)

This leads to the best linear unbiased predictor of ɛ,

e = (In − X(X′X)−1X′)y,  (46)
which we recognize as the least‐squares residual vector (see Exercises 1 and 2).
A major drawback of the best linear unbiased predictor given in (46) is that its variance matrix is not scalar. In fact,

var(e) = σ2(In − X(X′X)−1X′),

whereas the variance matrix of ɛ, which e hopes to resemble, is σ2In. This drawback is especially serious if we wish to use e in testing the hypothesis var(ɛ) = σ2In.
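Concretely (a small sketch): since e = My = Mɛ with M = In − X(X′X)−1X′ idempotent and symmetric, var(e) = σ2MM′ = σ2M, which has unequal diagonal entries and nonzero off‐diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 2
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# e = My = M eps, so var(e) = sigma2 * M M' = sigma2 * M (M idempotent,
# symmetric) -- a non-scalar matrix.
assert np.allclose(M @ M, M)                       # idempotency
assert not np.allclose(M, np.diag(np.diag(M)))     # non-diagonal, hence non-scalar
print(np.round(M, 3))
```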
For this reason we wish to find a predictor of ɛ (or more generally, Sɛ) which, in addition to being linear and unbiased, has a scalar variance matrix.
Exercises
1. Show that the minimization problem (43) subject to (44) and (45) amounts to

minimize tr(A − In)(A − In)′ subject to AX = 0.

2. Solve this problem and show that the least‐squares residual vector e = (In − X(X′X)−1X′)y is the constrained minimizer.
3. Show that, while ɛ is unobservable, certain linear combinations of ɛ are observable. In fact, show that c′ɛ is observable if and only if X′c = 0, in which case c′ɛ = c′y.
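The observability claim of Exercise 3 can be checked numerically: for any c with X′c = 0 we have c′y = c′Xβ + c′ɛ = c′ɛ, so c′ɛ is computable from the observed y alone. A sketch, constructing such a c by projecting a random vector off the column space of X:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 10, 3
X = rng.standard_normal((n, k))
beta = rng.standard_normal(k)
eps = rng.standard_normal(n)
y = X @ beta + eps

c = rng.standard_normal(n)
c = c - X @ np.linalg.solve(X.T @ X, X.T @ c)   # now X'c = 0

assert np.allclose(X.T @ c, 0)
assert np.isclose(c @ y, c @ eps)               # c'eps is observable via c'y
```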
12 BEST LINEAR UNBIASED PREDICTORS WITH SCALAR VARIANCE MATRIX
Thus motivated, we propose the following definition of the predictor of Sɛ that is best linear unbiased with scalar variance matrix (BLUS).
13 BEST LINEAR UNBIASED PREDICTORS WITH FIXED VARIANCE MATRIX, I
We can generalize the BLUS approach in two directions. First, we may assume that the variance matrix of the linear unbiased predictor is not scalar, but some fixed known positive semidefinite matrix, say Ω. This is useful, because for many purposes the requirement that the variance matrix of the predictor is scalar is unnecessary; it is sufficient that the variance matrix does not depend on X.
Second, we may wish to generalize the criterion function to
where Q is some given positive definite matrix.
14 BEST LINEAR UNBIASED PREDICTORS WITH FIXED VARIANCE MATRIX, II
Let us now present the full generalization of Theorem 14.8.
15 LOCAL SENSITIVITY OF THE POSTERIOR MEAN
Let y = Xβ + ɛ be the normal linear regression model with Eɛ = 0 and Eɛɛ′ = V, where V is positive definite. Suppose now that there is prior information concerning β:
Then, as Leamer (1978, p. 76) shows, the posterior distribution of β is N(b, H−1), with

b = H−1(H*b* + X′V−1y)  (70)

and

H = H* + X′V−1X.  (71)

We are interested in the effects of small changes in the precision matrix V−1, the design matrix X, and the prior moments b* and H*−1 on the posterior mean b and the posterior precision H.
We first study the effects on the posterior mean.
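A numerical sketch of the posterior moments, using the standard normal updating formulas H = H* + X′V−1X and b = H−1(H*b* + X′V−1y) (the dimensions and prior values below are illustrative only). As a sanity check, b reduces to the GLS estimator as the prior precision H* shrinks to zero:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X = rng.standard_normal((n, k))
V = np.eye(n)                          # disturbance variance matrix (known)
b_star = np.zeros(k)                   # prior mean (illustrative)
H_star = 0.5 * np.eye(k)               # prior precision (illustrative)
y = X @ rng.standard_normal(k) + rng.standard_normal(n)

Vinv = np.linalg.inv(V)
H = H_star + X.T @ Vinv @ X                               # posterior precision
b = np.linalg.solve(H, H_star @ b_star + X.T @ Vinv @ y)  # posterior mean

# With a nearly diffuse prior, the posterior mean approaches GLS:
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
H_tiny = 1e-8 * np.eye(k)
b_diffuse = np.linalg.solve(H_tiny + X.T @ Vinv @ X,
                            H_tiny @ b_star + X.T @ Vinv @ y)
assert np.allclose(b_diffuse, b_gls, atol=1e-6)
```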
Exercise
1. Show that the local sensitivity with respect to X of the least‐squares estimator b = (X′X)−1X′y is given by

db = (X′X)−1((dX)′e − X′(dX)b), where e = y − Xb.
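The differential db = (X′X)−1((dX)′e − X′(dX)b), with e = y − Xb (obtained via d(X′X)−1 = −(X′X)−1(d(X′X))(X′X)−1), can be checked against a finite‐difference perturbation; a sketch with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 15, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

def ls(Xm):
    """Least-squares estimator b = (X'X)^{-1} X'y."""
    return np.linalg.solve(Xm.T @ Xm, Xm.T @ y)

b = ls(X)
e = y - X @ b
dX = 1e-7 * rng.standard_normal((n, k))     # small perturbation of X

# First-order sensitivity: db = (X'X)^{-1} ( (dX)'e - X'(dX) b )
db = np.linalg.solve(X.T @ X, dX.T @ e - X.T @ dX @ b)
db_fd = ls(X + dX) - b                      # finite-difference comparison
assert np.linalg.norm(db - db_fd) <= 1e-3 * np.linalg.norm(db)
```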
16 LOCAL SENSITIVITY OF THE POSTERIOR PRECISION
In precisely the same manner we can obtain the local sensitivity of the posterior precision.
BIBLIOGRAPHICAL NOTES
2–7. See Theil and Schweitzer (1961), Theil (1971), Rao (1971b), and also Neudecker (1980a).
8. See Balestra (1973), Neudecker (1980b, 1985b), Neudecker and Liu (1993), and Rolle (1994).
11–14. See Theil (1965), Koerts and Abrahamse (1969), Abrahamse and Koerts (1971), Dubbelman, Abrahamse, and Koerts (1972), and Neudecker (1973, 1977a).