Chapter 14
Further topics in the linear model

1 INTRODUCTION

In Chapter 13, we derived the ‘best’ affine unbiased estimator of β in the linear regression model (y, Xβ, σ²V) under various assumptions about the ranks of X and V. In this chapter, we discuss some other topics relating to the linear model.

Sections 14.2–14.7 are devoted to constructing the ‘best’ quadratic estimator of σ². The multivariate analog is discussed in Section 14.8. The estimator

(1)   σ̂² = y′(In − X(X′X)⁻¹X′)y/(n − k),

known as the least‐squares (LS) estimator of σ², is the best quadratic unbiased estimator in the model (y, Xβ, σ²I). But if var(y) ≠ σ²In, then σ̂² in (1) will, in general, be biased. Bounds for this bias which do not depend on X are obtained in Sections 14.9 and 14.10.
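
A minimal numerical sketch (hypothetical data and an assumed AR(1) pattern for V; numpy) of how this bias can be evaluated when X and V happen to be known: since MX = 0 for the residual maker M = In − X(X′X)⁻¹X′, we have E σ̂² = σ² tr(MV)/(n − k), so the relative bias is tr(MV)/(n − k) − 1.

```python
# Hypothetical illustration: exact relative bias of the LS estimator of sigma^2
# when var(y) = sigma^2 * V with V != I_n.
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3
X = rng.standard_normal((n, k))
rho = 0.5
# AR(1)-type disturbance covariance, a common choice for V (an assumption here)
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual maker, MX = 0
relative_bias = np.trace(M @ V) / (n - k) - 1       # E(sigma_hat^2)/sigma^2 - 1
print(relative_bias)
```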

The statistical analysis of the disturbances ɛ = y − Xβ is taken up in Sections 14.11–14.14, where predictors that are best linear unbiased with scalar variance matrix (BLUS) and best linear unbiased with fixed variance matrix (BLUF) are derived.

Finally, we show how matrix differential calculus can be useful in sensitivity analysis. In particular, we study the sensitivities of the posterior moments of β in a Bayesian framework.

2 BEST QUADRATIC UNBIASED ESTIMATION OF σ²

Let (y, Xβ, σ²V) be the linear regression model. In the previous chapter, we considered the estimation of β as a linear function of the observation vector y. Since the variance σ² is a quadratic concept, we now consider the estimation of σ² as a quadratic function of y, that is, a function of the form

(2)   σ̂² = y′Ay,

where A is nonstochastic and symmetric. Any estimator satisfying (2) is called a quadratic estimator.

If, in addition, the matrix A is positive (semi)definite and AV ≠ 0, and if y is a continuous random vector, then Pr(y′Ay > 0) = 1, and we say that the estimator is quadratic and positive (almost surely).

An unbiased estimator of σ² is an estimator, say σ̂², such that

(3)   E σ̂² = σ²   for all β and all σ² > 0.

In (3), it is implicitly assumed that β and σ² are not restricted (for example, by constraints of the form Rβ = r) apart from the requirement that σ² is positive.
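
The content of requirement (3) can be made explicit with a short calculation (a sketch using only the first two moments of y, not a restatement of the theorems that follow):

```latex
\[
  \operatorname{E}(y'Ay) \;=\; \sigma^{2}\operatorname{tr}(AV) + \beta'X'AX\beta ,
\]
so that $\operatorname{E}(y'Ay) = \sigma^{2}$ for every $\beta$ and every $\sigma^{2} > 0$
if and only if
\[
  X'AX = 0 \qquad\text{and}\qquad \operatorname{tr}(AV) = 1 .
\]
```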

We now propose the following definition.

3 THE BEST QUADRATIC AND POSITIVE UNBIASED ESTIMATOR OF σ²

Our first result is the following well‐known theorem.

4 THE BEST QUADRATIC UNBIASED ESTIMATOR OF σ²

The estimator obtained in the preceding section is, in fact, the best in a wider class of estimators: the class of quadratic unbiased estimators. In other words, the constraint that σ̂² be positive is not binding. We thus obtain the following generalization of Theorem 14.1.

5 BEST QUADRATIC INVARIANT ESTIMATION OF σ²

Unbiasedness, though a useful property for linear estimators in linear models, is somewhat suspect for nonlinear estimators. Another, perhaps more useful, criterion is invariance. In the context of the linear regression model

(21)   y = Xβ + ɛ,

let us consider, instead of β, a translation β − β0. Then (21) is equivalent to

y − Xβ0 = X(β − β0) + ɛ,

and we say that the estimator y′Ay is invariant under translation of β if

y′Ay = (y − Xβ0)′A(y − Xβ0)   for all β0 and all y.

This is the case if and only if

(22)   AX = 0.

We can obtain (22) in another, though closely related, way if we assume that the disturbance vector ɛ is normally distributed, ɛ ∼ N(0, σ²V), V positive definite. Then, by Theorem 12.12,

E(y′Ay) = σ² tr(AV) + β′X′AXβ

and

var(y′Ay) = 2σ⁴ tr(AVAV) + 4σ²β′X′AVAXβ,

so that, under normality, the distribution of y′Ay is independent of β if and only if AX = 0.
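
A quick numerical illustration of the invariance condition (22) with hypothetical data (numpy), taking A = M = In − X(X′X)⁻¹X′, for which AX = 0:

```python
# Hypothetical check of the invariance condition AX = 0: with A = M (the
# residual maker), y'Ay is unchanged when beta is translated by beta0.
import numpy as np

rng = np.random.default_rng(1)
n, k = 15, 2
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # MX = 0, so M qualifies

beta, beta0 = rng.standard_normal(k), rng.standard_normal(k)
eps = rng.standard_normal(n)
y = X @ beta + eps
y_shifted = y - X @ beta0        # same model written in terms of beta - beta0

print(np.isclose(y @ M @ y, y_shifted @ M @ y_shifted))   # True
```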

If the estimator is biased, we replace the minimum variance criterion by the minimum mean squared error criterion. Thus, we obtain Definition 14.2.

6 THE BEST QUADRATIC AND POSITIVE INVARIANT ESTIMATOR OF σ²

Requiring invariance instead of unbiasedness, we obtain Theorem 14.3 instead of Theorem 14.1.

7 THE BEST QUADRATIC INVARIANT ESTIMATOR OF σ²

A generalization of Theorem 14.2 is obtained by dropping the requirement that the quadratic estimator of σ2 be positive. In this wider class of estimators, we find that the estimator of Theorem 14.3 is again the best (smallest mean squared error), thus showing that the requirement of positiveness is not binding.

Comparing Theorems 14.2 and 14.4, we see that the best quadratic invariant estimator has a larger bias (it underestimates σ²) but a smaller variance than the best quadratic unbiased estimator, and altogether a smaller mean squared error.
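
For concreteness, under normality with V = In the best quadratic invariant estimator is commonly given as y′My/(n − k + 2) (the Theil–Schweitzer estimator; see the notes to Sections 2–7), whereas the best quadratic unbiased estimator is y′My/(n − k). Since y′My ∼ σ²χ²(n − k) in this case, the comparison can be made explicit in a small sketch with hypothetical values of n, k, and σ²:

```python
# Hypothetical comparison, assuming normality and V = I_n:
# y'My ~ sigma^2 * chi^2_(n-k), so bias, variance and MSE of c * y'My follow directly.
import numpy as np

n, k, sigma2 = 25, 4, 2.0
m = n - k                               # degrees of freedom of y'My / sigma^2

for label, c in [("unbiased, c = 1/(n-k)", 1.0 / m),
                 ("invariant, c = 1/(n-k+2)", 1.0 / (m + 2))]:
    bias = c * sigma2 * m - sigma2
    var = 2.0 * c**2 * sigma2**2 * m
    mse = bias**2 + var
    print(f"{label}: bias={bias:+.4f}, var={var:.4f}, mse={mse:.4f}")
```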

8 BEST QUADRATIC UNBIASED ESTIMATION: MULTIVARIATE NORMAL CASE

Extending Definition 14.1 to the multivariate case, we obtain Definition 14.3.

9 BOUNDS FOR THE BIAS OF THE LEAST‐SQUARES ESTIMATOR OF σ², I

Let us again consider the linear regression model (y, Xβ, σ²V), where X has full column rank k and V is positive semidefinite.

If V = In, then we know from Theorem 14.2 that

(42)   σ̂² = y′My/(n − k),   M = In − X(X′X)⁻¹X′,

is the best quadratic unbiased estimator of σ², also known as the least‐squares (LS) estimator of σ². If V ≠ In, then (42) is no longer an unbiased estimator of σ², because, in general,

E σ̂² = σ² tr(MV)/(n − k) ≠ σ².

If both V and X are known, we can calculate the relative bias

(E σ̂² − σ²)/σ² = tr(MV)/(n − k) − 1

exactly. Here, we are concerned with the case where V is known (at least in structure, say first‐order autocorrelation), while X is not known. Of course we cannot calculate the exact relative bias in this case. We can, however, find a lower and an upper bound for the relative bias of σ̂² over all possible values of X.
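
A sketch of how such bounds can be computed for a given V (here an assumed AR(1) structure; numpy). It is not a restatement of the theorem of this section, but uses the fact that, for a symmetric idempotent M of rank n − k, tr(MV) lies between the sum of the n − k smallest and the sum of the n − k largest eigenvalues of V.

```python
# Hypothetical sketch: eigenvalue bounds on the relative bias tr(MV)/(n-k) - 1
# over all possible design matrices X of full column rank k.
import numpy as np

n, k, rho = 20, 3, 0.5
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # assumed AR(1) V
lam = np.sort(np.linalg.eigvalsh(V))      # eigenvalues in ascending order

lower = lam[:n - k].sum() / (n - k) - 1   # smallest possible relative bias over X
upper = lam[k:].sum() / (n - k) - 1       # largest possible relative bias over X
print(lower, upper)
```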

10 BOUNDS FOR THE BIAS OF THE LEAST‐SQUARES ESTIMATOR OF σ², II

Suppose now that X is not completely unknown. In particular, suppose that the regression contains a constant term, so that X contains a column of ones. Surely this additional information must lead to a tighter interval for the relative bias of σ̂². Theorem 14.7 shows that this is indeed the case. Somewhat surprisingly, perhaps, only the upper bound of the relative bias is affected, not the lower bound.

11 THE PREDICTION OF DISTURBANCES

Let us write the linear regression model (y, Xβ, σ²In) as

y = Xβ + ɛ,   Eɛ = 0,   Eɛɛ′ = σ²In.

We have seen how the unknown parameters β and σ² can be optimally estimated by linear or quadratic functions of y. We now turn our attention to the ‘estimation’ of the disturbance vector ɛ. Since ɛ (unlike β) is a random vector, it cannot, strictly speaking, be estimated. Furthermore, ɛ (unlike y) is unobservable.

If we try to find an observable random vector, say e, which approximates the unobservable ɛ as closely as possible in some sense, it is appealing to minimize

(43)   E(e − ɛ)′(e − ɛ)

subject to the constraints

(44)   e = Ay   (linearity),
(45)   E(e − ɛ) = 0 for all β   (unbiasedness).

This leads to the best linear unbiased predictor of ɛ,

(46)   e = My = (In − X(X′X)⁻¹X′)y,

which we recognize as the least‐squares residual vector (see Exercises 1 and 2).

A major drawback of the best linear unbiased predictor given in (46) is that its variance matrix is not scalar. In fact,

var(e) = σ²M = σ²(In − X(X′X)⁻¹X′),

whereas the variance matrix of ɛ, which e hopes to resemble, is σ²In. This drawback is especially serious if we wish to use e in testing the hypothesis var(ɛ) = σ²In.
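
To see the problem concretely, here is a small sketch with hypothetical data (numpy): the variance matrix of the residual vector e = My is σ²M, which has unequal diagonal elements and nonzero off‐diagonal elements.

```python
# Hypothetical illustration: the LS residual vector e = My is a linear unbiased
# predictor of eps, but its variance matrix sigma^2 * M is not scalar.
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 10, 2, 1.0
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

var_e = sigma2 * M                              # var(e) = sigma^2 * M
print(np.allclose(var_e, sigma2 * np.eye(n)))   # False: not a scalar matrix
print(np.diag(var_e))                           # unequal diagonal elements
```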

For this reason we wish to find a predictor of ɛ (or, more generally, of a linear transformation of ɛ) which, in addition to being linear and unbiased, has a scalar variance matrix.

Exercises

  1. Show that the minimization problem (43) subject to (44) and (45) amounts to
     minimize tr (A − In)(A − In)′ subject to AX = 0, where e = Ay.
  2. Solve this problem and show that e = My (that is, A = M = In − X(X′X)⁻¹X′) is the constrained minimizer. (A numerical check follows these exercises.)
  3. Show that, while ɛ is unobservable, certain linear combinations of ɛ are observable. In fact, show that c′ɛ is observable if and only if X′c = 0, in which case c′ɛ = c′y.
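
A numerical check of Exercise 2 with hypothetical data (numpy): any matrix of the form A = BM satisfies AX = 0, and none of them attains a smaller value of tr (A − In)(A − In)′ than A = M itself.

```python
# Hypothetical check of Exercise 2: among matrices A with AX = 0,
# A = M = I - X(X'X)^{-1}X' minimizes tr (A - I)(A - I)'.
import numpy as np

rng = np.random.default_rng(3)
n, k = 12, 3
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # MX = 0


def objective(A):
    # tr (A - I)(A - I)' is the squared Frobenius norm of A - I
    return np.sum((A - np.eye(n)) ** 2)


# Any A = B M is feasible (AX = BMX = 0); none beats A = M.
trials = (objective(rng.standard_normal((n, n)) @ M) for _ in range(1000))
print(objective(M), min(trials) >= objective(M))    # prints k (= 3.0) and True
```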

12 BEST LINEAR UNBIASED PREDICTORS WITH SCALAR VARIANCE MATRIX

Thus motivated, we propose the following definition of a predictor of ɛ (or, more generally, of a linear transformation of ɛ) that is best linear unbiased with scalar variance matrix (BLUS).

13 BEST LINEAR UNBIASED PREDICTORS WITH FIXED VARIANCE MATRIX, I

We can generalize the BLUS approach in two directions. First, we may assume that the variance matrix of the linear unbiased predictor is not scalar, but some fixed known positive semidefinite matrix, say Ω. This is useful, because for many purposes the requirement that the variance matrix of the predictor is scalar is unnecessary; it is sufficient that the variance matrix does not depend on X.

Second, we may wish to generalize the criterion function to

E(e − ɛ)′Q(e − ɛ),

where Q is some given positive definite matrix.

14 BEST LINEAR UNBIASED PREDICTORS WITH FIXED VARIANCE MATRIX, II

Let us now present the full generalization of Theorem 14.8.

15 LOCAL SENSITIVITY OF THE POSTERIOR MEAN

Let y = Xβ + ɛ be the normal linear regression model with Eɛ = 0 and Eɛɛ′ = V, where V is positive definite. Suppose now that there is prior information concerning β:

β ∼ N(b*, H*⁻¹),

where b* denotes the prior mean and H* the prior precision. Then, as Leamer (1978, p. 76) shows, the posterior distribution of β is

β | y ∼ N(b, H⁻¹)

with

(70)   H = H* + X′V⁻¹X

and

(71)   b = H⁻¹(H*b* + X′V⁻¹y).
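
A minimal numerical sketch of (70) and (71), with hypothetical data and hypothetical prior settings (numpy):

```python
# Sketch of the posterior moments (70)-(71):
#   H = H* + X'V^{-1}X  (posterior precision),  b = H^{-1}(H* b* + X'V^{-1} y).
# All numbers below are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 3
X = rng.standard_normal((n, k))
V = np.eye(n)                                   # assumed disturbance variance matrix
y = X @ np.array([1.0, -0.5, 2.0]) + rng.standard_normal(n)

b_star = np.zeros(k)                            # prior mean (an assumption)
H_star = 0.1 * np.eye(k)                        # prior precision (an assumption)

Vinv = np.linalg.inv(V)
H = H_star + X.T @ Vinv @ X                     # eq. (70)
b = np.linalg.solve(H, H_star @ b_star + X.T @ Vinv @ y)   # eq. (71)
print(b)
```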

We are interested in the effects of small changes in the precision matrix V⁻¹, the design matrix X, and the prior moments b* and H*⁻¹ on the posterior mean b and the posterior precision H.

We first study the effects on the posterior mean.

Exercise

  1. Show that the local sensitivity with respect to X of the least‐squares estimator b = (X′X)⁻¹X′y is given by
     db = (X′X)⁻¹(dX)′(y − Xb) − (X′X)⁻¹X′(dX)b.
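
A finite‐difference check of this differential with hypothetical data (numpy); the discrepancy between the analytic and the numerical change in b is of second order in dX.

```python
# Hypothetical check of db = (X'X)^{-1}(dX)'(y - Xb) - (X'X)^{-1}X'(dX) b.
import numpy as np

rng = np.random.default_rng(5)
n, k = 20, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)


def ls(Xmat):
    # least-squares estimator b = (X'X)^{-1} X'y for a given design matrix
    return np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ y)


b = ls(X)
dX = 1e-7 * rng.standard_normal((n, k))          # a small perturbation of X
XtX_inv = np.linalg.inv(X.T @ X)
db_analytic = XtX_inv @ dX.T @ (y - X @ b) - XtX_inv @ X.T @ dX @ b
db_numeric = ls(X + dX) - b

# relative discrepancy is of the order of the perturbation, i.e. negligible
print(np.linalg.norm(db_numeric - db_analytic) / np.linalg.norm(db_numeric))
```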

16 LOCAL SENSITIVITY OF THE POSTERIOR PRECISION

In precisely the same manner we can obtain the local sensitivity of the posterior precision.

BIBLIOGRAPHICAL NOTES

2–7. See Theil and Schweitzer (1961), Theil (1971), Rao (1971b), and also Neudecker (1980a).

8. See Balestra (1973), Neudecker (1980b, 1985b), Neudecker and Liu (1993), and Rolle (1994).

9–10. See Neudecker (1977b). Theorem 14.7 corrects an error in Neudecker (1978).

11–14. See Theil (1965), Koerts and Abrahamse (1969), Abrahamse and Koerts (1971), Dubbelman, Abrahamse, and Koerts (1972), and Neudecker (1973, 1977a).

15–16. See Leamer (1978) and Polasek (1986).
