In Chapter 13, we derived the ‘best’ affine unbiased estimator of β in the linear regression model (y, Xβ, σ2V) under various assumptions about the ranks of X and V. In this chapter, we discuss some other topics relating to the linear model.
Sections 14.2–14.7 are devoted to constructing the ‘best’ quadratic estimator of σ2. The multivariate analog is discussed in Section 14.8. The estimator

σ̂2 = y′(In − X(X′X)−1X′)y/(n − k),  (1)

where n is the number of observations and k the rank of X, known as the least‐squares (LS) estimator of σ2, is the best quadratic unbiased estimator in the model (y, Xβ, σ2I). But if var(y) ≠ σ2In, then σ̂2 in (1) will, in general, be biased. Bounds for this bias which do not depend on X are obtained in Sections 14.9 and 14.10.
The statistical analysis of the disturbances ɛ = y − Xβ is taken up in Sections 14.11–14.14, where predictors that are best linear unbiased with scalar variance matrix (BLUS) and best linear unbiased with fixed variance matrix (BLUF) are derived.
Finally, we show how matrix differential calculus can be useful in sensitivity analysis. In particular, we study the sensitivities of the posterior moments of β in a Bayesian framework.
2 BEST QUADRATIC UNBIASED ESTIMATION OF σ2
Let (y, Xβ, σ2V) be the linear regression model. In the previous chapter, we considered the estimation of β as a linear function of the observation vector y. Since the variance σ2 is a quadratic concept, we now consider the estimation of σ2 as a quadratic function of y, that is, a function of the form
σ̂2 = y′Ay,  (2)
where A is nonstochastic and symmetric. Any estimator satisfying (2) is called a quadratic estimator.
If, in addition, the matrix A is positive (semi)definite and AV ≠ 0, and if y is a continuous random vector, then Pr(y′Ay > 0) = 1, and we say that the estimator is quadratic and positive (almost surely).
An unbiased estimator of σ2 is an estimator, say σ̂2 = y′Ay, such that

E(y′Ay) = σ2 for all β and all σ2 > 0.  (3)
In (3), it is implicitly assumed that β and σ2 are not restricted (for example, by Rβ = r) apart from the requirement that σ2 is positive.
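To make the unbiasedness condition concrete, here is a small numerical sketch (illustrative dimensions, my own construction) checking that the least‐squares choice A = (In − X(X′X)−1X′)/(n − k) satisfies it: when var(y) = σ2In, E(y′Ay) = σ2 tr A + β′X′AXβ, so unbiasedness for every β and σ2 requires tr A = 1 and X′AX = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # residual-maker, MX = 0
A = M / (n - k)                                     # least-squares choice of A

# E(y'Ay) = sigma2 * tr(A) + beta'X'AX beta  when var(y) = sigma2 * I.
# Unbiasedness for every beta and sigma2 requires tr(A) = 1 and X'AX = 0.
assert np.isclose(np.trace(A), 1.0)
assert np.allclose(X.T @ A @ X, 0)
```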
We now propose the following definition.
3 THE BEST QUADRATIC AND POSITIVE UNBIASED ESTIMATOR OF σ2
Our first result is the following well‐known theorem.
4 THE BEST QUADRATIC UNBIASED ESTIMATOR OF σ2
The estimator obtained in the preceding section is, in fact, the best in a wider class of estimators: the class of quadratic unbiased estimators. In other words, the constraint that the estimator be positive is not binding. We thus obtain the following generalization of Theorem 14.1.
5 BEST QUADRATIC INVARIANT ESTIMATION OF σ2
Unbiasedness, though a useful property for linear estimators in linear models, is somewhat suspect for nonlinear estimators. Another, perhaps more useful, criterion is invariance. In the context of the linear regression model
y = Xβ + ɛ,  (21)

let us consider, instead of β, a translation β − β0. Then (21) is equivalent to

y − Xβ0 = X(β − β0) + ɛ,

and we say that the estimator y′Ay is invariant under translation of β if

(y − Xβ0)′A(y − Xβ0) = y′Ay for all β0.

This is the case if and only if

AX = 0.  (22)
We can obtain (22) in another, though closely related, way if we assume that the disturbance vector ɛ is normally distributed, ɛ ∼ N(0, σ2V), V positive definite. Then, by Theorem 12.12,

E(y′Ay) = σ2 tr AV + β′X′AXβ

and

var(y′Ay) = 2σ4 tr(AVAV) + 4σ2β′X′AVAXβ,

so that, under normality, the distribution of y′Ay is independent of β if and only if AX = 0.
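A quick numerical check of the invariance condition AX = 0, taking A to be the residual‐maker matrix (which satisfies it), with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 2
X = rng.standard_normal((n, k))
A = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # AX = 0 by construction

beta, beta0 = rng.standard_normal(k), rng.standard_normal(k)
y = X @ beta + rng.standard_normal(n)

# AX = 0 makes y'Ay invariant under the translation beta -> beta - beta0:
z = y - X @ beta0
assert np.allclose(A @ X, 0)             # the invariance condition
assert np.isclose(z @ A @ z, y @ A @ y)  # quadratic form unchanged
```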
If the estimator is biased, we replace the minimum variance criterion by the minimum mean squared error criterion. Thus, we obtain Definition 14.2.
6 THE BEST QUADRATIC AND POSITIVE INVARIANT ESTIMATOR OF σ2
7 THE BEST QUADRATIC INVARIANT ESTIMATOR OF σ2

A generalization of Theorem 14.3 is obtained by dropping the requirement that the quadratic estimator of σ2 be positive. In this wider class of estimators, we find that the estimator of Theorem 14.3 is again the best (smallest mean squared error), thus showing that the requirement of positiveness is not binding.
Comparing Theorems 14.2 and 14.4, we see that the best quadratic invariant estimator has a larger bias (it underestimates σ2) but a smaller variance than the best quadratic unbiased estimator, and altogether a smaller mean squared error.
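Under normality, the best quadratic invariant estimator works out to y′My/(n − k + 2), against y′My/(n − k) for the best quadratic unbiased one (the divisor n − k + 2 is the standard result; the theorem statements themselves are not reproduced in this excerpt). A Monte Carlo sketch of the trade‐off just described:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2, reps = 25, 4, 1.0, 200_000
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Since MX = 0, y'My = eps'M eps, so it suffices to simulate eps ~ N(0, I).
eps = rng.standard_normal((reps, n)) * np.sqrt(sigma2)
s = ((eps @ M) * eps).sum(axis=1)

unbiased = s / (n - k)        # best quadratic unbiased estimator
invariant = s / (n - k + 2)   # best quadratic invariant estimator (normality)

mse = lambda est: np.mean((est - sigma2) ** 2)
assert invariant.mean() < unbiased.mean()   # downward bias
assert mse(invariant) < mse(unbiased)       # but smaller mean squared error
```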
8 BEST QUADRATIC UNBIASED ESTIMATION: MULTIVARIATE NORMAL CASE
9 BOUNDS FOR THE BIAS OF THE LEAST‐SQUARES ESTIMATOR OF σ2, I

In the model (y, Xβ, σ2In), the estimator

σ̂2 = y′(In − X(X′X)−1X′)y/(n − k)  (42)

is the best quadratic unbiased estimator of σ2, also known as the least‐squares (LS) estimator of σ2. If V ≠ In, then (42) is no longer an unbiased estimator of σ2, because, in general,

E σ̂2 = σ2 tr(MV)/(n − k) ≠ σ2, where M = In − X(X′X)−1X′.

If both V and X are known, we can calculate the relative bias

(E σ̂2 − σ2)/σ2 = tr(MV)/(n − k) − 1

exactly. Here, we are concerned with the case where V is known (at least in structure, say first‐order autocorrelation), while X is not known. Of course we cannot calculate the exact relative bias in this case. We can, however, find a lower and an upper bound for the relative bias of σ̂2 over all possible values of X.
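As an illustration (my own construction, not the chapter's theorems), here is the exact relative bias tr(MV)/(n − k) − 1 for a first‐order autocorrelation V, together with eigenvalue bounds that hold for every X of rank k: since M is an orthogonal projector of rank n − k, a Ky Fan‐type inequality places tr(MV) between the sums of the n − k smallest and the n − k largest eigenvalues of V.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, rho = 30, 3, 0.6
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) structure V_ij = rho^|i-j|

X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rel_bias = np.trace(M @ V) / (n - k) - 1         # exact relative bias for this X

# Bounds independent of X: tr(MV) lies between the sums of the n-k smallest
# and n-k largest eigenvalues of V (Ky Fan inequality for projectors).
eig = np.linalg.eigvalsh(V)                      # ascending order
lo = eig[: n - k].sum() / (n - k) - 1
hi = eig[-(n - k):].sum() / (n - k) - 1
assert lo <= rel_bias <= hi
print(lo, rel_bias, hi)
```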
10 BOUNDS FOR THE BIAS OF THE LEAST‐SQUARES ESTIMATOR OF σ2, II
Suppose now that X is not completely unknown. In particular, suppose that the regression contains a constant term, so that X contains a column of ones. Surely this additional information must lead to a tighter interval for the relative bias of σ̂2. Theorem 14.7 shows that this is indeed the case. Somewhat surprisingly, perhaps, only the upper bound of the relative bias is affected, not the lower bound.
11 THE PREDICTION OF DISTURBANCES
Let us write the linear regression model (y, Xβ, σ2In) as

y = Xβ + ɛ, Eɛ = 0, Eɛɛ′ = σ2In.
We have seen how the unknown parameters β and σ2 can be optimally estimated by linear or quadratic functions of y. We now turn our attention to the ‘estimation’ of the disturbance vector ɛ. Since ɛ (unlike β) is a random vector, it cannot, strictly speaking, be estimated. Furthermore, ɛ (unlike y) is unobservable.
If we try to find an observable random vector, say e, which approximates the unobservable ɛ as closely as possible in some sense, it is appealing to minimize

E(e − ɛ)′(e − ɛ)  (43)

subject to the constraints

e = Ay (linearity)  (44)

and

E(e − ɛ) = 0 for all β (unbiasedness).  (45)

This leads to the best linear unbiased predictor of ɛ,

e = (In − X(X′X)−1X′)y,  (46)
which we recognize as the least‐squares residual vector (see Exercises 1 and 2).
A major drawback of the best linear unbiased predictor given in (46) is that its variance matrix is not scalar. In fact,

var(e) = σ2(In − X(X′X)−1X′),

whereas the variance matrix of ɛ, which e hopes to resemble, is σ2In. This drawback is especially serious if we wish to use e in testing the hypothesis var(ɛ) = σ2In.
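Concretely (a small sketch): since e = My = Mɛ with M = In − X(X′X)−1X′ idempotent and symmetric, var(e) = σ2MM′ = σ2M, which has unequal diagonal entries and nonzero off‐diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 2
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# e = My = M eps, so var(e) = sigma2 * M M' = sigma2 * M (M idempotent,
# symmetric) -- a non-scalar matrix.
assert np.allclose(M @ M, M)                       # idempotency
assert not np.allclose(M, np.diag(np.diag(M)))     # non-diagonal, hence non-scalar
print(np.round(M, 3))
```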
For this reason we wish to find a predictor of ɛ (or more generally, Sɛ) which, in addition to being linear and unbiased, has a scalar variance matrix.
Exercises
1. Show that the minimization problem (43) subject to (44) and (45) amounts to

minimize tr(A − In)(A − In)′ subject to AX = 0.

2. Solve this problem and show that the least‐squares residual vector e = (In − X(X′X)−1X′)y is the constrained minimizer.
3. Show that, while ɛ is unobservable, certain linear combinations of ɛ are observable. In fact, show that c′ɛ is observable if and only if X′c = 0, in which case c′ɛ = c′y.
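The observability claim of Exercise 3 can be checked numerically: for any c with X′c = 0 we have c′y = c′Xβ + c′ɛ = c′ɛ, so c′ɛ is computable from the observed y alone. A sketch, constructing such a c by projecting a random vector off the column space of X:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 10, 3
X = rng.standard_normal((n, k))
beta = rng.standard_normal(k)
eps = rng.standard_normal(n)
y = X @ beta + eps

c = rng.standard_normal(n)
c = c - X @ np.linalg.solve(X.T @ X, X.T @ c)   # now X'c = 0

assert np.allclose(X.T @ c, 0)
assert np.isclose(c @ y, c @ eps)               # c'eps is observable via c'y
```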
12 BEST LINEAR UNBIASED PREDICTORS WITH SCALAR VARIANCE MATRIX
Thus motivated, we propose the following definition of the predictor of Sɛ that is best linear unbiased with scalar variance matrix (BLUS).
13 BEST LINEAR UNBIASED PREDICTORS WITH FIXED VARIANCE MATRIX, I
We can generalize the BLUS approach in two directions. First, we may assume that the variance matrix of the linear unbiased predictor is not scalar, but some fixed known positive semidefinite matrix, say Ω. This is useful, because for many purposes the requirement that the variance matrix of the predictor is scalar is unnecessary; it is sufficient that the variance matrix does not depend on X.
Second, we may wish to generalize the criterion function to
where Q is some given positive definite matrix.
14 BEST LINEAR UNBIASED PREDICTORS WITH FIXED VARIANCE MATRIX, II
Let us now present the full generalization of Theorem 14.8.
15 LOCAL SENSITIVITY OF THE POSTERIOR MEAN
Let y = Xβ + ɛ be the normal linear regression model with Eɛ = 0 and Eɛɛ′ = V, where V is positive definite. Suppose now that there is prior information concerning β:
Then, as Leamer (1978, p. 76) shows, the posterior distribution of β is N(b, H−1), with

b = H−1(H*b* + X′V−1y)  (70)

and

H = H* + X′V−1X.  (71)

We are interested in the effects of small changes in the precision matrix V−1, the design matrix X, and the prior moments b* and H*−1 on the posterior mean b and the posterior precision H.
We first study the effects on the posterior mean.
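A numerical sketch of the posterior moments, using the standard normal updating formulas H = H* + X′V−1X and b = H−1(H*b* + X′V−1y) (the dimensions and prior values below are illustrative only). As a sanity check, b reduces to the GLS estimator as the prior precision H* shrinks to zero:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X = rng.standard_normal((n, k))
V = np.eye(n)                          # disturbance variance matrix (known)
b_star = np.zeros(k)                   # prior mean (illustrative)
H_star = 0.5 * np.eye(k)               # prior precision (illustrative)
y = X @ rng.standard_normal(k) + rng.standard_normal(n)

Vinv = np.linalg.inv(V)
H = H_star + X.T @ Vinv @ X                               # posterior precision
b = np.linalg.solve(H, H_star @ b_star + X.T @ Vinv @ y)  # posterior mean

# With a nearly diffuse prior, the posterior mean approaches GLS:
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
H_tiny = 1e-8 * np.eye(k)
b_diffuse = np.linalg.solve(H_tiny + X.T @ Vinv @ X,
                            H_tiny @ b_star + X.T @ Vinv @ y)
assert np.allclose(b_diffuse, b_gls, atol=1e-6)
```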
Exercise
1. Show that the local sensitivity with respect to X of the least‐squares estimator b = (X′X)−1X′y is given by

db = (X′X)−1((dX)′e − X′(dX)b), where e = y − Xb.
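The differential db = (X′X)−1((dX)′e − X′(dX)b), with e = y − Xb (obtained via d(X′X)−1 = −(X′X)−1(d(X′X))(X′X)−1), can be checked against a finite‐difference perturbation; a sketch with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 15, 3
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

def ls(Xm):
    """Least-squares estimator b = (X'X)^{-1} X'y."""
    return np.linalg.solve(Xm.T @ Xm, Xm.T @ y)

b = ls(X)
e = y - X @ b
dX = 1e-7 * rng.standard_normal((n, k))     # small perturbation of X

# First-order sensitivity: db = (X'X)^{-1} ( (dX)'e - X'(dX) b )
db = np.linalg.solve(X.T @ X, dX.T @ e - X.T @ dX @ b)
db_fd = ls(X + dX) - b                      # finite-difference comparison
assert np.linalg.norm(db - db_fd) <= 1e-3 * np.linalg.norm(db)
```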
16 LOCAL SENSITIVITY OF THE POSTERIOR PRECISION
In precisely the same manner we can obtain the local sensitivity of the posterior precision.
BIBLIOGRAPHICAL NOTES
2–7. See Theil and Schweitzer (1961), Theil (1971), Rao (1971b), and also Neudecker (1980a).
8. See Balestra (1973), Neudecker (1980b, 1985b), Neudecker and Liu (1993), and Rolle (1994).
11–14. See Theil (1965), Koerts and Abrahamse (1969), Abrahamse and Koerts (1971), Dubbelman, Abrahamse, and Koerts (1972), and Neudecker (1973, 1977a).