Estimating ARCH Models by Least Squares
The simplest estimation method for ARCH models is ordinary least squares (OLS). This procedure has the advantage of being numerically simple, but it has two drawbacks: (i) the OLS estimator is not efficient, and is outperformed by the methods based on the likelihood or quasi-likelihood presented in the next chapters; (ii) in order to provide asymptotically normal estimators, the method requires moments of order 8 for the observed process. An extension of the OLS method, the feasible generalized least squares (FGLS) method, removes the first drawback and attenuates the second: it provides estimators that are asymptotically as accurate as the quasi-maximum likelihood estimator under the assumption that moments of order 4 exist. Note that the least-squares methods are of practical interest because they provide initial estimates for the optimization procedure used in the quasi-maximum likelihood method.
We begin with the unconstrained OLS and FGLS estimators. Then, in Section 6.3, we will see how to take into account positivity constraints on the parameters.
6.1 Estimation of ARCH(q) Models by Ordinary Least Squares
In this section, we consider the OLS estimator of the ARCH(q) model
$$\epsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \omega_0 + \sum_{i=1}^{q} \alpha_{0i}\,\epsilon_{t-i}^2, \tag{6.1}$$
where $(\eta_t)$ is an iid sequence with $E\eta_t = 0$ and $E\eta_t^2 = 1$. The OLS method uses the AR representation of the squares of the observed process. No assumption is made on the law of ηt.
The true value of the vector of the parameters is denoted by θ0 = (ω0, α01, …, α0q)′ and we denote by θ a generic value of the parameter.
From (6.1) we obtain the AR(q) representation
$$\epsilon_t^2 = \omega_0 + \sum_{i=1}^{q} \alpha_{0i}\,\epsilon_{t-i}^2 + u_t, \tag{6.2}$$
where $u_t = \epsilon_t^2 - \sigma_t^2 = \sigma_t^2(\eta_t^2 - 1)$. The sequence $(u_t, \mathcal{F}_t)_t$ constitutes a square integrable martingale difference when $Eu_t^2 = (\kappa_\eta - 1)E\sigma_t^4 < \infty$, where $\kappa_\eta = E\eta_t^4$, denoting by $\mathcal{F}_t$ the σ-field generated by $\{\epsilon_s : s \le t\}$.
Assume that we observe $\epsilon_1, \dots, \epsilon_n$, a realization of length n of the process $(\epsilon_t)$, and let $\epsilon_0, \dots, \epsilon_{1-q}$ be initial values. For instance, the initial values can be chosen equal to zero. Introducing the vector
$$Z_{t-1} = \left(1, \epsilon_{t-1}^2, \dots, \epsilon_{t-q}^2\right)',$$
in view of (6.2) we obtain the system
$$\epsilon_t^2 = Z_{t-1}'\theta_0 + u_t, \qquad t = 1, \dots, n,$$
which can be written as
$$Y = X\theta_0 + U \tag{6.3}$$
with the n × (q + 1) matrix
$$X = \left(Z_0, Z_1, \dots, Z_{n-1}\right)'$$
and the n × 1 vectors
$$Y = \left(\epsilon_1^2, \dots, \epsilon_n^2\right)', \qquad U = (u_1, \dots, u_n)'.$$
Assume that the matrix X′X is invertible, or equivalently that X has full column rank (we will see that this is always the case asymptotically, and thus for n large enough). The OLS estimator of θ0 follows:
$$\hat\theta_n = (X'X)^{-1}X'Y. \tag{6.4}$$
Under assumptions OLS1 and OLS2 below, the variance of $u_t$ exists and is constant. The OLS estimator of $\sigma_u^2 = \mathrm{Var}(u_t)$ is
$$\hat\sigma_u^2 = \frac{1}{n - q - 1}\,\left\|Y - X\hat\theta_n\right\|^2.$$
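As an illustration, the estimator (6.4) can be computed directly from the regression of the squares on the lagged squares; the following sketch (Python/NumPy; the function name and the simulation settings are our own choices, not from the book) uses zero initial values as suggested above:

```python
import numpy as np

def arch_ols(eps, q):
    """OLS estimation of an ARCH(q) model via the AR(q) representation of the
    squared process, with initial values eps_0, ..., eps_{1-q} set to zero.
    Returns (theta_hat, sigma2_u_hat), theta_hat = (omega, alpha_1, ..., alpha_q)."""
    eps = np.asarray(eps, dtype=float)
    n = eps.size
    e2 = np.concatenate([np.zeros(q), eps ** 2])  # zero initial values prepended
    # Row t of X is Z_{t-1}' = (1, eps_{t-1}^2, ..., eps_{t-q}^2)
    X = np.column_stack([np.ones(n)] + [e2[q - i:q - i + n] for i in range(1, q + 1)])
    Y = eps ** 2
    theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (X'X)^{-1} X'Y
    resid = Y - X @ theta_hat
    sigma2_u = resid @ resid / (n - q - 1)              # estimator of Var(u_t)
    return theta_hat, sigma2_u

# Illustration on a simulated ARCH(1) path with omega_0 = 1, alpha_01 = 0.3
rng = np.random.default_rng(0)
n = 20000
eps = np.zeros(n)
prev = 0.0
for t in range(n):
    sigma2 = 1.0 + 0.3 * prev ** 2   # conditional variance given the past
    prev = np.sqrt(sigma2) * rng.standard_normal()
    eps[t] = prev
theta_hat, s2u = arch_ols(eps, q=1)
```

With this sample size the estimates are close to (1, 0.3), consistently with Theorem 6.1 below.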
Remark 6.1 (OLS estimator of a GARCH model) An OLS estimator can also be defined for a GARCH(p, q) model, but the estimator is not explicit, because $\epsilon_t^2$ does not satisfy an AR model when p ≠ 0 (see Exercise 7.5).
To establish the consistency of the OLS estimators of $\theta_0$ and $\sigma_u^2$, we consider the following assumptions.
OLS1: $(\epsilon_t)$ is the nonanticipative strictly stationary solution of model (6.1), and ω0 > 0.
OLS2: $E\epsilon_t^4 < +\infty$.
OLS3: $P(\eta_t^2 = 1) \ne 1$.
Explicit conditions for assumptions OLS1 and OLS2 were given in Chapter 2. Assumption OLS3, which requires that the law of $\eta_t^2$ be nondegenerate, allows us to identify the parameters. The assumption also guarantees the invertibility of X′X for n large enough.
Theorem 6.1 (Consistency of the OLS estimator of an ARCH model) Let $(\hat\theta_n)$ be a sequence of estimators satisfying (6.4). Under assumptions OLS1–OLS3, almost surely, as n → ∞,
$$\hat\theta_n \to \theta_0 \quad \text{and} \quad \hat\sigma_u^2 \to \sigma_u^2.$$
Proof. The proof consists of several steps.
(i) We have seen (Theorem 2.4) that $(\epsilon_t)$, the unique nonanticipative stationary solution of the model, is ergodic. The process $(Z_t)$ is also ergodic, because $Z_t$ is a measurable function of $\{\epsilon_{t-i}, i \ge 0\}$. The ergodic theorem (see Theorem A.2) then entails that, almost surely,
$$\frac{1}{n}X'X = \frac{1}{n}\sum_{t=1}^{n} Z_{t-1}Z_{t-1}' \to E\left(Z_{t-1}Z_{t-1}'\right).$$
The existence of the expectation is guaranteed by assumption OLS2. Note that the initial values are involved only in a fixed number of terms of the sum, and thus they do not matter for the asymptotic result. Similarly, we have, almost surely,
$$\frac{1}{n}X'Y = \frac{1}{n}\sum_{t=1}^{n} Z_{t-1}\epsilon_t^2 \to E\left(Z_{t-1}\epsilon_t^2\right).$$
(ii) The invertibility of the matrix $EZ_{t-1}Z_{t-1}' = EZ_tZ_t'$ is shown by contradiction. Assume that there exists a nonzero vector c of $\mathbb{R}^{q+1}$ such that $c'EZ_tZ_t'c = 0$. Thus $E\{c'Z_t\}^2 = 0$, and it follows that $c'Z_t = 0$ a.s. Therefore, there exists a linear combination of the variables $\epsilon_t^2, \dots, \epsilon_{t-q+1}^2$ which is a.s. equal to a constant. Without loss of generality, one can assume that, in this linear combination, the coefficient of $\epsilon_t^2$ is 1. Thus $\epsilon_t^2$ is a.s. a measurable function of the variables $\epsilon_{t-1}, \dots, \epsilon_{t-q}$, and so is $\eta_t^2 = \epsilon_t^2/\sigma_t^2$, since $\sigma_t^2$ is also a function of these variables. However, the solution being nonanticipative, $\eta_t$ is independent of these variables. This implies that $\eta_t^2$ is a.s. equal to a constant. This constant is necessarily equal to 1, because $E\eta_t^2 = 1$, but this leads to a contradiction with OLS3. Thus $E(Z_{t-1}Z_{t-1}')$ is invertible.
(iii) The innovation of $\epsilon_t^2$ being $u_t$, we have the orthogonality relations
$$E\left(Z_{t-1}u_t\right) = 0,$$
that is,
$$E\left(Z_{t-1}\epsilon_t^2\right) = E\left(Z_{t-1}Z_{t-1}'\right)\theta_0.$$
(iv) Points (i) and (ii) show that n−1X′X is a.s. invertible for n large enough and that, almost surely, as n → ∞,
$$\hat\theta_n = \left(\frac{X'X}{n}\right)^{-1}\frac{X'Y}{n} \to \left\{E\left(Z_{t-1}Z_{t-1}'\right)\right\}^{-1}E\left(Z_{t-1}\epsilon_t^2\right) = \theta_0,$$
the last equality coming from (iii).
For the asymptotic normality of the OLS estimator, we need the following additional assumption.
OLS4: $E\epsilon_t^8 < +\infty$.
Consider the (q + 1) × (q + 1) matrices
$$A = E\left(Z_{t-1}Z_{t-1}'\right), \qquad B = E\left(\sigma_t^4 Z_{t-1}Z_{t-1}'\right).$$
The invertibility of A was established in the proof of Theorem 6.1, and the invertibility of B is shown by the same argument, noting that $c'Bc = E\sigma_t^4\left(c'Z_{t-1}\right)^2 = 0$ if and only if $c'Z_{t-1} = 0$, because $\sigma_t^4 > 0$ a.s. The following result establishes the asymptotic normality of the OLS estimator.
Let $\kappa_\eta = E\eta_t^4$ denote the fourth-order moment of $\eta_t$.
Theorem 6.2 (Asymptotic normality of the OLS estimator) Under assumptions OLS1–OLS4,
$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ (\kappa_\eta - 1)A^{-1}BA^{-1}\right).$$
Proof. In view of (6.3), we have
$$\hat\theta_n - \theta_0 = (X'X)^{-1}X'U.$$
Thus
$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) = \left(\frac{X'X}{n}\right)^{-1}\frac{1}{\sqrt{n}}X'U = \left(\frac{X'X}{n}\right)^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n} Z_{t-1}u_t. \tag{6.5}$$
Let $\lambda \in \mathbb{R}^{q+1}$, λ ≠ 0. The sequence $(\lambda'Z_{t-1}u_t, \mathcal{F}_t)$ is a square integrable ergodic stationary martingale difference, with variance
$$E\left(\lambda'Z_{t-1}u_t\right)^2 = (\kappa_\eta - 1)\,\lambda'B\lambda,$$
since $E\left(u_t^2 \mid \mathcal{F}_{t-1}\right) = (\kappa_\eta - 1)\sigma_t^4$.
By the CLT (see Corollary A.1) we obtain that, for all λ ≠ 0,
$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\lambda'Z_{t-1}u_t \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ (\kappa_\eta - 1)\lambda'B\lambda\right). \tag{6.6}$$
Using the Cramér–Wold device, it follows that
$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}Z_{t-1}u_t \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ (\kappa_\eta - 1)B\right). \tag{6.7}$$
The conclusion follows from (6.5), (6.6) and (6.7).
Remark 6.2 (Estimation of the information matrices) Consistent estimators $\hat A$ and $\hat B$ of the matrices A and B are obtained by replacing the theoretical moments by their empirical counterparts:
$$\hat A = \frac{1}{n}\sum_{t=1}^{n} Z_{t-1}Z_{t-1}' = \frac{X'X}{n}, \qquad \hat B = \frac{1}{n}\sum_{t=1}^{n} \hat\sigma_t^4 Z_{t-1}Z_{t-1}',$$
where $\hat\sigma_t^2 = Z_{t-1}'\hat\theta_n$. The fourth-order moment of the process ηt = εt/σt is also consistently estimated by
$$\hat\kappa_\eta = \frac{1}{n}\sum_{t=1}^{n}\frac{\epsilon_t^4}{\hat\sigma_t^4}.$$
Finally, a consistent estimator of the asymptotic variance of the OLS estimator is defined by
$$(\hat\kappa_\eta - 1)\,\hat A^{-1}\hat B\hat A^{-1}.$$
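The empirical counterparts of Remark 6.2 can be assembled as follows (a minimal sketch in Python/NumPy; the function name and the simulation settings are illustrative, not from the book):

```python
import numpy as np

def arch_ols_avar(eps, theta_hat, q):
    """Empirical estimate of the OLS asymptotic variance
    (kappa_eta - 1) A^{-1} B A^{-1}, following Remark 6.2."""
    eps = np.asarray(eps, dtype=float)
    n = eps.size
    e2 = np.concatenate([np.zeros(q), eps ** 2])          # zero initial values
    X = np.column_stack([np.ones(n)] + [e2[q - i:q - i + n] for i in range(1, q + 1)])
    sig2_hat = X @ theta_hat                              # Z_{t-1}' theta_hat
    A_hat = X.T @ X / n                                   # estimates A = E Z Z'
    B_hat = (X * (sig2_hat ** 2)[:, None]).T @ X / n      # estimates B = E sigma^4 Z Z'
    kappa_hat = np.mean(eps ** 4 / sig2_hat ** 2)         # estimates kappa_eta
    A_inv = np.linalg.inv(A_hat)
    return (kappa_hat - 1.0) * A_inv @ B_hat @ A_inv

# Illustration: simulated ARCH(1), omega_0 = 1, alpha_01 = 0.3, Gaussian eta_t
rng = np.random.default_rng(1)
n = 20000
eps = np.zeros(n)
prev = 0.0
for t in range(n):
    prev = np.sqrt(1.0 + 0.3 * prev ** 2) * rng.standard_normal()
    eps[t] = prev
e2 = np.concatenate([[0.0], eps ** 2])
X = np.column_stack([np.ones(n), e2[:n]])
theta_hat, *_ = np.linalg.lstsq(X, eps ** 2, rcond=None)  # OLS first
avar = arch_ols_avar(eps, theta_hat, q=1)                 # then its estimated avar
```

The returned matrix estimates the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$; it is symmetric with positive diagonal entries.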
Example 6.1 (ARCH(1)) When q = 1, the moment conditions OLS2 and OLS4 take the form $\alpha_{01}^2 E\eta_t^4 < 1$ and $\alpha_{01}^4 E\eta_t^8 < 1$, respectively (see (2.54)). We have
$$A = \begin{pmatrix} 1 & E\epsilon_{t-1}^2 \\ E\epsilon_{t-1}^2 & E\epsilon_{t-1}^4 \end{pmatrix}, \qquad B = \begin{pmatrix} E\sigma_t^4 & E\sigma_t^4\epsilon_{t-1}^2 \\ E\sigma_t^4\epsilon_{t-1}^2 & E\sigma_t^4\epsilon_{t-1}^4 \end{pmatrix},$$
with
$$E\sigma_t^4 = \omega_0^2 + 2\omega_0\alpha_{01}E\epsilon_{t-1}^2 + \alpha_{01}^2E\epsilon_{t-1}^4.$$
The other terms of the matrix B are obtained by expanding $\sigma_t^4 = \left(\omega_0 + \alpha_{01}\epsilon_{t-1}^2\right)^2$ and calculating the moments of order 6 and 8 of $\epsilon_t$.
Table 6.1 shows, for different laws of the iid process $(\eta_t)$, that the moment conditions OLS2 and OLS4 impose strong constraints on the parameter space.
Table 6.2 displays numerical values of the asymptotic variance, for different values of α01 and ω0 = 1, when ηt follows the normal N(0, 1).
The asymptotic accuracy of $\hat\theta_n$ becomes very low near the boundary of the domain of existence of $E\epsilon_t^8$. The OLS method can, however, be used for higher values of α01, because the estimator remains consistent when $\alpha_{01} < 3^{-1/2} \approx 0.577$, and can thus provide initial values for an algorithm maximizing the likelihood.
(In Table 6.1, ‘no’ means that the moment condition is not satisfied.)
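For Gaussian innovations, the bounds on α01 implied by OLS2 and OLS4 can be checked directly: for an ARCH(1) model, $E\epsilon_t^{2m} < \infty$ requires $\alpha_{01}^m E\eta_t^{2m} < 1$ (see (2.54)), and for the N(0, 1) law $E\eta_t^4 = 3$ and $E\eta_t^8 = 105$:

```python
# Existence of E eps_t^{2m} for an ARCH(1) model requires alpha_01^m * E eta_t^{2m} < 1.
# For eta_t ~ N(0,1): E eta^4 = 3 and E eta^8 = 105, so the conditions OLS2 and
# OLS4 translate into the following upper bounds on alpha_01.
bound_ols2 = 3 ** (-1 / 2)    # fourth-order moment:  alpha_01 < 0.577
bound_ols4 = 105 ** (-1 / 4)  # eighth-order moment:  alpha_01 < 0.312
print(round(bound_ols2, 3), round(bound_ols4, 3))  # 0.577 0.312
```

These are the Gaussian thresholds referred to in the discussion above: consistency of the OLS estimator only requires the first bound, asymptotic normality the second.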
6.2 Estimation of ARCH(q) Models by Feasible Generalized Least Squares
In a linear regression model in which, conditionally on the exogenous variables, the errors are heteroscedastic, the FGLS estimator is asymptotically more accurate than the OLS estimator. Note that in (6.3) the errors $u_t$ are, conditionally on $Z_{t-1}$, heteroscedastic, with conditional variance $\mathrm{Var}\left(u_t \mid Z_{t-1}\right) = (\kappa_\eta - 1)\left(Z_{t-1}'\theta_0\right)^2$.
For all θ = (ω, α1, …, αq)′, let
$$\sigma_t^2(\theta) = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 = Z_{t-1}'\theta, \qquad \hat\sigma_t^2 = \sigma_t^2\left(\hat\theta_n\right),$$
where $\hat\theta_n$ is the first-step OLS estimator. The FGLS estimator is defined by
$$\hat\theta_n^{\mathrm{FGLS}} = \left(\sum_{t=1}^{n}\frac{Z_{t-1}Z_{t-1}'}{\hat\sigma_t^4}\right)^{-1}\sum_{t=1}^{n}\frac{Z_{t-1}\epsilon_t^2}{\hat\sigma_t^4}.$$
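A two-step implementation can be sketched as follows (Python/NumPy; the function name, the flooring of the first-step fitted variances, and the simulation settings are our own choices, not from the book):

```python
import numpy as np

def arch_fgls(eps, q, floor=1e-8):
    """Two-step FGLS for an ARCH(q) model: a first-step OLS fit gives fitted
    conditional variances, which then serve as weights 1/sigma_t^4 in a
    weighted least-squares regression of eps_t^2 on Z_{t-1}."""
    eps = np.asarray(eps, dtype=float)
    n = eps.size
    e2 = np.concatenate([np.zeros(q), eps ** 2])      # zero initial values
    X = np.column_stack([np.ones(n)] + [e2[q - i:q - i + n] for i in range(1, q + 1)])
    Y = eps ** 2
    theta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None) # step 1: OLS
    sig2 = np.maximum(X @ theta_ols, floor)           # guard: OLS fit may be negative
    w = 1.0 / sig2 ** 2                               # weights 1/sigma_t^4
    XtW = X.T * w
    theta_fgls = np.linalg.solve(XtW @ X, XtW @ Y)    # step 2: weighted LS
    return theta_fgls

# Illustration on a simulated ARCH(1) path (omega_0 = 1, alpha_01 = 0.3)
rng = np.random.default_rng(2)
n = 20000
eps = np.zeros(n)
prev = 0.0
for t in range(n):
    prev = np.sqrt(1.0 + 0.3 * prev ** 2) * rng.standard_normal()
    eps[t] = prev
theta_fgls = arch_fgls(eps, q=1)
```

The flooring of the fitted variances is a practical safeguard only: Theorem 6.5 territory aside, the unconstrained first-step fit can produce negative values in finite samples.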
Theorem 6.3 (Asymptotic properties of the FGLS estimator) Under assumptions OLS1–OLS3 and if α0i > 0, i = 1, …, q,
$$\sqrt{n}\left(\hat\theta_n^{\mathrm{FGLS}} - \theta_0\right) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ (\kappa_\eta - 1)J^{-1}\right),$$
where $J = E\left(\sigma_t^{-4}Z_{t-1}Z_{t-1}'\right)$ is positive definite.
Proof. It can be shown that J is positive definite by the argument used in Theorem 6.1.
We have
$$\sqrt{n}\left(\hat\theta_n^{\mathrm{FGLS}} - \theta_0\right) = \left(\frac{1}{n}\sum_{t=1}^{n}\frac{Z_{t-1}Z_{t-1}'}{\hat\sigma_t^4}\right)^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{Z_{t-1}u_t}{\hat\sigma_t^4}.$$
A Taylor expansion around θ0 yields, with $\sigma_t^2 = \sigma_t^2(\theta_0)$,
where θ* is between $\hat\theta_n$ and θ0. Note that, for all θ, $\partial\sigma_t^2(\theta)/\partial\theta = Z_{t-1}$ does not depend on θ. It follows that
The first term on the right-hand side of the equality converges a.s. to J by the ergodic theorem. The second term converges a.s. to 0 because the OLS estimator is consistent and
for n large enough. The constant bound K is obtained by arguing that the components of $\hat\theta_n$, and thus those of θ*, are strictly positive for n large enough (because $\hat\theta_n \to \theta_0$ a.s. and the components of θ0 are strictly positive). Thus, we have $\sigma_t^2(\theta^*) \ge \omega^* > 0$ and $\sigma_t^2(\theta^*) \ge \alpha_i^*\epsilon_{t-i}^2$, for i = 1, …, q, and finally each component of $Z_{t-1}Z_{t-1}'/\sigma_t^4(\theta^*)$ is bounded. We have shown that, a.s.,
For the term in braces in (6.8) we have
by the previous arguments.
Thus, we have shown that , a.s.
Using (6.11), (6.8) and (6.10), we have
where Rn → 0, a.s. A new expansion around θ0 gives
where θ** is between θ* and θ0. It follows that
The CLT applied to the ergodic and square integrable stationary martingale difference $\left(\sigma_t^{-4}Z_{t-1}u_t, \mathcal{F}_t\right)$ shows that $S_n$ converges in distribution to a Gaussian vector with zero mean and variance
$$E\left(\frac{Z_{t-1}Z_{t-1}'u_t^2}{\sigma_t^8}\right) = (\kappa_\eta - 1)J$$
(see Corollary A.1). Moreover,
The two terms in braces tend to 0 a.s. by the ergodic theorem. Moreover, the remaining factors are bounded in probability, as well as $J^{-1} + R_n$. It follows that $S_{n2}$ tends to 0 in probability. Finally, by arguments already used, and because θ* is between $\hat\theta_n$ and θ0,
in probability. Using (6.12), we have shown the convergence in law stated in the theorem.
The moment condition required for the asymptotic normality of the FGLS estimator is $E\epsilon_t^4 < \infty$. For the OLS estimator we had the more restrictive condition $E\epsilon_t^8 < \infty$. Moreover, when this eighth-order moment exists, the following result shows that the OLS estimator is asymptotically less accurate than the FGLS estimator.
Theorem 6.4 (Asymptotic OLS versus FGLS variances) Under assumptions OLS1–OLS4, the matrix
$$(\kappa_\eta - 1)\left\{A^{-1}BA^{-1} - J^{-1}\right\}$$
is positive semi-definite.
Proof. It suffices to show that $B - AJ^{-1}A$ is positive semi-definite, since $A^{-1}BA^{-1} - J^{-1} = A^{-1}\left(B - AJ^{-1}A\right)A^{-1}$. Letting $W_t = \sigma_t^2Z_{t-1} - AJ^{-1}\sigma_t^{-2}Z_{t-1}$, we have
$$E\left(W_tW_t'\right) = B - AJ^{-1}A - AJ^{-1}A + AJ^{-1}JJ^{-1}A = B - AJ^{-1}A.$$
Thus $B - AJ^{-1}A$ is positive semi-definite, and the result follows.
We will see in Chapter 7 that the asymptotic variance of the FGLS estimator coincides with that of the quasi-maximum likelihood estimator (but the asymptotic normality of the latter is obtained without moment conditions). This result explains why quasi-maximum likelihood is preferred to OLS (and even to FGLS) for the estimation of ARCH (and GARCH) models. Note, however, that the OLS estimator often provides a good initial value for the optimization algorithm required by the quasi-maximum likelihood method.
6.3 Estimation by Constrained Ordinary Least Squares
Negative components are not precluded in the OLS estimator $\hat\theta_n$ defined by (6.4) (see Exercise 6.3). When the estimate has negative components, predictions of the volatility can be negative. In order to avoid this problem, we consider the constrained OLS estimator $\hat\theta_n^c$ defined by
$$Q_n\left(\hat\theta_n^c\right) = \min_{\theta \in [0, +\infty)^{q+1}} Q_n(\theta), \qquad Q_n(\theta) = \left\|Y - X\theta\right\|^2.$$
The existence of $\hat\theta_n^c$ is guaranteed by the continuity of the function $Q_n$ and the fact that
$$Q_n(\theta) \to \infty$$
as ‖θ‖ → ∞ with θ ≥ 0, whenever X has nonzero columns. Note that the latter condition is satisfied at least for n large enough (see Exercise 6.5).
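Numerically, the constrained estimator solves a nonnegative least-squares problem, for which standard routines exist; a toy sketch, assuming SciPy is available (the data are an arbitrary illustration, not from the book):

```python
import numpy as np
from scipy.optimize import nnls

# Toy design in which unconstrained OLS yields a negative slope
X = np.array([[1.0, 2.0],
              [1.0, 0.0],
              [1.0, 1.0]])
Y = np.array([0.0, 3.0, 1.0])

theta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)  # unconstrained: slope < 0
theta_c, _ = nnls(X, Y)                            # min ||Y - X theta||, theta >= 0
```

When the unconstrained estimate already has nonnegative components, `nnls` returns the same point (in line with Theorem 6.5 below); here the negative slope is set to zero and the intercept refitted.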
6.3.1 Properties of the Constrained OLS Estimator
The following theorem gives a condition for equality between the constrained and unconstrained estimators. The theorem is stated in the ARCH case but is true in a much more general framework.
Theorem 6.5 (Equality between constrained and unconstrained OLS) If X is of rank q + 1, the constrained and unconstrained estimators coincide, $\hat\theta_n^c = \hat\theta_n$, if and only if $\hat\theta_n \in [0, +\infty)^{q+1}$.
Proof. Since $\hat\theta_n$ and $\hat\theta_n^c$ are obtained by minimizing the same function $Q_n(\cdot)$, and since $\hat\theta_n^c$ minimizes this function on a smaller set, we have $Q_n(\hat\theta_n) \le Q_n(\hat\theta_n^c)$. Moreover, $\hat\theta_n$ is the unique minimizer of $Q_n$ over $\mathbb{R}^{q+1}$, because X has full column rank, and we have $Q_n(\hat\theta_n^c) \le Q_n(\theta)$ for all $\theta \in [0, +\infty)^{q+1}$.
Suppose that the unconstrained estimator $\hat\theta_n$ belongs to $[0, +\infty)^{q+1}$. In this case $Q_n(\hat\theta_n) = Q_n(\hat\theta_n^c)$. Because the unconstrained minimizer is unique, $\hat\theta_n^c = \hat\theta_n$.
The converse is trivial.
We now give a way to obtain the constrained estimator from the unconstrained estimator.
Theorem 6.6 (Constrained OLS as a projection of OLS) If X has rank q + 1, the constrained estimator $\hat\theta_n^c$ is the orthogonal projection of $\hat\theta_n$ on $[0, +\infty)^{q+1}$ with respect to the metric defined by X′X, that is,
$$\hat\theta_n^c = \underset{\theta \in [0, +\infty)^{q+1}}{\arg\min}\ \left(\theta - \hat\theta_n\right)'X'X\left(\theta - \hat\theta_n\right). \tag{6.14}$$
Proof. If we denote by P the orthogonal projector on the columns of X, and M = In − P, we have
$$\left\|Y - X\theta\right\|^2 = \left\|MY\right\|^2 + \left\|PY - X\theta\right\|^2 = \left\|MY\right\|^2 + \left(\theta - \hat\theta_n\right)'X'X\left(\theta - \hat\theta_n\right),$$
using properties of projections, Pythagoras's theorem and $PY = X\hat\theta_n$. The constrained estimator thus solves (6.14). Note that, since X has full column rank, a norm is well defined by $\|\theta\|_{X'X} = \left(\theta'X'X\theta\right)^{1/2}$. The characterization (6.14) is equivalent to
$$\hat\theta_n^c = \underset{\theta \in [0, +\infty)^{q+1}}{\arg\min}\ \left\|\theta - \hat\theta_n\right\|_{X'X}.$$
Since $[0, +\infty)^{q+1}$ is convex, $\hat\theta_n^c$ exists, is unique and is the X′X-orthogonal projection of $\hat\theta_n$ on $[0, +\infty)^{q+1}$. This projection is characterized by
$$\left\langle \theta - \hat\theta_n^c,\ \hat\theta_n - \hat\theta_n^c \right\rangle_{X'X} \le 0, \qquad \text{for all } \theta \in [0, +\infty)^{q+1} \tag{6.15}$$
(see Exercise 6.9). This characterization shows that, when $\hat\theta_n \notin [0, +\infty)^{q+1}$, the constrained estimator must lie at the boundary of $[0, +\infty)^{q+1}$: otherwise, it would suffice to take $\theta \in [0, +\infty)^{q+1}$ between $\hat\theta_n^c$ and $\hat\theta_n$ to obtain a positive scalar product, contradicting (6.15).
The characterization (6.15) allows us to easily obtain the strong consistency of the constrained estimator.
Theorem 6.7 (Consistency of the constrained OLS estimator) Under the assumptions of Theorem 6.1, almost surely, as n → ∞,
$$\hat\theta_n^c \to \theta_0.$$
Proof. Since $\theta_0 \in [0, +\infty)^{q+1}$, in view of (6.15) we have
$$\left\langle \theta_0 - \hat\theta_n^c,\ \hat\theta_n - \hat\theta_n^c \right\rangle_{X'X} \le 0.$$
It follows, using the Cauchy–Schwarz inequality, that
$$\left\|\hat\theta_n^c - \theta_0\right\|_{X'X}^2 \le \left\langle \hat\theta_n^c - \theta_0,\ \hat\theta_n - \theta_0 \right\rangle_{X'X} \le \left\|\hat\theta_n^c - \theta_0\right\|_{X'X}\left\|\hat\theta_n - \theta_0\right\|_{X'X},$$
whence $\|\hat\theta_n^c - \theta_0\|_{X'X} \le \|\hat\theta_n - \theta_0\|_{X'X}$. Since, in view of Theorem 6.1, $\hat\theta_n \to \theta_0$ a.s. and X′X/n converges a.s. to a positive definite matrix, it follows that $(\hat\theta_n^c - \theta_0)'(X'X/n)(\hat\theta_n^c - \theta_0) \to 0$ a.s. Using Exercise 6.12, the conclusion follows.
6.3.2 Computation of the Constrained OLS Estimator
We now give an explicit way to obtain the constrained estimator. We have already seen that if all the components of the unconstrained estimator $\hat\theta_n$ are positive, we have $\hat\theta_n^c = \hat\theta_n$. Now suppose that one component of $\hat\theta_n$ is negative, for instance the last one. Let
$$X = \left(X^{(1)}, X^{(2)}\right),$$
where $X^{(2)}$ denotes the last column of X, and
$$\hat\theta_n^{(1)} = \left(X^{(1)\prime}X^{(1)}\right)^{-1}X^{(1)\prime}Y,$$
the OLS estimator obtained by omitting the last column. Note that in general $\hat\theta_n^{(1)}$ does not coincide with the vector of the first q components of $\hat\theta_n$ (see Exercise 6.11).
Theorem 6.8 (Explicit form of the constrained estimator) Assume that X has rank q + 1, that the last component $\hat\alpha_q$ of $\hat\theta_n$ satisfies $\hat\alpha_q < 0$, and that $\hat\theta_n^{(1)} \in [0, +\infty)^{q}$. Then
$$\hat\theta_n^c = \left(\hat\theta_n^{(1)\prime},\ 0\right)'.$$
Proof. Let $P^{(1)}$ be the projector on the columns of $X^{(1)}$ and let $M^{(1)} = I - P^{(1)}$. We have
$$\left\|Y - X^{(1)}\hat\theta_n^{(1)}\right\|^2 = \left\|M^{(1)}Y\right\|^2.$$
Because $\hat\alpha_q = e_{q+1}'\hat\theta_n < 0$, with $e_{q+1} = (0, \dots, 0, 1)'$, and because, in view of Exercise 6.11,
$$\hat\alpha_q = \left\{X^{(2)\prime}M^{(1)}X^{(2)}\right\}^{-1}X^{(2)\prime}M^{(1)}Y,$$
where the scalar $X^{(2)\prime}M^{(1)}X^{(2)}$ is positive, we have $Y'M^{(1)}X^{(2)} < 0$. It follows that, for all $\theta = \left(\theta^{(1)\prime}, \theta^{(2)}\right)'$ such that $\theta^{(2)} \in [0, \infty)$,
$$\left\|Y - X\theta\right\|^2 = \left\|M^{(1)}\left(Y - X^{(2)}\theta^{(2)}\right)\right\|^2 + \left\|P^{(1)}\left(Y - X^{(2)}\theta^{(2)}\right) - X^{(1)}\theta^{(1)}\right\|^2 \ge \left\|M^{(1)}Y\right\|^2, \tag{6.16}$$
since $-2\theta^{(2)}\,Y'M^{(1)}X^{(2)} \ge 0$.
In view of (6.16), and since $\left\|Y - X\left(\hat\theta_n^{(1)\prime}, 0\right)'\right\|^2 = \left\|M^{(1)}Y\right\|^2$, we have $\hat\theta_n^c = \left(\hat\theta_n^{(1)\prime}, 0\right)'$, because $\left(\hat\theta_n^{(1)\prime}, 0\right)' \in [0, +\infty)^{q+1}$.
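The recipe of Theorem 6.8 (drop the column of the negative coefficient, refit by OLS, and append a zero) can be checked against a direct nonnegative least-squares solve; a toy check, assuming SciPy is available, with illustrative data:

```python
import numpy as np
from scipy.optimize import nnls

X = np.array([[1.0, 1.0],
              [1.0, 3.0],
              [1.0, 5.0]])
Y = np.array([2.0, 1.0, 0.0])

# Unconstrained OLS: the last coefficient turns out negative
theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Theorem 6.8: drop the last column, refit, and append a zero
X1 = X[:, :1]
theta1, *_ = np.linalg.lstsq(X1, Y, rcond=None)
theta_explicit = np.append(theta1, 0.0)

# Direct constrained solve for comparison
theta_nnls, _ = nnls(X, Y)
```

Both routes give the same constrained estimate, as the theorem predicts when the refitted subvector is nonnegative.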
The OLS method was proposed by Engle (1982) for ARCH models. The asymptotic properties of the OLS estimator were established by Weiss (1984, 1986), in the ARMA-GARCH framework, under eighth-order moment assumptions. Pantula (1989) also studied the asymptotic properties of the OLS method in the AR(1)-ARCH(q) case, and gave an explicit form for the asymptotic variance. The FGLS method was developed, in the ARCH case, by Bose and Mukherjee (2003) (see also Gouriéroux, 1997). The convexity results used for the study of the constrained estimator can be found, for instance, in Moulin and Fogelman-Soulié (1979).
Exercises
6.1 (Estimating ARCH(q) models for q = 1, 2, …)
Describe how to use the Durbin algorithm (B.7)–(B.9) to estimate an ARCH(q) model by OLS.
6.2 (Explicit expression for the OLS estimator of an ARCH process)
With the notation of Section 6.1, show that, when X has rank q + 1, the estimator $\hat\theta_n = (X'X)^{-1}X'Y$ is the unique solution of the minimization problem
$$\min_{\theta \in \mathbb{R}^{q+1}} \left\|Y - X\theta\right\|^2.$$
6.3 (OLS estimator with negative values)
Give a numerical example (with, for instance, n = 2) showing that the unconstrained OLS estimator of the ARCH(q) parameters (with, for instance, q = 1) can take negative values.
6.4 (Unconstrained and constrained OLS estimator of an ARCH(2) process)
Consider the ARCH(2) model
$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega + \alpha_1\epsilon_{t-1}^2 + \alpha_2\epsilon_{t-2}^2.$$
Let $\hat\theta_n = (\hat\omega, \hat\alpha_1, \hat\alpha_2)'$ be the unconstrained OLS estimator of θ = (ω, α1, α2)′. Is it possible to have
1. $\hat\alpha_1 < 0$?
2. $\hat\alpha_1 < 0$ and $\hat\alpha_2 < 0$?
3. $\hat\omega < 0$, $\hat\alpha_1 < 0$ and $\hat\alpha_2 < 0$?
Let $\hat\theta_n^c$ be the constrained OLS estimator. Consider a numerical example with n = 3 observations and two initial values, and compute $\hat\theta_n$ and $\hat\theta_n^c$ for these observations.
6.5 (The columns of the matrix X are nonzero)
Show that if ω0 > 0, the matrix X cannot have a column equal to zero for n large enough.
6.6 (Estimating an AR(1) model with ARCH(q) errors)
Consider the model
$$X_t = \phi X_{t-1} + \epsilon_t, \qquad |\phi| < 1,$$
where $(\epsilon_t)$ is the strictly stationary solution of model (6.1) under the condition $E\epsilon_t^4 < \infty$. Show that the OLS estimator of φ is consistent and asymptotically normal. Is the assumption $E\epsilon_t^4 < \infty$ necessary in the case of iid errors?
6.7 (Inversion of a block matrix)
For a matrix partitioned as
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
show that the inverse (when it exists) is of the form
$$\begin{pmatrix} F^{-1} & -F^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}F^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}F^{-1}A_{12}A_{22}^{-1} \end{pmatrix},$$
where
$$F = A_{11} - A_{12}A_{22}^{-1}A_{21}.$$
6.8 (Does the OLS asymptotic variance depend on ω0?)
1. Show that, for an ARCH(q) model, $E\epsilon_t^4$ is proportional to $\omega_0^2$ (when it exists).
2. Using Exercise 6.7, show that, for an ARCH(q) model, the asymptotic variance of the OLS estimator of the α0i does not depend on ω0.
3. Show that the asymptotic variance of the OLS estimator of ω0 is proportional to $\omega_0^2$.
6.9 (Properties of the projections on closed convex sets)
Let E be a Hilbert space, with scalar product ⟨·, ·⟩ and norm ‖·‖. When C ⊂ E and x ∈ E, it is said that x* ∈ C is a best approximation of x on C if $\|x - x^*\| = \min_{y \in C} \|x - y\|$.
1. Show that if C is closed and convex, x* exists and is unique. This point is then called the projection of x on C.
2. Show that x* satisfies the so-called variational inequalities:
$$\left\langle x - x^*,\ y - x^* \right\rangle \le 0, \qquad \text{for all } y \in C,$$
and prove that x* is the unique point of C satisfying these inequalities.
6.10 (Properties of the projections on closed convex cones)
Recall that a subset K of the vector space E is a cone if, for all x ∈ K and all λ ≥ 0, we have λx ∈ K. Let K be a closed convex cone of the Hilbert space E.
1. Show that the projection x* of x on K (see Exercise 6.9) is characterized by
$$\left\langle x - x^*,\ y \right\rangle \le 0 \ \text{ for all } y \in K, \qquad \text{and} \qquad \left\langle x - x^*,\ x^* \right\rangle = 0.$$
2. Show that x* satisfies
(a) for all x ∈ E and λ ≥ 0, (λx)* = λx*;
(b) for all x ∈ E, $\|x\|^2 = \|x^*\|^2 + \|x - x^*\|^2$, and thus $\|x^*\| \le \|x\|$.
6.11 (OLS estimation of a subvector of parameters)
Consider the linear model Y = Xθ + U with the usual assumptions. Let $M_2$ be the matrix of the orthogonal projection on the orthogonal complement of the columns of $X^{(2)}$, where $X = \left(X^{(1)}, X^{(2)}\right)$. Show that the OLS estimator of $\theta^{(1)}$ (where $\theta = (\theta^{(1)\prime}, \theta^{(2)\prime})'$, with obvious notation) is
$$\hat\theta^{(1)} = \left(X^{(1)\prime}M_2X^{(1)}\right)^{-1}X^{(1)\prime}M_2Y.$$
6.12 (A matrix result used in the proof of Theorem 6.7)
Let $(J_n)$ be a sequence of symmetric k × k matrices converging to a positive definite matrix J, and let $(X_n)$ be a sequence of vectors in $\mathbb{R}^k$ such that $X_n'J_nX_n \to 0$. Show that $X_n \to 0$.
6.13 (Example of constrained estimator calculus)
Take the example of Exercise 6.3 and compute the constrained estimator.