Estimating GARCH Models by Quasi-Maximum Likelihood
The quasi-maximum likelihood (QML) method is particularly relevant for GARCH models because it provides consistent and asymptotically normal estimators for strictly stationary GARCH processes under mild regularity conditions, but with no moment assumptions on the observed process. By contrast, the least-squares methods of the previous chapter require moments of order 4 at least.
In this chapter, we study in detail the conditional QML method (conditional on initial values). We first consider the case when the observed process is pure GARCH. We present an iterative procedure for computing the Gaussian log-likelihood, conditionally on fixed or random initial values. The likelihood is written as if the law of the variables ηt were Gaussian 𝒩(0, 1) (we refer to pseudo- or quasi-likelihood), but this assumption is not necessary for the strong consistency of the estimator. In the second part of the chapter, we will study the application of the method to the estimation of ARMA-GARCH models. The asymptotic properties of the quasi-maximum likelihood estimator (QMLE) are established at the end of the chapter.
7.1 Conditional Quasi-Likelihood
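Assume that the observations ε1, …, εn constitute a realization (of length n) of a GARCH(p, q) process, more precisely a nonanticipative strictly stationary solution of
$$\epsilon_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \sum_{i=1}^{q} \alpha_{0i}\,\epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_{0j}\,h_{t-j}, \tag{7.1}$$
where (ηt) is a sequence of iid variables of variance 1, ω0 > 0, α0i ≥ 0 (i = 1, …, q), and β0j ≥ 0 (j = 1, …, p). The orders p and q are assumed known. The vector of the parameters
$$\theta = (\theta_1, \dots, \theta_{p+q+1})' = (\omega, \alpha_1, \dots, \alpha_q, \beta_1, \dots, \beta_p)' \tag{7.2}$$
belongs to a parameter space of the form
$$\Theta \subset (0, +\infty) \times [0, \infty)^{p+q}.$$
The true value of the parameter is unknown, and is denoted by
$$\theta_0 = (\omega_0, \alpha_{01}, \dots, \alpha_{0q}, \beta_{01}, \dots, \beta_{0p})'.$$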
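To write the likelihood of the model, a distribution must be specified for the iid variables ηt. Here we do not make any assumption on the distribution of these variables, but we work with a function, called the (Gaussian) quasi-likelihood, which, conditionally on some initial values, coincides with the likelihood when the ηt are distributed as standard Gaussian. Given initial values ε0, …, ε1−q, σ̃0², …, σ̃1−p², to be specified below, the conditional Gaussian quasi-likelihood is given by
$$L_n(\theta) = L_n(\theta; \epsilon_1, \dots, \epsilon_n) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi\tilde\sigma_t^2}} \exp\left(-\frac{\epsilon_t^2}{2\tilde\sigma_t^2}\right), \tag{7.3}$$
where the σ̃t² are recursively defined, for t ≥ 1, by
$$\tilde\sigma_t^2 = \tilde\sigma_t^2(\theta) = \omega + \sum_{i=1}^{q} \alpha_i\,\epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j\,\tilde\sigma_{t-j}^2. \tag{7.4}$$
For a given value of θ, under the second-order stationarity assumption, the unconditional variance (corresponding to this value of θ) is a reasonable choice for the unknown initial values:
$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \frac{\omega}{1 - \sum_{i=1}^{q}\alpha_i - \sum_{j=1}^{p}\beta_j}. \tag{7.5}$$
Such initial values are, however, not suitable for IGARCH models, in particular, and more generally when the second-order stationarity is not imposed. Indeed, the constant (7.5) would then take negative values for some values of θ. In such a case, suitable initial values are
$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \omega \tag{7.6}$$
or
$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \epsilon_1^2. \tag{7.7}$$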
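A QMLE of θ is defined as any measurable solution θ̂n of
$$\hat\theta_n = \arg\max_{\theta\in\Theta} L_n(\theta).$$
Taking the logarithm, it is seen that maximizing the likelihood is equivalent to minimizing, with respect to θ,
$$\tilde I_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} \tilde\ell_t, \quad\text{where}\quad \tilde\ell_t = \tilde\ell_t(\theta) = \frac{\epsilon_t^2}{\tilde\sigma_t^2} + \log\tilde\sigma_t^2 \tag{7.8}$$
and σ̃t² is defined by (7.4). A QMLE is thus a measurable solution of the equation
$$\hat\theta_n = \arg\min_{\theta\in\Theta} \tilde I_n(\theta). \tag{7.9}$$
It will be shown that the choice of the initial values is unimportant for the asymptotic properties of the QMLE. However, in practice this choice may be important. Note that other methods are possible for generating the sequence σ̃t²; for example, by taking σ̃t² = c0(θ) + Σ_{i=1}^{t−1} ci(θ) ε²t−i, where the ci(θ) are recursively computed (see Berkes, Horváth and Kokoszka, 2003b). Note that for computing Ĩn(θ), this procedure involves a number of operations of order n², whereas the one we propose involves a number of order n.
It will be convenient to approximate the sequence (ℓ̃t(θ)) by an ergodic stationary sequence. Assuming that the roots of Bθ(z) are outside the unit disk, the nonanticipative and ergodic strictly stationary sequence (σt²)t = {σt²(θ)}t is defined as the solution of
$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i\,\epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j\,\sigma_{t-j}^2, \quad \forall t. \tag{7.10}$$
Note that σt²(θ0) = ht.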
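The order-n recursion for the conditional variances and the resulting criterion can be sketched in a few lines for the GARCH(1, 1) case. This is only an illustration under stated assumptions: the function names, the `init` labels and the simulation helper are hypothetical, the innovations are drawn as 𝒩(0, 1), and the criterion is taken to be the average of εt²/σ̃t² + log σ̃t² (the Gaussian quasi-log-likelihood up to sign and constants).

```python
import numpy as np

def garch11_criterion(theta, eps, init="omega"):
    """Average of eps_t^2 / s_t + log(s_t) for a GARCH(1,1), s_t computed recursively.

    theta = (omega, alpha, beta); eps holds eps_1, ..., eps_n.
    init picks the common starting value of eps_0^2 and sigma_0^2:
    "omega" uses omega, anything else uses eps_1^2 (hypothetical labels).
    """
    omega, alpha, beta = theta
    eps2 = np.asarray(eps, dtype=float) ** 2
    start = omega if init == "omega" else eps2[0]
    sigma2 = np.empty(eps2.size)
    sigma2[0] = omega + (alpha + beta) * start
    for t in range(1, eps2.size):          # single pass: O(n) operations
        sigma2[t] = omega + alpha * eps2[t - 1] + beta * sigma2[t - 1]
    return float(np.mean(eps2 / sigma2 + np.log(sigma2))), sigma2


def simulate_garch11(theta0, n, rng, burn=500):
    """Simulate eps_1, ..., eps_n with N(0, 1) innovations (illustration only)."""
    omega, alpha, beta = theta0
    eta = rng.standard_normal(n + burn)
    eps = np.empty(n + burn)
    h = omega / (1.0 - alpha - beta)       # start at the unconditional variance
    for t in range(n + burn):
        eps[t] = np.sqrt(h) * eta[t]
        h = omega + alpha * eps[t] ** 2 + beta * h
    return eps[burn:]


rng = np.random.default_rng(0)
theta0 = (0.1, 0.1, 0.8)
eps = simulate_garch11(theta0, 2000, rng)
crit_true, sigma2_true = garch11_criterion(theta0, eps)
crit_flat, sigma2_flat = garch11_criterion((2.0, 0.0, 0.0), eps)  # constant variance 2
print(crit_true, crit_flat)
```

A QMLE would be obtained by handing `garch11_criterion` to a numerical minimizer over a compact parameter set; the comparison above only checks that the criterion is lower at the simulated parameter than at a badly misspecified one.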
Likelihood Equations
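Likelihood equations are obtained by canceling the derivative of the criterion Ĩn(θ) with respect to θ, which gives
$$\frac{1}{n}\sum_{t=1}^{n} \{\epsilon_t^2 - \tilde\sigma_t^2\}\,\frac{1}{\tilde\sigma_t^4}\,\frac{\partial\tilde\sigma_t^2}{\partial\theta} = 0. \tag{7.11}$$
These equations can be interpreted as orthogonality relations, for large n. Indeed, as will be seen in the next section, the left-hand side of equation (7.11) has the same asymptotic behavior as
$$\frac{1}{n}\sum_{t=1}^{n} \{\epsilon_t^2 - \sigma_t^2\}\,\frac{1}{\sigma_t^4}\,\frac{\partial\sigma_t^2}{\partial\theta},$$
the impact of the initial values vanishing as n → ∞.
The innovation of εt² is νt = εt² − ht. Thus, under the assumption that the expectation exists, we have
$$E_{\theta_0}\left(\nu_t\,\frac{1}{\sigma_t^4(\theta_0)}\,\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\right) = 0,$$
because (1/σt⁴(θ0)) ∂σt²(θ0)/∂θ is a measurable function of the εt−i, i > 0. This result can be viewed as the asymptotic version of (7.11) at θ0, using the ergodic theorem.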
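The step from the criterion to the likelihood equations is a one-line differentiation. The derivation below is a sketch that assumes each term of the criterion has the form ℓ̃t(θ) = εt²/σ̃t² + log σ̃t², which is the Gaussian quasi-log-likelihood up to sign and constants; the last line recalls why the orthogonality holds at θ0.

```latex
\begin{align*}
\tilde\ell_t(\theta) &= \frac{\epsilon_t^2}{\tilde\sigma_t^2} + \log\tilde\sigma_t^2,\\
\frac{\partial\tilde\ell_t(\theta)}{\partial\theta}
  &= -\frac{\epsilon_t^2}{\tilde\sigma_t^4}\,\frac{\partial\tilde\sigma_t^2}{\partial\theta}
     + \frac{1}{\tilde\sigma_t^2}\,\frac{\partial\tilde\sigma_t^2}{\partial\theta}
   = -\{\epsilon_t^2 - \tilde\sigma_t^2\}\,\frac{1}{\tilde\sigma_t^4}\,
      \frac{\partial\tilde\sigma_t^2}{\partial\theta},\\
\epsilon_t^2 - h_t &= h_t(\eta_t^2 - 1), \qquad E(\eta_t^2 - 1) = 0 .
\end{align*}
```

Averaging the second line over t and setting it to zero gives the likelihood equations; the third line shows that, at θ0, the factor εt² − ht is centered and independent of the past, which is the orthogonality interpretation.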
7.1.1 Asymptotic Properties of the QMLE
In this chapter, we will use the matrix norm defined by ‖A‖ = Σ |aij| for all matrices A = (aij). The spectral radius of a square matrix A is denoted by ρ(A).
Recall that model (7.1) admits a strictly stationary solution if and only if the sequence of matrices A0 = (A0t), where
admits a strictly negative top Lyapunov exponent, γ (A0) < 0, where
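Let
$$A_\theta(z) = \sum_{i=1}^{q} \alpha_i z^i \quad\text{and}\quad B_\theta(z) = 1 - \sum_{j=1}^{p} \beta_j z^j.$$
By convention, Aθ(z) = 0 if q = 0 and Bθ(z) = 1 if p = 0. To show strong consistency, the following assumptions are used.
A1: θ0 ∈ Θ and Θ is compact.
A2: γ(A0) < 0 and for all θ ∈ Θ, Σ_{j=1}^{p} βj < 1.
A3: ηt² has a nondegenerate distribution and Eηt² = 1.
A4: If p > 0, Aθ0(z) and Bθ0(z) have no common roots, Aθ0(1) ≠ 0, and α0q + β0p ≠ 0.
Note that, by Corollary 2.2, the second part of assumption A2 implies that the roots of Bθ(z) are outside the unit disk. Thus, a nonanticipative and ergodic strictly stationary sequence (σt²)t is defined by (7.10). Similarly, define
$$I_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} \ell_t, \qquad \ell_t = \ell_t(\theta) = \frac{\epsilon_t^2}{\sigma_t^2} + \log\sigma_t^2.$$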
Example 7.1 (Parameter space of a GARCH(1, 1) process) In the case of a GARCH(1, 1) process, assumptions A1 and A2 hold true when, for instance, the parameter space is of the form
$$\Theta = [\delta, 1/\delta] \times [0, 1/\delta] \times [0, 1 - \delta],$$
where δ ∈ (0, 1) is a constant, small enough so that the true value θ0 = (ω0, α0, β0)′ belongs to Θ. Figure 7.1 displays, in the plane (α, β), the zones of strict stationarity (when ηt is 𝒩(0, 1) distributed) and of second-order stationarity, as well as an example of a parameter space Θ (the gray zone) compatible with assumptions A1 and A2.
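The strict stationarity zone of a GARCH(1, 1) with Gaussian innovations can be explored numerically. The sketch below assumes the standard GARCH(1, 1) form of the condition, E log(α ηt² + β) < 0, and approximates the expectation by Monte Carlo; the function name, the number of draws and the seed are arbitrary choices made for illustration.

```python
import numpy as np

def lyapunov_garch11(alpha, beta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E log(alpha * eta^2 + beta) with eta ~ N(0, 1).

    A negative value indicates strict stationarity of the GARCH(1,1);
    alpha + beta < 1 is the (stronger) second-order stationarity condition.
    """
    rng = np.random.default_rng(seed)
    eta2 = rng.standard_normal(n_draws) ** 2
    return float(np.mean(np.log(alpha * eta2 + beta)))

for alpha, beta in [(0.1, 0.85), (1.5, 0.0), (4.0, 0.0)]:
    gamma = lyapunov_garch11(alpha, beta)
    print(f"alpha={alpha}, beta={beta}: gamma ~ {gamma:.3f}, "
          f"strictly stationary: {gamma < 0}, second-order: {alpha + beta < 1}")
```

The three parameter pairs cover a second-order stationary case, a strictly stationary case with infinite variance, and an explosive case.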
The first result states the strong consistency of θ̂n. The proof of this theorem, and of the next ones, is given in Section 7.4.
Theorem 7.1 (Strong consistency of the QMLE) Let (θ̂n) be a sequence of QMLEs satisfying (7.9), with initial conditions (7.6) or (7.7). Under assumptions A1–A4, almost surely
$$\hat\theta_n \to \theta_0 \quad\text{as } n \to \infty.$$
Remark 7.1
1. It is not assumed that the true value of the parameter θ0 belongs to the interior of Θ. Thus, the theorem can handle cases where some coefficients, αi or βj, are null.
2. It is important to note that the strict stationarity condition is only assumed at θ0, not over all Θ. In view of Corollary 2.2, the condition Σ βj < 1 is weaker than the strict stationarity condition.
3. Assumption A4 disappears in the ARCH case. In the general case, this assumption allows for an overidentification of either of the two orders, p or q, but not of both. We then consistently estimate the parameters of a GARCH(p − 1, q) (or GARCH(p, q − 1)) process if an overparameterized GARCH(p, q) model is used.
4. When p ≠ 0, assumption A4 precludes the case where all the α0i are zero. In such a case, the strictly stationary solution of the model is the strong white noise, which can be written in multiple forms. For instance, a strong white noise of variance 1 can be written in the GARCH(1, 1) form with
5. The assumption of absence of a common root, in A4, is restrictive only if p > 1 and q > 1. Indeed if q = 1, the unique root of Aθ0(z) is 0 and we have Bθ0(0) ≠ 0. If p = 1 and β01 ≠ 0, the unique root of Bθ0(z) is 1/β01 > 0 (if β01 = 0, the polynomial does not admit any root). Because the coefficients α0i are positive this value cannot be a zero of Aθ0(z).
6. The assumption Eηt = 0 is not required for the consistency (and asymptotic normality) of the QMLE of a GARCH. The conditional variance of εt is thus, in general, only proportional to ht: Var(εt | εu, u < t) = {1 − (Eηt)²}ht. The assumption Eηt² = 1 is made for identifiability reasons and is not restrictive provided that Eηt² < ∞.
The following additional assumptions are considered.
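A5: θ0 ∈ Θ°, where Θ° denotes the interior of Θ.
A6: κη = Eηt⁴ < ∞.
The limiting distribution of θ̂n is given by the following result.
Theorem 7.2 (Asymptotic normality of the QMLE) Under assumptions A1–A6,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{\mathcal L} \mathcal N\big(0, (\kappa_\eta - 1)J^{-1}\big),$$
where
$$J := E_{\theta_0}\left[\frac{\partial^2 \ell_t(\theta_0)}{\partial\theta\,\partial\theta'}\right] = E_{\theta_0}\left[\frac{1}{\sigma_t^4(\theta_0)}\,\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\,\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'}\right] \tag{7.13}$$
is a positive definite matrix.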
Remark 7.2
1. Assumption A5 is standard and entails the first-order condition (at least asymptotically). Indeed if θ̂n is consistent, it also belongs to the interior of Θ, for large n. At this optimum the derivative of the objective function cancels. However, assumption A5 is restrictive because it precludes, for instance, the case α01 = 0.
2. When one or several components of θ0 are null, assumption A5 is not satisfied and the theorem cannot be used. It is clear that, in this case, the asymptotic distribution of √n(θ̂n − θ0) cannot be normal because the estimator is constrained. If, for instance, α01 = 0, the distribution of √n(α̂1 − α01) is concentrated in [0, ∞), for all n, and thus cannot be asymptotically normal. This kind of ‘boundary’ problem is the object of a specific study in Chapter 8.
3. Assumption A6 does not concern εt², and does not preclude the IGARCH case. Only a fourth-order moment assumption on ηt is required. This assumption is clearly necessary for the existence of the variance of the score vector ∂ℓt(θ0)/∂θ. In the proof of this theorem, it is shown that
$$\mathrm{Var}_{\theta_0}\left\{\frac{\partial \ell_t(\theta_0)}{\partial\theta}\right\} = (\kappa_\eta - 1)J.$$
4. In the ARCH case (p = 0), the asymptotic variance of the QMLE reduces to that of the FGLS estimator (see Theorem 6.3). Indeed, in this case we have ∂σt²(θ)/∂θ = (1, ε²t−1, …, ε²t−q)′. Theorem 6.3 requires, however, the existence of a fourth-order moment for the observed process, whereas there is no moment assumption for the asymptotic normality of the QMLE. Moreover, Theorem 6.4 shows that the QMLE of an ARCH(q) is asymptotically more accurate than the OLS estimator.
7.1.2 The ARCH(1) Case: Numerical Evaluation of the Asymptotic Variance
Consider the ARCH(1) model
$$\epsilon_t = \{\omega_0 + \alpha_0\,\epsilon_{t-1}^2\}^{1/2}\,\eta_t,$$
with ω0 > 0 and α0 > 0, and suppose that the variables ηt satisfy assumption A3. The parameter is θ = (ω, α)′. In view of (2.10), the strict stationarity constraint A2 is written as
$$\alpha_0 < \exp\{-E(\log \eta_t^2)\}.$$
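Assumption A1 holds true if, for instance, the parameter space is of the form Θ = [δ, 1/δ] × [0, 1/δ], where δ > 0 is a constant, chosen sufficiently small so that θ0 = (ω0, α0)′ belongs to Θ. By Theorem 7.1, the QMLE of θ is then strongly consistent. Since ∂σ̃t²/∂θ = (1, ε²t−1)′, the QMLE θ̂n = (ω̂n, α̂n)′ is characterized by the normal equation
$$\frac{1}{n}\sum_{t=1}^{n} \frac{\epsilon_t^2 - \hat\omega_n - \hat\alpha_n\,\epsilon_{t-1}^2}{(\hat\omega_n + \hat\alpha_n\,\epsilon_{t-1}^2)^2} \begin{pmatrix} 1 \\ \epsilon_{t-1}^2 \end{pmatrix} = 0$$
with, for instance, ε0² = ε1². This estimator does not have an explicit form and must be obtained numerically. Theorem 7.2, which provides the asymptotic distribution of the estimator, only requires the extra assumption that θ0 belongs to Θ° = (δ, 1/δ) × (0, 1/δ). Thus, if α0 = 0 (that is, if the model is conditionally homoscedastic), the estimator remains consistent but is no longer asymptotically normal. Matrix J takes the form
$$J = E_{\theta_0}\left[\frac{1}{(\omega_0 + \alpha_0\,\epsilon_{t-1}^2)^2} \begin{pmatrix} 1 & \epsilon_{t-1}^2 \\ \epsilon_{t-1}^2 & \epsilon_{t-1}^4 \end{pmatrix}\right]$$
and the asymptotic variance of √n(θ̂n − θ0) is
$$\mathrm{Var}_{as}\{\sqrt{n}\,(\hat\theta_n - \theta_0)\} = (\kappa_\eta - 1)J^{-1}.$$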
Table 7.1 displays numerical evaluations of this matrix. An estimation of J is obtained by replacing the expectations by empirical means, obtained from simulations of length 10 000, when ηt is 𝒩(0, 1) distributed. This experiment is repeated 1000 times to obtain the results presented in the table.
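One replication of this simulation-based evaluation might look as follows. The sketch assumes that, for an ARCH(1), J is the expectation of (ω0 + α0 ε²t−1)⁻² times the outer product of (1, ε²t−1)′, and that the asymptotic variance is (κη − 1)J⁻¹ with κη = 3 for Gaussian innovations; the function name, burn-in length and seed are illustrative choices.

```python
import numpy as np

def arch1_asymptotic_variance(omega0, alpha0, kappa_eta=3.0,
                              n=10_000, burn=500, seed=0):
    """Estimate J and (kappa_eta - 1) * J^{-1} for an ARCH(1) by simulation."""
    rng = np.random.default_rng(seed)
    eta2 = rng.standard_normal(n + burn) ** 2
    eps2 = np.empty(n + burn)
    prev = omega0                       # arbitrary start, discarded by the burn-in
    for t in range(n + burn):
        eps2[t] = (omega0 + alpha0 * prev) * eta2[t]
        prev = eps2[t]
    lagged = eps2[burn:]                # plays the role of eps_{t-1}^2
    h = omega0 + alpha0 * lagged        # conditional variance of the next observation
    grad = np.column_stack([np.ones(n), lagged]) / h[:, None]
    J = grad.T @ grad / n               # empirical mean replacing the expectation
    return J, (kappa_eta - 1.0) * np.linalg.inv(J)

J, avar = arch1_asymptotic_variance(omega0=1.0, alpha0=0.5)
print(J)
print(avar)
```

Averaging the output over many seeds would mimic the repetition of the experiment described above; the parameter values used here are not those of Table 7.1.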
In order to assess, in finite samples, the quality of the asymptotic approximation of the variance of the estimator, the following Monte Carlo experiment is conducted. For the value θ0 of the parameter, and for a given length n, N samples are simulated, leading to N estimations θ̂n(i) of θ0, i = 1, …, N. We denote by θ̄n = (ω̄n, ᾱn)′ their empirical mean. The root mean squared error (RMSE) of estimation of α is denoted by
$$\mathrm{RMSE}(\alpha) = \left\{\frac{1}{N}\sum_{i=1}^{N} \big(\hat\alpha_n^{(i)} - \bar\alpha_n\big)^2\right\}^{1/2}$$
and can be compared to {Var_as[√n(α̂n − α0)]}^{1/2}/√n, the latter quantity being evaluated independently, by simulation. A similar comparison can obviously be made for the parameter ω. For θ0 = (0.2, 0.9)′ and N = 1000, Table 7.2 displays the results, for different sample lengths n.
The similarity between columns 3 and 4 is quite satisfactory, even for moderate sample sizes. The last column gives the empirical probability (that is, the relative frequency within the N samples) that α̂n is greater than 1 (which is the limiting value for second-order stationarity). These results show that, even if the mean of the estimations is close to the true value for large n, the variability of the estimator remains high. Finally, note that the length n = 1000 remains realistic for financial series.
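A scaled-down version of such a Monte Carlo experiment can be sketched as follows. Everything here is an assumption made for illustration: the ARCH(1) criterion is the average of εt²/σt² + log σt² with σt² = ω + α ε²t−1, the minimization uses a crude grid-refinement search as a stand-in for a proper numerical optimizer, the search box is much smaller than [δ, 1/δ] × [0, 1/δ], and N = 50 replications of length n = 500 replace the N = 1000 of the text. The dispersion is measured around the empirical mean of the estimates.

```python
import numpy as np

def simulate_arch1_sq(omega0, alpha0, n, rng, burn=200):
    """Return eps_0^2, ..., eps_n^2 from an ARCH(1) with N(0, 1) innovations."""
    eta2 = rng.standard_normal(n + 1 + burn) ** 2
    eps2 = np.empty(n + 1 + burn)
    prev = omega0
    for t in range(eps2.size):
        eps2[t] = (omega0 + alpha0 * prev) * eta2[t]
        prev = eps2[t]
    return eps2[burn:]

def arch1_qmle(eps2, box=((0.01, 5.0), (0.0, 5.0)), n_grid=21, n_zoom=8):
    """Crude QMLE of (omega, alpha) by repeated grid refinement (illustrative)."""
    (w_lo, w_hi), (a_lo, a_hi) = box
    x, y = eps2[:-1], eps2[1:]          # lagged and current squared observations
    for _ in range(n_zoom):
        w = np.linspace(w_lo, w_hi, n_grid)
        a = np.linspace(a_lo, a_hi, n_grid)
        sigma2 = w[:, None, None] + a[None, :, None] * x[None, None, :]
        crit = np.mean(y / sigma2 + np.log(sigma2), axis=2)
        i, j = np.unravel_index(np.argmin(crit), crit.shape)
        dw = (w_hi - w_lo) / (n_grid - 1)
        da = (a_hi - a_lo) / (n_grid - 1)
        # shrink the box to three grid steps around the current best point
        w_lo, w_hi = max(box[0][0], w[i] - 3 * dw), min(box[0][1], w[i] + 3 * dw)
        a_lo, a_hi = max(box[1][0], a[j] - 3 * da), min(box[1][1], a[j] + 3 * da)
    return w[i], a[j]

rng = np.random.default_rng(0)
omega0, alpha0, n, N = 0.2, 0.9, 500, 50
est = np.array([arch1_qmle(simulate_arch1_sq(omega0, alpha0, n, rng))
                for _ in range(N)])
alpha_hat = est[:, 1]
rmse_alpha = np.sqrt(np.mean((alpha_hat - alpha_hat.mean()) ** 2))
freq_above_one = np.mean(alpha_hat >= 1.0)
print(alpha_hat.mean(), rmse_alpha, freq_above_one)
```

The printed dispersion could then be set against the square root of the α-entry of (κη − 1)J⁻¹ divided by √n, evaluated separately by simulation.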
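7.1.3 The Nonstationary ARCH(1)
When the strict stationarity constraint is not satisfied in the ARCH(1) case, that is, when
$$\alpha_0 \ge \exp\{-E\log \eta_t^2\}, \tag{7.14}$$
one can define an ARCH(1) process starting with initial values. For a given value ε0, we define
$$\epsilon_t = h_t^{1/2}\,\eta_t, \qquad h_t = \omega_0 + \alpha_0\,\epsilon_{t-1}^2, \qquad t = 1, 2, \dots, \tag{7.15}$$
where ω0 > 0 and α0 > 0, with the usual assumptions on the sequence (ηt). As already noted, ht converges to infinity almost surely when
$$\alpha_0 > \exp\{-E\log \eta_t^2\}, \tag{7.16}$$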
and only in probability when the inequality (7.14) is an equality (see Corollary 2.1 and Remark 2.3 following it). Is it possible to estimate the coefficients of such a model? The answer is only partly positive: it is possible to consistently estimate the coefficient α0, but the coefficient ω0 cannot be consistently estimated. The practical impact of this result thus appears to be limited, but because of its theoretical interest, the problem of estimating coefficients of nonstationary models deserves attention. Consider the QMLE of an ARCH(l), that is to say a measurable solution of
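$$(\hat\omega_n, \hat\alpha_n) = \arg\min_{\theta\in\Theta} \frac{1}{n}\sum_{t=1}^{n} \ell_t(\theta), \qquad \ell_t(\theta) = \frac{\epsilon_t^2}{\sigma_t^2(\theta)} + \log\sigma_t^2(\theta), \tag{7.17}$$
where θ = (ω, α), Θ is a compact set of (0, ∞)², and σt²(θ) = ω + α ε²t−1 for t = 1, …, n (starting with a given initial value for ε0²). The almost sure convergence of εt² to infinity will be used to show the strong consistency of the QMLE of α0. The following lemma completes Corollary 2.1 and gives the rate of convergence of εt² to infinity under (7.16).
Lemma 7.1 Define the ARCH(1) model by (7.15) with any initial condition ε0² ≥ 0. The nonstationarity condition (7.16) is assumed. Then, almost surely, as n → ∞,
$$\frac{1}{h_n} = o(\rho^n) \quad\text{and}\quad \frac{1}{\epsilon_n^2} = o(\rho^n)$$
for any constant ρ such that
$$1 > \rho > \exp\{-E\log \eta_t^2\}/\alpha_0. \tag{7.18}$$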
This result entails the strong consistency and asymptotic normality of the QMLE of α0.
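Theorem 7.3 Consider the assumptions of Lemma 7.1 and the QMLE defined by (7.17) where θ0 = (ω0, α0) ∈ Θ. Then
$$\hat\alpha_n \to \alpha_0 \quad\text{a.s.} \tag{7.19}$$
and when θ0 belongs to the interior of Θ,
$$\sqrt{n}\,(\hat\alpha_n - \alpha_0) \xrightarrow{\mathcal L} \mathcal N\{0, (\kappa_\eta - 1)\alpha_0^2\} \tag{7.20}$$
as n → ∞.
In the proof of this theorem, it is shown that the score vector satisfies
$$\frac{1}{\sqrt n}\sum_{t=1}^{n} \frac{\partial}{\partial\theta}\,\ell_t(\theta_0) \xrightarrow{\mathcal L} \mathcal N\left\{0,\; J = (\kappa_\eta - 1)\begin{pmatrix} 0 & 0 \\ 0 & \alpha_0^{-2} \end{pmatrix}\right\}.$$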
In the standard statistical inference framework, the variance J of the score vector is (proportional to) the Fisher information. According to the usual interpretation, the form of the matrix J shows that, asymptotically and for almost all observations, the variations of the log-likelihood are insignificant when θ varies from (ω0, α0) to (ω0 + h, α0) for small h. In other words, the limiting log-likelihood is flat at the point (ω0, α0) in the direction of variation of ω0. Thus, minimizing this limiting function does not allow θ0 to be found. This leads us to think that the QMLE of ω0 is likely to be inconsistent when the strict stationarity condition is not satisfied.
Figure 7.2 displays numerical results illustrating the performance of the QMLE in finite samples. For different values of the parameters, 100 replications of the ARCH(1) model have been generated, for the sample sizes n = 200 and n = 4000. The top panels of the figure correspond to a second-order stationary ARCH(1), with parameter θ0 = (1, 0.95). The panels in the middle correspond to a strictly stationary ARCH(1) of infinite variance, with θ0 = (1, 1.5). The results obtained for these two cases are similar, confirming that second-order stationarity is not necessary for estimating an ARCH. The bottom panels, corresponding to the explosive ARCH(1) with parameter θ0 = (1, 4), confirm the asymptotic results concerning the estimation of α0. They also illustrate the failure of the QMLE to estimate ω0 under the nonstationarity assumption (7.16). The results even deteriorate when the sample size increases.
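The flatness in the direction of ω can be made explicit with a heuristic computation. The sketch below assumes ℓt(θ) = εt²/σt²(θ) + log σt²(θ) with σt²(θ) = ω + α ε²t−1, and lets ε²t−1 tend to infinity as it does in the explosive case; it is an informal limit for fixed ηt, not a substitute for the proof.

```latex
\begin{align*}
\ell_t(\theta) &= \frac{(\omega_0 + \alpha_0\epsilon_{t-1}^2)\,\eta_t^2}{\omega + \alpha\epsilon_{t-1}^2}
                 + \log(\omega + \alpha\epsilon_{t-1}^2),\\
\ell_t(\theta) - \ell_t(\theta_0)
  &= \eta_t^2\left\{\frac{\omega_0 + \alpha_0\epsilon_{t-1}^2}{\omega + \alpha\epsilon_{t-1}^2} - 1\right\}
     + \log\frac{\omega + \alpha\epsilon_{t-1}^2}{\omega_0 + \alpha_0\epsilon_{t-1}^2}
  \;\longrightarrow\; \eta_t^2\left(\frac{\alpha_0}{\alpha} - 1\right) + \log\frac{\alpha}{\alpha_0}
  \quad\text{as } \epsilon_{t-1}^2 \to \infty,\\
E\left\{\eta_t^2\left(\frac{\alpha_0}{\alpha} - 1\right) + \log\frac{\alpha}{\alpha_0}\right\}
  &= \frac{\alpha_0}{\alpha} - 1 - \log\frac{\alpha_0}{\alpha} \;\ge\; 0,
  \quad\text{with equality if and only if } \alpha = \alpha_0 .
\end{align*}
```

The limit no longer involves ω or ω0, which is consistent with α0 being identified while ω0 is not.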
7.2 Estimation of ARMA-GARCH Models by Quasi-Maximum Likelihood
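In this section, the previous results are extended to cover the situation where the GARCH process is not directly observed, but constitutes the innovation of an observed ARMA process. This framework is relevant because, even for financial series, it is restrictive to assume that the observed series is the realization of a noise. From a theoretical point of view, it will be seen that the extension to the ARMA-GARCH case is far from trivial. Assume that the observations X1, …, Xn are generated by a strictly stationary nonanticipative solution of the ARMA(P, Q)-GARCH(p, q) model
$$X_t - c_0 = \sum_{i=1}^{P} a_{0i}(X_{t-i} - c_0) + e_t - \sum_{j=1}^{Q} b_{0j}\,e_{t-j}, \qquad e_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \sum_{i=1}^{q} \alpha_{0i}\,e_{t-i}^2 + \sum_{j=1}^{p} \beta_{0j}\,h_{t-j}, \tag{7.21}$$
where (ηt) and the coefficients ω0, α0i and β0j are defined as in (7.1). The orders P, Q, p, q are assumed known. The vector of the parameters is denoted by
$$\varphi = (\vartheta', \theta')' = (c, a_1, \dots, a_P, b_1, \dots, b_Q, \theta')',$$
where θ is defined as previously (see (7.2)). The parameter space is
$$\Phi \subset \mathbb R^{P+Q+1} \times (0, +\infty) \times [0, \infty)^{p+q}.$$
The true value of the parameter is denoted by
$$\varphi_0 = (\vartheta_0', \theta_0')' = (c_0, a_{01}, \dots, a_{0P}, b_{01}, \dots, b_{0Q}, \theta_0')'.$$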
We still employ a Gaussian quasi-likelihood conditional on initial values. If q ≥ Q, the initial values are
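These values (the last p of which are positive) may depend on the parameter and/or on the observations. For any ϑ, the values of ε̃t(ϑ), for t = −q + Q + 1, …, n, and then, for any θ, the values of σ̃t²(φ), for t = 1, …, n, can thus be computed from
$$\tilde\epsilon_t = \tilde\epsilon_t(\vartheta) = X_t - c - \sum_{i=1}^{P} a_i(X_{t-i} - c) + \sum_{j=1}^{Q} b_j\,\tilde\epsilon_{t-j}, \qquad \tilde\sigma_t^2 = \tilde\sigma_t^2(\varphi) = \omega + \sum_{i=1}^{q} \alpha_i\,\tilde\epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j\,\tilde\sigma_{t-j}^2.$$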
When q < Q, the fixed initial values are
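Conditionally on these initial values, the Gaussian log-likelihood is given by
$$\tilde I_n(\varphi) = \frac{1}{n}\sum_{t=1}^{n} \tilde\ell_t, \qquad \tilde\ell_t = \tilde\ell_t(\varphi) = \frac{\tilde\epsilon_t^2}{\tilde\sigma_t^2} + \log\tilde\sigma_t^2.$$
A QMLE is defined as a measurable solution of the equation
$$\hat\varphi_n = \arg\min_{\varphi\in\Phi} \tilde I_n(\varphi). \tag{7.22}$$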
Strong Consistency
Let aϑ(z) = 1 − Σ_{i=1}^{P} ai z^i and bϑ(z) = 1 − Σ_{j=1}^{Q} bj z^j. Standard assumptions are made on these AR and MA polynomials, and assumption A1 is modified as follows:
A7: φ0 ∈ Φ and Φ is compact.
A8: For all φ ∈ Φ, aϑ(z)bϑ(z) = 0 implies |z| > 1.
A9: aϑ0(z) and bϑ0(z) have no common roots, a0P ≠ 0 or b0Q ≠ 0.
Under assumptions A2 and A8, (Xt) is supposed to be the unique strictly stationary nonanticipative solution of (7.21). Let εt = εt(ϑ) = aϑ(B)bϑ⁻¹(B)(Xt − c) and ℓt = ℓt(φ) = εt²/σt² + log σt², where σt² = σt²(φ) is the nonanticipative and ergodic strictly stationary solution of (7.10). Note that et = εt(ϑ0) and ht = σt²(φ0). The following result extends Theorem 7.1.
Theorem 7.4 (Consistency of the QMLE) Let (φ̂n) be a sequence of QMLEs satisfying (7.22). Assume that Eηt = 0. Then, under assumptions A2–A4 and A7–A9, almost surely
$$\hat\varphi_n \to \varphi_0 \quad\text{as } n \to \infty.$$
Remark 7.3
1. As in the pure GARCH case, the theorem does not impose a finite variance for et (and thus for Xt). In the pure ARMA case, where et = ηt admits a finite variance, this theorem reduces to a standard result concerning ARMA models with iid errors (see Brockwell and Davis, 1991, p. 384).
2. Apart from the condition Eηt = 0, the conditions required for the strong consistency of the QMLE are not stronger than in the pure GARCH case.
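The two-stage recursion behind the ARMA-GARCH quasi-likelihood (residuals from the ARMA part first, conditional variances from those residuals second) can be sketched for an ARMA(1, 1)-GARCH(1, 1). Several points are assumptions made for this illustration: the sign convention X_t − c = a(X_{t−1} − c) + e_t − b e_{t−1}, the function name, and the initial values (pre-sample observation equal to c, pre-sample residual equal to 0, pre-sample variance equal to ω), which are not the ones specified in the text.

```python
import numpy as np

def arma11_garch11_criterion(params, x):
    """Average of resid_t^2 / s_t + log(s_t) for an ARMA(1,1)-GARCH(1,1).

    params = (c, a, b, omega, alpha, beta) under the assumed convention
    X_t - c = a (X_{t-1} - c) + e_t - b e_{t-1}.
    """
    c, a, b, omega, alpha, beta = params
    x = np.asarray(x, dtype=float)
    n = x.size
    resid = np.empty(n)
    sigma2 = np.empty(n)
    prev_x, prev_e = c, 0.0       # assumed initial values for the ARMA part
    prev_s2 = omega               # assumed initial value for the variance
    for t in range(n):
        resid[t] = x[t] - c - a * (prev_x - c) + b * prev_e
        sigma2[t] = omega + alpha * prev_e ** 2 + beta * prev_s2
        prev_x, prev_e, prev_s2 = x[t], resid[t], sigma2[t]
    crit = float(np.mean(resid ** 2 / sigma2 + np.log(sigma2)))
    return crit, resid, sigma2

x = np.random.default_rng(0).standard_normal(300)   # placeholder data
crit, resid, sigma2 = arma11_garch11_criterion((0.0, 0.5, 0.0, 1.0, 0.1, 0.8), x)
print(crit)
```

Minimizing the first returned value over a compact parameter set would give a QMLE in the sense used above; the placeholder data here are white noise and serve only to exercise the recursion.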
Asymptotic Normality When the Moment of Order 4 Exists
So far, the asymptotic results of the QMLE (consistency and asymptotic normality in the pure GARCH case, consistency in the ARMA-GARCH case) have not required any moment assumption on the observed process (for the asymptotic normality in the pure GARCH case, a moment of order 4 is assumed for the iid process, not for εt). One might think that this will be the same for establishing the asymptotic normality in the ARMA-GARCH case. The following example shows that this is not the case.
Example 7.2 (Nonexistence of J without moment assumption) Consider the AR(1)-ARCH(1) model
$$X_t = a_{01}X_{t-1} + e_t, \qquad e_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \alpha_0\,e_{t-1}^2, \tag{7.23}$$
where |a01| < 1, ω0 > 0, α0 ≥ 0, and the distribution of the iid sequence (ηt) is defined, for a > 1, by
$$P(\eta_t = a) = P(\eta_t = -a) = \frac{1}{2a^2}, \qquad P(\eta_t = 0) = 1 - \frac{1}{a^2}.$$
Then the process (Xt) is always stationary, for any value of α0 (because exp{−E(log ηt²)} = +∞; see the strict stationarity constraint (2.10)). By contrast, Xt does not admit a moment of order 2 when α0 ≥ 1 (see Theorem 2.2). The first component of the (normalized) score vector is
since, first, ηt−1 = 0 entails et−1 = 0 and Xt−1 = a01Xt−2, and second, ηt−1 and Xt−2 are independent. Consequently, if EXt² = ∞ and a01 ≠ 0, the score vector does not admit a variance.
This example shows that it is not possible to extend the result of asymptotic normality obtained in the GARCH case to the ARMA-GARCH models without additional moment assumptions. This is not surprising because for ARMA models (which can be viewed as limits of ARMA-GARCH models when the coefficients α0i and β0j tend to 0) the asymptotic normality of the QMLE is shown with second-order moment assumptions. For an ARMA with infinite variance innovations, the rate of convergence of the estimators may be faster than in the standard case and the asymptotic distribution is stable, but non-Gaussian in general. We show the asymptotic normality with a moment assumption of order 4. Recall that, by Theorem 2.9, this assumption is equivalent to ρ{E(A0t ⊗ A0t)} < 1. We make the following assumptions:
A10: ρ{E(A0t ⊗ A0t)} < 1 and, for all θ ∈ Θ, Σ_{j=1}^{p} βj < 1.
A11: φ0 ∈ Φ°, where Φ° denotes the interior of Φ.
A12: There exists no set Λ of cardinality 2 such that ℙ(ηt ∈ Λ) = 1.
Assumption A10 implies that κη = E(ηt⁴) < ∞ and makes assumption A2 superfluous. The identifiability assumption A12 is slightly stronger than the first part of assumption A3 when the distribution of ηt is not symmetric. We are now in a position to state conditions ensuring the asymptotic normality of the QMLE of an ARMA-GARCH model.
Theorem 7.5 (Asymptotic normality of the QMLE) Assume that Eηt = 0 and that assumptions A3, A4 and A8–A12 hold true. Then
$$\sqrt{n}\,(\hat\varphi_n - \varphi_0) \xrightarrow{\mathcal L} \mathcal N(0, \Sigma),$$
where ,
If, in addition, the distribution of ηt is symmetric, we have
Remark 7.4
1. It is interesting to note that if ηt has a symmetric law, then the asymptotic variance Σ is block-diagonal, which is interpreted as an asymptotic independence between the estimators of the ARMA coefficients and those of the GARCH coefficients. The asymptotic distribution of the estimators of the ARMA coefficients depends, however, on the GARCH coefficients (in view of the form of the matrices I1 and J1 involving the derivatives of σt²). On the other hand, still when the distribution of ηt is symmetric, the asymptotic accuracy of the estimation of the GARCH parameters is not affected by the ARMA part: the lower right block of Σ depends only on the GARCH coefficients. The block-diagonal form of Σ may also be of interest for testing problems of joint assumptions on the ARMA and GARCH parameters.
2. Assumption A11 imposes the strict positivity of the GARCH coefficients and it is easy to see that this assumption constrains only the GARCH coefficients. For any value of ϑ0, the restriction of Φ to its first P + Q + 1 coordinates can be chosen sufficiently large so that its interior contains ϑ0 and assumption A8 is satisfied.
3. In the proof of the theorem, the symmetry of the iid process distribution is used to show the following result, which is of independent interest.
If the distribution of ηt is symmetric then,
provided this expectation exists (see Exercise 7.1).
Example 7.3 (Numerical evaluation of the asymptotic variance) Consider the AR(1)-ARCH(1) model defined by (7.23). In the case where ηt follows the 𝒩(0, 1) law, condition A10 for the existence of a moment of order 4 is written as 3α0² < 1, that is, α0 < 0.577 (see (2.54)). In the case where ηt follows the χ²(1) distribution, normalized in such a way that Eηt = 0 and Eηt² = 1, this condition is written as 15α0² < 1, that is, α0 < 0.258. To simplify the computation, assume that ω0 = 1 is known. Table 7.3 provides a numerical evaluation of the asymptotic variance Σ, for these two distributions and for different values of the parameters a0 and α0. It is clear that the asymptotic variance of the two parameters strongly depends on the distribution of the iid process. These experiments confirm the independence of the asymptotic distributions of the AR and ARCH parameters in the case where the distribution of ηt is symmetric. They reveal that the independence does not hold when this assumption is relaxed. Note the strong impact of the ARCH coefficient on the asymptotic variance of the AR coefficient. On the other hand, the simulations confirm that in the case where the distribution is symmetric, the AR coefficient has no impact on the asymptotic accuracy of the ARCH coefficient. When the distribution is not symmetric, the impact, if there is any, is very weak. For the computation of the expectations involved in the matrix Σ, see Exercise 7.8. In particular, the values corresponding to α0 = 0 (AR(1) without ARCH effect) can be analytically computed. Note also that the results obtained for the asymptotic variance of the estimator of the ARCH coefficient in the case a0 = 0 do not coincide with those of Table 7.2. This is not surprising because in this table ω0 is not supposed to be known.
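The two numerical bounds on α0 quoted in this example can be recovered from the kurtosis of the innovations. The short check below assumes the ARCH(1) fourth-moment condition in the form κη α0² < 1, with κη = 3 for the 𝒩(0, 1) law and κη = 15 for the standardized χ²(1) law (fourth central moment 60 divided by the squared variance 4); the function name is hypothetical.

```python
import math

def arch1_fourth_moment_bound(kappa_eta):
    """Supremum of the alpha0 values satisfying kappa_eta * alpha0**2 < 1."""
    return 1.0 / math.sqrt(kappa_eta)

kappa_gauss = 3.0            # E eta^4 for the N(0, 1) law
kappa_chi2 = 60.0 / 2.0**2   # standardized chi-square(1): mu_4 / variance^2 = 15

print(round(arch1_fourth_moment_bound(kappa_gauss), 3))  # 0.577
print(round(arch1_fourth_moment_bound(kappa_chi2), 3))   # 0.258
```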
7.3 Application to Real Data
In this section, we employ the QML method to estimate GARCH(1, 1) models on daily returns of 11 stock market indices, namely the CAC, DAX, DJA, DJI, DJT, DJU, FTSE, Nasdaq, Nikkei, SMI and S&P 500 indices. The observations cover the period from January 2, 1990 to January 22, 2009 (except for those indices for which the first observation is after 1990). The GARCH(1, 1) model has been chosen because it constitutes the reference model, by far the most commonly used in empirical studies. However, in Chapter 8 we will see that it can be worth considering models with higher orders p and q.
Table 7.4 displays the estimators of the parameters ω, α, β, together with their estimated standard deviations. The last column gives estimates of ρ4, obtained by replacing the unknown parameters by their estimates and κη by the empirical mean of the fourth-order moment of the standardized residuals. We have Eεt⁴ < ∞ if and only if ρ4 < 1. The estimates of the GARCH coefficients are quite homogeneous over all the series, and are similar to those usually obtained in empirical studies of daily returns. The coefficients α are close to 0.1, and the coefficients β are close to 0.9, which indicates a strong persistence of the shocks on the volatility. The sum α + β is greater than 0.98 for 10 of the 11 series, and greater than 0.96 for all the series. Since α + β < 1, the assumption of second-order stationarity cannot be rejected, for any series (see Section 8.1). A fortiori, by Remark 2.6 the strict stationarity cannot be rejected. Note that the strict stationarity assumption, E log(α1ηt² + β1) < 0, seems difficult to test directly because it not only relies on the GARCH coefficients but also involves the unknown distribution of ηt. The existence of moments of order 4, Eεt⁴ < ∞, is questionable for all the series because ρ̂4 is extremely close to 1. Recall, however, that the asymptotic properties of the QMLE do not require any moment on the observed process but do require strict stationarity.
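A plug-in evaluation of the fourth-moment condition for a fitted GARCH(1, 1) can be sketched as follows. The formula ρ4 = (α + β)² + (κη − 1)α² is the standard GARCH(1, 1) fourth-moment condition and is assumed here to be what the last column of the table reports; the function names and the numerical values are hypothetical and are not taken from Table 7.4.

```python
import numpy as np

def garch11_rho4(alpha, beta, kappa_eta):
    """Fourth-moment index of a GARCH(1,1); a value below 1 means E eps^4 is finite."""
    return (alpha + beta) ** 2 + (kappa_eta - 1.0) * alpha ** 2

def kappa_from_residuals(std_resid):
    """Empirical fourth moment of standardized residuals eps_t / sigma_t."""
    r = np.asarray(std_resid, dtype=float)
    return float(np.mean(r ** 4))

rho4_gauss = garch11_rho4(0.10, 0.88, 3.0)   # hypothetical estimates, Gaussian kurtosis
rho4_heavy = garch11_rho4(0.10, 0.88, 5.0)   # same coefficients, heavier tails
print(rho4_gauss, rho4_heavy)
```

With α + β close to 1, a modest increase in the estimated kurtosis is enough to push the index above 1, which is why the existence of fourth moments is hard to settle empirically.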
7.4 Proofs of the Asymptotic Results*
We denote by K and ρ generic constants whose values can vary from line to line. As an example, one can write for 0 < ρ1 < 1 and 0 < ρ2 < 1, i1 ≥ 0, i2 ≥ 0,
Proof of Theorem 7.1
The proof is based on a vectorial autoregressive representation of order 1 of the vector (σt², σ²t−1, …, σ²t−p+1)′, analogous to that used for the study of stationarity. Assumption A2 allows us to write σt² as a series depending on the infinite past of the variable εt². It can be shown that the initial values are not important asymptotically, using the fact that, under the strict stationarity assumption, εt² necessarily admits a moment of order s, with s > 0. This property also allows us to show that the expectation of ℓt(θ0) is well defined in ℝ and that Eθ0(ℓt(θ)) − Eθ0(ℓt(θ0)) ≥ 0, which guarantees that the limit criterion is minimized at the true value. The difficulty is that Eθ0(ℓt(θ)) can be equal to +∞. Assumptions A3 and A4 are crucial to establishing the identifiability: the former assumption precludes the existence of a constant linear combination of the ε²t−j, j > 0. The assumption of absence of common root is also used. The ergodicity of ℓt(θ) and a compactness argument conclude the proof.
It will be convenient to rewrite (7.10) in matrix form. We have
where
We will establish the following intermediate results.
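(a) lim n→∞ sup θ∈Θ |In(θ) − Ĩn(θ)| = 0, a.s.
(b) (∃ t ∈ ℤ such that σt²(θ) = σt²(θ0), Pθ0-a.s.) ⇒ θ = θ0.
(c) Eθ0|ℓt(θ0)| < ∞, and if θ ≠ θ0, Eθ0 ℓt(θ) > Eθ0 ℓt(θ0).
(d) For any θ ≠ θ0, there exists a neighborhood V(θ) such that
$$\liminf_{n\to\infty}\ \inf_{\theta^*\in V(\theta)} \tilde I_n(\theta^*) > E_{\theta_0}\ell_1(\theta_0), \quad\text{a.s.}$$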
(a) Asymptotic irrelevance of the initial values. In view of Corollary 2.2, the condition of assumption A2 implies that ρ(B) < 1. The compactness of Θ implies that
Iterating (7.25), we thus obtain
Let be the vector obtained by replacing by in , and let be the vector obtained by replacing by the initial values (7.6) or (7.7). We have
From (7.27), it follows that almost surely
For x > 0 we have log x ≤ x − 1. It follows that, for . We thus have almost surely, using (7.30),
The existence of a moment of order s > 0 for εt², deduced from assumption A1 and Corollary 2.3, allows us to show that ρ^t εt² → 0 a.s. (see Exercise 7.2). Using Cesàro’s lemma, point (a) follows.
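The geometric decay of the effect of the initial values is easy to see numerically in the GARCH(1, 1) case. In the sketch below, two conditional variance sequences are computed from the same squared observations with two different starting values (ω and ε1², chosen for illustration); since the observed squares are common to both recursions, the gap at time t equals β^(t−1)(α + β) times the gap between the starting values. The function name and the placeholder data are assumptions.

```python
import numpy as np

def garch11_sigma2(theta, eps2, init):
    """Conditional variances of a GARCH(1,1) with eps_0^2 = sigma_0^2 = init."""
    omega, alpha, beta = theta
    sigma2 = np.empty(eps2.size)
    sigma2[0] = omega + (alpha + beta) * init
    for t in range(1, eps2.size):
        sigma2[t] = omega + alpha * eps2[t - 1] + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(1)
eps2 = rng.standard_normal(200) ** 2        # placeholder squared observations
theta = (0.1, 0.1, 0.8)
s_a = garch11_sigma2(theta, eps2, init=theta[0])   # starting value omega
s_b = garch11_sigma2(theta, eps2, init=eps2[0])    # starting value eps_1^2
gap = np.abs(s_a - s_b)
# closed form: beta^(t-1) * (alpha + beta) * |omega - eps_1^2|, t = 1, ..., n
expected = 0.8 ** np.arange(200) * 0.9 * abs(theta[0] - eps2[0])
print(gap[:5], expected[:5])
```

The contribution of the starting values to each term of the criterion therefore vanishes at a geometric rate, which is the mechanism exploited in point (a).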
(b) Identifiability of the parameter. Assume that σt²(θ) = σt²(θ0), Pθ0-a.s. By Corollary 2.2, the polynomial Bθ(B) is invertible under assumption A2. Using (7.10), we obtain
If the operator in B between braces were not null, then there would exist a constant linear combination of the ε²t−j, j > 0. Thus the linear innovation of the process (εt²) would be equal to zero. Since the distribution of ηt² is nondegenerate, in view of assumption A3,
We thus have
Under assumption A4 (absence of common root), it follows that Aθ(z) = Aθ0(z), Bθ(z) = Bθ0(z) and ω = ω0. We have thus shown (b).
(c) The limit criterion is minimized at the true value. The limit criterion is not necessarily integrable at every point, but Eθ0 ℓt(θ) is well defined in ℝ ∪ {+∞} because, with the notation x− = max(−x, 0) and x+ = max(x, 0),
It is, however, possible to have Eθ0 ℓt(θ) = ∞ for some values of θ. This occurs, for instance, when θ = (ω, 0, …, 0) and (εt) is an IGARCH such that Eεt² = ∞. We will see that this cannot occur at θ0, meaning that the criterion is integrable at θ0. To establish this result, we have to show that Eθ0 ℓt⁺(θ0) < ∞. Using Jensen’s inequality and, once again, the existence of a moment of order s > 0 for εt², we obtain
because
Thus
Having already established that , it follows that is well defined in . Since for all x > 0, log x ≤ x − 1 with equality if and only if x = 1, we have
with equality if and only if σt²(θ0)/σt²(θ) = 1 Pθ0-a.s., that is, in view of (b), if and only if θ = θ0.
(d) Compactness of Θ and ergodicity of (ℓt(θ)). For all θ ∈ Θ and any positive integer k, let Vk(θ) be the open ball of center θ and radius 1/k. Because of (a), we have
To obtain the convergence of this empirical mean, the standard ergodic theorem (see Theorem A.2) cannot be applied because we have seen that ℓt(θ*) is not necessarily integrable, except at θ0. We thus use a modified version of this theorem, which allows for an ergodic and strictly stationary sequence of variables admitting an expectation in ℝ ∪ {+∞} (see Exercise 7.3). This version of the ergodic theorem can be applied to {ℓt(θ*)}, and thus to {inf θ*∈Vk(θ)∩Θ ℓt(θ*)} (see Exercise 7.4), which allows us to conclude that
By Beppo Levi’s theorem, Eθ0 inf θ*∈Vk(θ)∩Θ ℓ1(θ*) increases to Eθ0 ℓ1(θ) as k → ∞. Given (7.33), we have shown (d).
The conclusion of the proof uses a compactness argument. First note that for any neighborhood V(θ0) of θ0,
The compact set Θ is covered by the union of an arbitrary neighborhood V(θ0) of θ0 and the set of the neighborhoods V(θ) satisfying (d), θ ∈ Θ \ V(θ0). Thus, there exists a finite subcover of Θ of the form V(θ0), V(θ1), …, V(θk), where, for i = 1, …, k, V(θi) satisfies (d). It follows that
The relations (d) and (7.34) show that, almost surely, θ̂n belongs to V(θ0) for n large enough. Since this is true for any neighborhood V(θ0), the proof is complete.
Proof of Theorem 7.2
The proof of this theorem is based on a standard Taylor expansion of criterion (7.8) at θ0. Since θ̂n converges to θ0, which lies in the interior of the parameter space by assumption A5, the derivative of the criterion is equal to zero at θ̂n for n large enough. We thus have
where the θ*ij are between θ̂n and θ0. It will be shown that
and that
The proof of the theorem immediately follows. We will split the proof of (7.36) and (7.37) into several parts:
(a)
(b) J is invertible and .
(c) There exists a neighborhood 𝒱(θ0) of θ0 such that, for all i, j, k ∈ {1, …, p + q + 1},
(d) and tend in probability to 0 as n → ∞.
(e) .
(f) .
(a) Integrability of the derivatives of the criterion at θ0. Since , we have
At θ = θ0, the variable / = is independent of and its derivatives. To show (a), it thus suffices to show that
In view of (7.28), we have
where , and B(j) is a p × p matrix with 1 in position (1, j) and zeros elsewhere. Note that, in view of the positivity of the coefficients and (7.41)–(7.42), the derivatives of σt² are positive or null. In view of (7.41), it is clear that ∂σt²/∂ω is bounded. Since σt² ≥ ω > 0, the variable (1/σt²)∂σt²/∂ω is also bounded. This variable thus possesses moments of all orders. In view of the second equality in (7.41) and of the positivity of all the terms involved in the sums, we have
It follows that
The variable thus admits moments of all orders at θ = θ0. In view of (7.42) and βjB(j) ≤ B, we have
Using (7.27), we have ‖B^k‖ ≤ Kρ^k for all k. Moreover, εt² having a moment of order s ∈ (0, 1), the variable has the same moment. Using in addition (7.44), the inequality and the relation x/(1 + x) ≤ x^s for all x ≥ 0, we obtain
Under assumption A5 we have β0j > 0 for all j, which entails that the first expectation in (7.40) exists.
We now turn to the higher-order derivatives of . In view of the first equality of (7.41), we have
We thus have
which is a vector of finite constants (since ρ(B) < 1). It follows that is bounded, and thus admits moments of all orders. It is of course the same for . The second equality of (7.41) gives
The arguments used for (7.45) then show that
This entails that is integrable. Differentiating relation (7.42) with respect to βj′, we obtain
because βjB(j) ≤ B. As for (7.45), it follows that
and the existence of the second expectation in (7.40) is proven.
Since is bounded, and since by (7.43) the variables are bounded at θ0, it is clear that
for i = 1, …, q + 1. With the notation and arguments already used to show (7.45), and using the elementary inequality x/(1 + x) ≤ x^(s/2) for all x ≥ 0, Minkowski’s inequality implies that
Finally, the Cauchy–Schwarz inequality entails that the third expectation of (7.40) exists.
(b) Invertibility of J and connection with the variance of the criterion derivative. Using (a), and once again the independence between and and its derivatives, we have by (7.38),
Moreover, in view of (7.40), J exists and satisfies (7.13). We also have
Assume now that J is singular. Then there exists a nonzero vector λ in ℝ^(p+q+1) such that a.s.6 In view of (7.10) and the stationarity of , we have
Let λ = (λ0, λ1, …, λq + p)′. It is clear that λ1 = 0, otherwise would be measurable with respect to the σ-field generated by {ηu, u < t − 1}. For the same reason, we have λ2 = … = λ2+1 = 0 if λq+1 = … = λq+i = 0. Consequently, λ ≠ 0 implies the existence of a GARCH(p − 1, q − 1) representation. By the arguments used to show (7.32), assumption A4 entails that this is impossible. It follows that λ′Jλ = 0 implies λ = 0, which completes the proof of (b).
(c) Uniform integrability of the third-order derivatives of the criterion. Differentiating (7.39), we obtain
We begin by studying the integrability of {1 − εt²/σt²}. This is the most difficult term to deal with. Indeed, the variable εt²/σt² is not uniformly integrable on Θ: at θ = (ω, 0′), the ratio εt²/σt² is integrable only if Eεt² exists. We will, however, show the integrability of {1 − εt²/σt²} uniformly in θ in the neighborhood of θ0. Let Θ* be a compact set which contains θ0 and which is contained in the interior of Θ (for all θ ∈ Θ*, we have θ ≥ θ* > 0 component by component). Let B0 be the matrix B (defined in (7.26)) evaluated at the point θ = θ0. For all δ > 0, there exists a neighborhood V(θ0) of θ0, included in Θ*, such that for all θ ∈ V(θ0),
Note that, since V(θ0) ⊂ Θ*, we have sup_{θ ∈ V(θ0)} 1/αi < ∞. From (7.28), we obtain
and, again using x/(1 + x) ≤ x^s for all x ≥ 0 and all s ∈ (0, 1),
If s is chosen such that Eεt^(2s) < ∞ and, for instance, δ = (1 − ρ^s)/(2ρ^s), then the expectation of the previous series is finite. It follows that there exists a neighborhood V(θ0) of θ0 such that
Using (7.51), keeping the same choice of δ but taking s such that Es < ∞, the triangle inequality gives
Now consider the second term in braces in (7.50). Differentiating (7.46), (7.47) and (7.48), with the arguments used to show (7.43), we obtain
when the indices i1, i2 and i3 are not all in {q + 1, q + 2, …, q + 1 + p} (that is, when the derivative is taken with respect to at least one parameter different from the βj). Using again the arguments used to show (7.44) and (7.48), and then (7.45), we obtain
for any s ∈ (0, 1). Since for some s > 0, it follows that
It is easy to see that in this inequality the power 2 can be replaced by any power d:
Using the Cauchy-Schwarz inequality, (7.52) and (7.53), we obtain
The other terms in braces in (7.50) are handled similarly. We show in particular that
for any integer d. With the aid of Hölder’s inequality, this allows us to establish, in particular, that
Thus we obtain (c).
(d) Asymptotic decrease of the effect of the initial values. Using (7.29), we obtain the analogs of (7.41) and (7.42) for the derivatives of :
where is equal to (0, …, 0)′ when the initial conditions are given by (7.7), and is equal to (1, …, 1)′ when the initial conditions are given by (7.6). The second-order derivatives have similar expressions. The compactness of Θ and the fact that ρ(B) < 1 together allow us to claim that, almost surely,
Using (7.30), we obtain
Since
we have, using (7.59) and the first inequality in (7.58),
It follows that
Markov’s inequality, (7.40), and the independence between ηt and imply that, for all ε > 0,
which, by (7.60), shows the first part of (d).
Now consider the asymptotic impact of the initial values on the second-order derivatives of the criterion in a neighborhood of θ0. In view of (7.39) and the previous computations, we have
where
In view of (7.52), (7.54) and Hölder’s inequality, it can be seen that, for a certain neighborhood V(θ0), the expectation of t is a finite constant. Using Markov’s inequality once again, the second convergence of (d) is then shown.
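Part (d) rests on the geometric decay of the influence of the initial values. The following minimal numerical sketch illustrates this in the simplest case, assuming a GARCH(1, 1) recursion σt² = ω + αε²t−1 + βσ²t−1. The parameter values, the two sets of initial values and the function name `garch11_sigma2` are illustrative choices, not taken from the text. In this case the gap between two variance paths started from different initial values is multiplied exactly by β at each step.

```python
import random

def garch11_sigma2(eps, omega, alpha, beta, eps0_sq, sigma0_sq):
    """Conditional variances of a GARCH(1,1), given initial values
    eps_0^2 = eps0_sq and sigma_0^2 = sigma0_sq (hypothetical helper)."""
    out, prev_e2, prev_s2 = [], eps0_sq, sigma0_sq
    for e in eps:
        s2 = omega + alpha * prev_e2 + beta * prev_s2
        out.append(s2)
        prev_e2, prev_s2 = e * e, s2
    return out

rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(50)]  # any observed series will do
omega, alpha, beta = 0.1, 0.1, 0.8               # illustrative values

# Same data, two different choices of initial values.
path_a = garch11_sigma2(eps, omega, alpha, beta, eps0_sq=0.0, sigma0_sq=0.0)
path_b = garch11_sigma2(eps, omega, alpha, beta, eps0_sq=4.0, sigma0_sq=4.0)

# The gap at step t equals beta**t times the gap at step 0.
gaps = [b - a for a, b in zip(path_a, path_b)]
first_gap = alpha * 4.0 + beta * 4.0
print(gaps[0], gaps[10], gaps[49])
```

For general orders p and q the same role is played by the bound on the powers of the matrix B, which is why the condition ρ(B) < 1 appears in the argument.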
(e) CLT for martingale increments. The conditional score vector is obviously centered, which can be seen from (7.38), using the fact that and its derivatives belong to the σ-field generated by {t−t, i ≥ 0}, and the fact that :
Note also that, by (7.49), is finite. In view of the invertibility of J and the assumptions on the distribution of ηt (which entail 0 < κη − 1 < ∞), this covariance matrix is nondegenerate. It follows that, for all λ ∈ ℝ^(p+q+1), the sequence is a square integrable ergodic stationary martingale difference. Corollary A.1 and the Cramér-Wold theorem (see, for example, Billingsley, 1995, pp. 383, 476 and 360) entail (e).
(f) Use of a second Taylor expansion and of the ergodic theorem. Consider the Taylor expansion (7.35) of the criterion at θ0. We have, for all i and j,
where θ̃ij is between θ*ij and θ0. The almost sure convergence of θ̃ij to θ0, the ergodic theorem and (c) imply that almost surely
Since almost surely, the second term on the right-hand side of (7.61) converges to 0 with probability 1. By the ergodic theorem, the first term on the right-hand side of (7.61) converges to J(i, j).
To complete the proof of Theorem 7.2, it suffices to apply Slutsky’s lemma. In view of (d), (e) and (f) we obtain (7.36) and (7.37).
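The way parts (a)–(f) combine can be summarized in a short worked derivation. It is a sketch under stated assumptions: the summand of criterion (7.8) is taken to be ℓt(θ) = εt²/σt² + log σt², the matrix J is taken to be the expectation of σt⁻⁴ (∂σt²/∂θ)(∂σt²/∂θ′) at θ0, and κη denotes Eηt⁴. The displays are reconstructions from these assumptions and are not quotations of the numbered equations of the text.

```latex
\begin{align*}
\frac{\partial \ell_t(\theta)}{\partial\theta}
  &= \Bigl\{1-\frac{\varepsilon_t^2}{\sigma_t^2}\Bigr\}
     \Bigl\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta}\Bigr\},\\
\frac{\partial^2 \ell_t(\theta)}{\partial\theta\,\partial\theta'}
  &= \Bigl\{1-\frac{\varepsilon_t^2}{\sigma_t^2}\Bigr\}
     \Bigl\{\frac{1}{\sigma_t^2}
       \frac{\partial^2\sigma_t^2}{\partial\theta\,\partial\theta'}\Bigr\}
   + \Bigl\{2\frac{\varepsilon_t^2}{\sigma_t^2}-1\Bigr\}
     \Bigl\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta}\Bigr\}
     \Bigl\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta'}\Bigr\}.
\end{align*}
% At \theta_0, \varepsilon_t^2/\sigma_t^2 = \eta_t^2 is independent of \sigma_t^2
% and its derivatives, with E\eta_t^2 = 1 and E\eta_t^4 = \kappa_\eta, so that
\begin{align*}
\operatorname{Var}\Bigl\{\frac{\partial \ell_t(\theta_0)}{\partial\theta}\Bigr\}
  &= E(1-\eta_t^2)^2\, J = (\kappa_\eta-1)\,J,
\qquad
E\,\frac{\partial^2 \ell_t(\theta_0)}{\partial\theta\,\partial\theta'}
   = E(2\eta_t^2-1)\, J = J,\\
\sqrt{n}\,(\hat\theta_n-\theta_0)
  &= -\,J^{-1}\,\frac{1}{\sqrt{n}}\sum_{t=1}^{n}
      \frac{\partial \ell_t(\theta_0)}{\partial\theta} + o_P(1)
  \;\xrightarrow{\;d\;}\;
  \mathcal{N}\bigl(0,\;(\kappa_\eta-1)\,J^{-1}\bigr).
\end{align*}
```

The sandwich J⁻¹{(κη − 1)J}J⁻¹ collapses to (κη − 1)J⁻¹ because, under these assumptions, the variance of the score is proportional to the expected Hessian.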
Proof of the Results of Section 7.1.3
Proof of Lemma 7.1. We have
Thus
using (7.18) for the latter inequality. It follows that log ρ^n hn, and thus ρ^n hn, tend almost surely to +∞ as n → ∞. Now if ρ^n hn → +∞ and , then for any ε > 0, the sequence () admits an infinite number of terms less than ε. Since the sequence () is ergodic and stationary, we have (). Since ε is arbitrary, we have , which is in contradiction to (7.16).
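The explosive behavior described by Lemma 7.1 can be illustrated by simulation. The sketch below assumes, and the excerpt does not state these explicitly, that (7.15) is the ARCH(1) recursion ht = ω0 + α0ε²t−1 with εt² = htηt², and that the nonstationarity condition (7.16) is of the form E log(α0ηt²) ≥ 0. Gaussian ηt and the values ω0 = 1, α0 = 5 are illustrative (for a standard Gaussian, E log ηt² ≈ −1.27, so log 5 + E log ηt² > 0). The recursion is run on the log scale to avoid floating-point overflow; the helper names are hypothetical.

```python
import math
import random

def log_add_exp(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def simulate_log_h(n, omega=1.0, alpha=5.0, seed=0):
    """Simulate log h_t for h_t = omega + alpha * h_{t-1} * eta_{t-1}^2.

    The initial value eps_0 = 0 is assumed, so that h_1 = omega.
    """
    rng = random.Random(seed)
    log_h = [math.log(omega)]
    for _ in range(n - 1):
        eta = rng.gauss(0.0, 1.0)
        # log(eps_t^2) = log h_t + log eta_t^2 (tiny offset guards log(0))
        log_eps2 = log_h[-1] + math.log(eta * eta + 1e-300)
        log_h.append(log_add_exp(math.log(omega), math.log(alpha) + log_eps2))
    return log_h

log_h = simulate_log_h(5000)
# Average growth rate of log h_t over the sample.
print(log_h[-1] / len(log_h))
```

Under these assumptions the printed growth rate should be close to log α0 + E log ηt², which is positive here, so ht diverges at an exponential rate.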
Proof of (7.19). Note that
where
We have
For all θ ∈ Θ, we have α ≠ 0. Letting
and
we have
since, by Lemma 7.1, → ∞ almost surely as t → ∞. It is easy to see that this convergence is uniform on the compact set Θ:
Let and be two constants such that . It can always be assumed that . With the notation , the solution of
is . This solution belongs to the interval when n is large enough. In this case
is one of the two extremities of the interval , and thus
This result and (7.62) show that almost surely
Since min_θ Qn(θ) ≤ Qn(θ0) = 0, it follows that
Since is an interval which contains α0 and can be arbitrarily small, we obtain the result.
To prove the asymptotic normality of the QMLE, we need the following intermediate result.
Lemma 7.2 Under the assumptions of Theorem 7.3, we have
Proof. Using Lemma 7.1, there exists a real random variable K and a constant σ ∈ (0, 1) independent of θ and of t such that
Since it has a finite expectation, the series is almost surely finite. This shows (7.63), and (7.64) follows similarly. We have
where
and
as t → ∞. Thus (7.65) is shown. To show (7.66), it suffices to note that
Proof of (7.20). We remark that we do not know, a priori, if the derivative of the criterion is equal to zero at , because we only have the convergence of to α0. Thus the minimum of the criterion could lie at the boundary of Θ, even asymptotically. By contrast, the partial derivative with respect to the second coordinate must asymptotically vanish at the optimum, since n → α0 and θ0 . A Taylor expansion of the derivative of the criterion thus gives
where Jn is a 2 × 2 matrix whose elements are of the form
with . between n and θ0. By Lemma 7.1, which shows that → ∞ almost surely, and by the central limit theorem of Lindeberg for martingale increments (see Corollary A.1),
Relation (7.64) of Lemma 7.2 and the compactness of Θ show that
By a Taylor expansion of the function
we obtain
where α* is between and α0. Using (7.65), (7.66) and (7.19), we obtain
We conclude using the second row of (7.67), and also using (7.68), (7.69) and (7.70).
Proof of Theorem 7.4
The proof follows the steps of the proof of Theorem 7.1. We will show the following points:
(a) , a.s.
(b) .
(c) If .
(d) For any φ ≠ φ0 there exists a neighborhood V(φ) such that
(a) Nullity of the asymptotic impact of the initial values. Equations (7.10)–(7.28) remain valid under the convention that t = t (). Equation (7.29) must be replaced by
where , the “tilde” variables being initialized as indicated before. Assumptions A7 and A8 imply that,
It follows that almost surely
and thus, by (7.28), (7.71) and (7.27),
Similarly, we have that almost surely . The difference between the theoretical log-likelihoods with and without initial values can thus be bounded as follows:
This inequality is analogous to (7.31), + 1 being replaced by . Following the lines of the proof of (a) in Theorem 7.1 (see Exercise 7.2), it suffices to show that for all real r > 0, E(ρ^t ξt)^r is the general term of a finite series. Note that7
since, by Corollary 2.3, s. Statement (a) follows.
(b) Identifiability of the parameter. If t() = t(0) almost surely, assumptions A8 and A9 imply that there exists a constant linear combination of the variables Xt−j, j ≥ 0. The linear innovation of (Xt), equal to Xt − E(Xt|Xu, u < t) = ηtσt(0), is zero almost surely only if ηt = 0 a.s. (since ). This is precluded, since E . It follows that = 0 and thus that θ = θ0 by the argument used in the proof of Theorem 7.1.
(c) The limit criterion is minimized at the true value. By the arguments used in the proof of (c) in Theorem 7.1, it can be shown that, for all , E0In() = E0t() is defined in ℝ ∪ {+∞}, and in ℝ at = 0. We have
because the last expectation is equal to 0 (noting that t() − t(0) belongs to the past, as well as σt (0) and σt()), the other expectations being positive or null by arguments already used. This inequality is strict only if t() = t(0) and if a.s. which, by (b), implies = 0 and completes the proof of (c).
(d) Use of the compactness of Φ and of the ergodicity of (t()). The end of the proof is the same as that of Theorem 7.1.
Proof of Theorem 7.5
The proof follows the steps of that of Theorem 7.2. The block-diagonal form of the matrices I and J when the distribution of ηt is symmetric is shown in Exercise 7.7. It suffices to establish the following properties.
(b) I and J are invertible.
(c) and tend in probability to 0 as n → ∞.
(d) .
(e) s.d., for all * between and 0.
Formulas (7.38) and (7.39) giving the derivatives with respect to the GARCH parameters (that is, the vector θ) remain valid in the presence of an ARMA part (writing = ()). The same is true for all the results established in (a) and (b) of the proof of Theorem 7.2, with obvious changes of notation. The derivatives of with respect to the parameter , and the cross derivatives with respect to θ and , are given by
The derivatives of t are of the form
where
and
where Hk, (t) is the k × (Hankel) matrix of general term t−i−j, and 0k, denotes the null matrix of size k × . Moreover, by (7.28),
where j denotes the jth component of , and
(a) Integrability of the derivatives of the criterion at φ0. The existence of the expectations in (7.40) remains true. By (7.74)–(7.76), the independence between (t/σt)(0) = ηt and its derivatives, and the derivatives of t(0), using E < ∞ and ( 0) > ω0 > 0, it suffices to show that
to establish point (a), together with the existence of the matrices I and J. By the expressions for the derivatives of t, (7.77)–(7.78), and using E ( 0) < ∞, we obtain (7.81).
The Cauchy-Schwarz inequality implies that
Thus, in view of (7.79) and the positivity of ω0,
Using the triangle inequality and the elementary inequalities and x/(1 + x^2) ≤ 1, it follows that
The first inequality of (7.82) follows. The existence of the second expectation in (7.82) is a consequence of (7.80), the Cauchy-Schwarz inequality, and the square integrability of t and its derivatives. To handle the second-order partial derivatives of , first note that by (7.41). Moreover, using (7.79),
By the arguments used to show (7.44), we obtain
which entails the existence of the third expectation in (7.82).
(b) Invertibility of I and J. Assume that I is noninvertible. There exists a nonzero vector λ in ℝ^(P+Q+p+q+2) such that λ′∂t (0)/∂′ = 0 a.s. By (7.38) and (7.74), this implies that
Taking the variance of the left-hand side, conditionally on the σ-field generated by {ηu, u < t}, we obtain a.s., at = 0,
where . It follows that and } a.s. By stationarity, we have either } a.s. for all t, or } a.s. for all t. Consider for instance the latter case, the first one being treated similarly. Relation (7.86) implies a.s. The term in brackets cannot vanish almost surely, otherwise ηt would take at most two different values, which would be in contradiction to assumption A12. It follows that at = 0 a.s. and thus bt = 0 a.s. We have shown that almost surely
where λ1 is the vector of the first P + Q + 1 components of λ. By stationarity of (∂t/∂)t, the first equality implies that
We now use assumption A9, that the ARMA representation is minimal, to conclude that λ1 = 0. The third equality in (7.87) is then written, with obvious notation, as . We have already shown in the proof of Theorem 7.2 that this entails λ2 = 0. We are led to a contradiction, which proves that I is invertible. Using (7.39) and (7.75)–(7.76), we obtain
We have just shown that the first expectation is a positive definite matrix. The second expectation being a positive semi-definite matrix, J is positive definite and thus invertible, which completes the proof of (b).
(c) Asymptotic unimportance of the initial values. The initial values being fixed, the derivatives of , obtained from (7.71), are given by
with the notation introduced in (7.41)–(7.42) and (7.55)–(7.56). As for (7.79), we obtain
and, by an obvious extension of (7.72),
Thus
The latter sum converges almost surely because its expectation is finite. We have thus shown that
The other derivatives of are handled similarly, and we obtain
We have, in view of (7.73),
where . It is also easy to check that for = 0
It follows that, using (7.88),
Using the independence between ηt and St–1, (7.40), (7.83), the Cauchy-Schwarz inequality and E < ∞, we obtain
which shows the first part of (c). The second is established by the same arguments.
(d) Use of a CLT for martingale increments. The proof of this point is exactly the same as that of the pure GARCH case (see the proof of Theorem 7.2).
(e) Convergence to the matrix J. This part of the proof differs drastically from that of Theorem 7.2. For pure GARCH, we used a Taylor expansion of the second-order derivatives of the criterion, and showed that the third-order derivatives were uniformly integrable in a neighborhood of θ0. Without additional assumptions, this argument fails in the ARMA-GARCH case because variables of the form do not necessarily have moments of all orders, even at the true value of the parameter. First note that, since J exists, the ergodic theorem implies that
The consistency of φ̂n having already been established, it suffices to show that for all ε > 0, there exists a neighborhood V(φ0) of φ0 such that almost surely
(see Exercise 7.9). We first show that there exists V(φ0) such that
By Hölder’s inequality, (7.39), (7.75) and (7.76), it suffices to show that for any neighborhood V(φ0) ⊂ Φ whose elements have their components αi and βj bounded above by a positive constant, the quantities
are finite. Using the expansion of the series
similar expansions for the derivatives, and t (0)4 < ∞, it can be seen that the norms in (7.91) are finite. In (7.92) the first norm is finite, as an obvious consequence of , this latter term being strictly positive by compactness of Ф. An extension of inequality (7.83) leads to
Moreover, since (7.41)–(7.44) remain valid when εt is replaced by t(), it can be shown that
for any d > 0 and any neighborhood V(φ0) whose elements have their components αi and βj bounded from below by a positive constant. The norms in (7.92) are thus finite. The existence of the first norm of (7.93) follows from (7.80) and (7.91). To handle the second one, we use (7.84), (7.85), (7.91), and the fact that . Finally, it can be shown that the third norm is finite by (7.47), (7.48) and by arguments already used. The property (7.90) is thus established. The ergodic theorem shows that the limit in (7.89) is equal almost surely to
By the dominated convergence theorem, using (7.90), this expectation tends to 0 when the neighborhood V(φ0) tends to the singleton {φ0}. Thus (7.89) holds true, which proves (e). The proof of Theorem 7.5 is now complete.
Bibliographical Notes
The asymptotic properties of the QMLE of the ARCH models have been established by Weiss (1986) under the condition that the moment of order 4 exists. In the GARCH(1, 1) case, the asymptotic properties have been established by Lumsdaine (1996) (see also Lee and Hansen, 1994) for the local QMLE under the strict stationarity assumption. In Lumsdaine (1996) the conditions on the coefficients α1 and β1 allow the IGARCH(1, 1) model to be handled. They are, however, very restrictive with regard to the iid process: it is assumed that E|ηt|^32 < ∞ and that the density of ηt has a unique mode and is bounded in a neighborhood of 0. In Lee and Hansen (1994) the consistency of the global estimator is obtained under the assumption of second-order stationarity.
Berkes, Horváth and Kokoszka (2003b) was the first paper to give a rigorous proof of the asymptotic properties of the QMLE in the GARCH(p, q) case under very weak assumptions; see also Berkes and Horváth (2003b, 2004), together with Boussama (1998, 2000). The assumptions given in Berkes, Horváth and Kokoszka (2003b) were weakened slightly in Francq and Zakoïan (2004). The proofs presented here come from that paper. An extension to non-iid errors was recently proposed by Escanciano (2009).
Jensen and Rahbek (2004a, 2004b) have shown that the parameter α0 of an ARCH(1) model, or the parameters α0 and β0 of a GARCH(1, 1) model, can be consistently estimated, with a standard Gaussian asymptotic distribution and a standard rate of convergence, even if the parameters are outside the strict stationarity region. They considered a constrained version of the QMLE, in which the intercept ω is fixed (see Exercises 7.13 and 7.14). These results were misunderstood by a number of researchers and practitioners, who wrongly claimed that the QMLE of the GARCH parameters is consistent and asymptotically normal without any stationarity constraint. We have seen in Section 7.1.3 that the QMLE of ω0 is inconsistent in the nonstationary case.
For ARMA-GARCH models, asymptotic results have been established by Ling and Li (1997, 1998), Ling and McAleer (2003a, 2003b) and Francq and Zakoïan (2004). A comparison of the assumptions used in these papers can be found in the last reference. We refer the reader to Straumann (2005) for a detailed monograph on the estimation of GARCH models, to Francq and Zakoïan (2009a) for a recent review of the literature, and to Straumann and Mikosch (2006) and Bardet and Wintenberger (2009) for extensions to other conditionally heteroscedastic models. Li, Ling and McAleer (2002) reviewed the literature on the estimation of ARMA-GARCH models, including in particular the case of nonstationary models.
The proof of the asymptotic normality of the QMLE of ARMA models under the second-order moment assumption can be found, for instance, in Brockwell and Davis (1991). For ARMA models with infinite variance noise, see Davis, Knight and Liu (1992), Mikosch, Gadrich, Klüppelberg and Adler (1995) and Kokoszka and Taqqu (1996).
Exercises
7.1 (The distribution of ηt is symmetric for GARCH models) The aim of this exercise is to show property (7.24).
1. Show the result for j < 0.
2. For j ≥ 0, explain why can be written as for some function h.
3. Complete the proof of (7.24).
7.2 (Almost sure convergence to zero at an exponential rate)
Let (t) be a strictly stationary process admitting a moment of order s > 0. Show that if ρ ∈ (0, 1), then a.s.
7.3 (Ergodic theorem for nonintegrable processes)
Prove the following ergodic theorem. If (Xt) is an ergodic and strictly stationary process and if EX1 exists in ℝ ∪ {+∞}, then
The result is shown in Billingsley (1995, p. 284) for iid variables.
Hint: Consider the truncated variables where κ > 0 with κ tending to +∞.
7.4 (Uniform ergodic theorem)
Let {Xt(θ)} be a process of the form
where (ηt) is strictly stationary and ergodic and f is continuous in θ ∈ Φ, Φ being a compact subset of ℝ^d.
1. Show that the process {inf_{θ∈Φ} Xt(θ)} is strictly stationary and ergodic.
2. Does the property still hold true if Xt(θ) is not of the form (7.94) but it is assumed that {Xt(θ)} is strictly stationary and ergodic and that Xt(θ) is a continuous function of θ?
7.5 (OLS estimator of a GARCH)
In the framework of the GARCH(p, q) model (7.1), an OLS estimator of θ is defined as any measurable solution θ̂n of
where
and is defined by (7.4) with, for instance, initial values given by (7.6) or (7.7). Note that the estimator is unconstrained and that the variable can take negative values. Similarly, a constrained OLS estimator is defined by
The aim of this exercise is to show that under the assumptions of Theorem 7.1, and if , the constrained and unconstrained OLS estimators are strongly consistent. We consider the theoretical criterion
1. Show that almost surely as n → ∞.
2. Show that the asymptotic criterion is minimized at θ0,
and that θ0 is the unique minimum.
3. Prove that θ̂n → θ0 almost surely as n → ∞.
4. Show that almost surely as n → ∞.
7.6 (The mean of the squares of the normalized residuals is equal to 1)
For a GARCH model, estimated by QML with initial values set to zero, the normalized residuals are defined by . Show that almost surely
Hint: Note that for all c > 0, there exists such that for all t ≥ 0, and consider the function .
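The scaling idea behind the hint can be checked numerically. This is a minimal sketch, assuming a GARCH(1, 1) with variance recursion ω + αε²t−1 + βσ̃²t−1, zero initial values, and a criterion equal to the sample mean of εt²/σ̃t² + log σ̃t². The function names, the trial parameter and the simulated data are illustrative. With zero initial values, multiplying (ω, α) by c > 0 and keeping β fixed multiplies every σ̃t² by c, so along that direction the criterion equals m/c + log c plus a constant, where m is the mean of the squared normalized residuals. This is minimized at c = m, after which the rescaled residuals have mean square 1.

```python
import math
import random

def sigma2_path(eps, omega, alpha, beta):
    """Conditional variances with initial values eps_0^2 = sigma_0^2 = 0."""
    out, prev_e2, prev_s2 = [], 0.0, 0.0
    for e in eps:
        s2 = omega + alpha * prev_e2 + beta * prev_s2
        out.append(s2)
        prev_e2, prev_s2 = e * e, s2
    return out

def criterion(eps, omega, alpha, beta):
    """Sample mean of eps_t^2 / sigma_t^2 + log sigma_t^2 (assumed form)."""
    s2 = sigma2_path(eps, omega, alpha, beta)
    return sum(e * e / s + math.log(s) for e, s in zip(eps, s2)) / len(eps)

rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(500)]  # any data will do
omega, alpha, beta = 0.3, 0.2, 0.5                # arbitrary trial value

# Mean of squared normalized residuals at the trial value.
s2 = sigma2_path(eps, omega, alpha, beta)
c_star = sum(e * e / s for e, s in zip(eps, s2)) / len(eps)

# Rescale (omega, alpha) by c_star: every sigma_t^2 is multiplied by c_star.
s2_scaled = sigma2_path(eps, c_star * omega, c_star * alpha, beta)
mean_sq_resid = sum(e * e / s for e, s in zip(eps, s2_scaled)) / len(eps)

before = criterion(eps, omega, alpha, beta)
after = criterion(eps, c_star * omega, c_star * alpha, beta)
print(mean_sq_resid, before, after)
```

The rescaling never increases the criterion, so a minimizer for which the rescaled point remains in the parameter space must already have c = 1, that is, mean squared normalized residual equal to 1.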
7.7 (I and J block-diagonal)
Show that I and J have the block-diagonal form given in Theorem 7.5 when the distribution of ηt is symmetric.
7.8 (Forms of I and J in the AR(1)-ARCH(1) case)
We consider the QML estimation of the AR(1)-ARCH(1) model
assuming that ω0 = 1 is known and without specifying the distribution of ηt.
1. Give the explicit form of the matrices I and J in Theorem 7.5 (with an obvious adaptation of the notation because the parameter here is (a0, α0)).
2. Give the block-diagonal form of these matrices when the distribution of ηt is symmetric, and verify that the asymptotic variance of the estimator of the ARCH parameter
(i) does not depend on the AR parameter, and
(ii) is the same as for the estimator of a pure ARCH (without the AR part).
3. Compute Σ when α0 = 0. Is the asymptotic variance of the estimator of a0 the same as that obtained when estimating an AR(1)? Verify the results obtained by simulation in the corresponding column of Table 7.3.
7.9 (A useful result in showing asymptotic normality)
Let (Jt(θ)) be a sequence of random matrices, which are functions of a vector of parameters θ. We consider an estimator θ̂n which strongly converges to the vector θ0. Assume that
where J is a matrix. Show that if for all ε > 0 there exists a neighborhood V(θ0) of θ0 such that
where ‖·‖ denotes a matrix norm, then
Give an example showing that condition (7.95) is not necessary for the latter convergence to hold in probability.
7.10 (A lower bound for the asymptotic variance of the QMLE of an ARCH)
Show that, for the ARCH(q) model, under the assumptions of Theorem 7.2,
in the sense that the difference is a positive semi-definite matrix.
Hint: Compute and show that J – Jθ0θ′0J is a variance matrix.
7.11 (A striking property of J)
For a GARCH(p, q) model we have, under the assumptions of Theorem 7.2,
The objective of the exercise is to show that
1. Show the property in the ARCH case.
Hint: Compute and .
2. In the GARCH case, let . Show that
3. Complete the proof of (7.96).
7.12 (A condition required for the generalized Bartlett formula)
Using (7.24), show that if the distribution of ηt is symmetric and if E < ∞, then formula (B.13) holds true, that is,
7.13 (Constrained QMLE of the parameter α0 of a nonstationary ARCH(1) process)
Jensen and Rahbek (2004a) consider the ARCH(1) model (7.15), in which the parameter ω0 > 0 is assumed to be known (ω0 = 1 for instance) and where only α0 is unknown. They work with the constrained QMLE of α0 defined by
where . Assume therefore that ω0 = 1 and suppose that the nonstationarity condition (7.16) is satisfied.
1. Verify that
and that
2. Prove that
3. Determine the almost sure limit of
4. Show that for all , almost surely
5. Prove that if almost surely (see Exercise 7.14) then
6. Does the result change when and ω0 ≠ 1?
7. Discuss the practical usefulness of this result for estimating ARCH models.
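A small simulation can make the setting of this exercise concrete. The sketch below is written under explicit assumptions that the excerpt does not confirm: the model is εt² = (1 + α0ε²t−1)ηt² with Gaussian ηt, and the constrained criterion is the sample mean of εt²/(1 + αε²t−1) + log(1 + αε²t−1) with the intercept fixed at 1. The value α0 = 5, the search interval, the golden-section search and all function names are illustrative choices. Since εt² explodes in the nonstationary case, everything is computed on the log scale.

```python
import math
import random

def log1p_exp(z):
    """Return log(1 + exp(z)) without overflow."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def simulate_log_eps2(n, alpha0, seed=0):
    """log(eps_t^2) for eps_t^2 = (1 + alpha0 * eps_{t-1}^2) * eta_t^2, eps_0 = 0."""
    rng = random.Random(seed)
    out, log_h = [], 0.0  # h_1 = 1 because eps_0 = 0
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        log_e2 = log_h + math.log(eta * eta + 1e-300)  # offset guards log(0)
        out.append(log_e2)
        log_h = log1p_exp(math.log(alpha0) + log_e2)
    return out

def criterion(alpha, log_eps2):
    """Assumed constrained criterion, intercept fixed at 1."""
    total = 0.0
    for prev, cur in zip(log_eps2[:-1], log_eps2[1:]):
        log_s2 = log1p_exp(math.log(alpha) + prev)  # log(1 + alpha * eps_{t-1}^2)
        total += math.exp(cur - log_s2) + log_s2
    return total / (len(log_eps2) - 1)

def golden_section_min(f, lo, hi, tol=1e-4):
    """Minimize f on [lo, hi] by golden-section search (assumes one local minimum)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

log_eps2 = simulate_log_eps2(2000, alpha0=5.0)
alpha_hat = golden_section_min(lambda a: criterion(a, log_eps2), 0.5, 50.0)
print(alpha_hat)
```

Under these assumptions the estimate should fall near α0 even though the simulated process is not strictly stationary, which is the phenomenon the exercise asks to establish.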
7.14 (Strong consistency of Jensen and Rahbek’s estimator)
We consider the framework of Exercise 7.13, and follow the lines of the proof of (7.19) on page 169.
1. Show that (1) converges almost surely to α0 when ω0 = 1.
2. Does the result change if (1) is replaced by and if ω and ω0 are arbitrary positive numbers? Does it entail the convergence result (7.19)?
1 For the Nasdaq an outlier has been eliminated because the base price was reset on the trading day following December 31, 1993.
2 We use here the fact that (f + g)− ≤ g− for f ≥ 0, and that if f ≤ g then f− ≥ g−.
3 To show (7.33) it can be assumed that and that (in order to use the linearity property of the expectation), otherwise and the relation is trivially satisfied.
4 We use the inequality (a + b)^s ≤ a^s + b^s for all a, b ≥ 0 and any s ∈ (0, 1]. Indeed, x^s ≥ x for all x ∈ [0, 1], and if a + b > 0 then {a/(a + b)}^s + {b/(a + b)}^s ≥ a/(a + b) + b/(a + b) = 1.
5 If x ≥ 1 then x^s ≥ 1 ≥ x/(1 + x). If 0 ≤ x ≤ 1 then x^s ≥ x ≥ x/(1 + x).
6 We have
if and only if a.s., that is, if and only if a.s.
7 We use the fact that if X and Y are positive random variables, E(X + Y)^r ≤ E(X^r) + E(Y^r) for all r ∈ (0, 1], this inequality being trivially obtained from the inequality already used: (a + b)^r ≤ a^r + b^r for all positive real numbers a and b.
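The two elementary inequalities used in footnotes 4, 5 and 7 can be spot-checked numerically. This is only a sanity check on a finite grid, not a proof; the grid, the exponents and the function name `check_inequalities` are arbitrary illustrative choices.

```python
def check_inequalities(s_values, grid, tol=1e-12):
    """Check x/(1+x) <= x**s and (x+y)**s <= x**s + y**s on a grid.

    Returns True if no violation is found (up to rounding tolerance tol).
    """
    for s in s_values:
        for x in grid:
            if x / (1.0 + x) > x ** s + tol:
                return False
            for y in grid:
                if (x + y) ** s > x ** s + y ** s + tol:
                    return False
    return True

grid = [k / 10.0 for k in range(0, 101)]          # x, y from 0 to 10
ok = check_inequalities([0.1, 0.25, 0.5, 0.9, 1.0], grid)
print(ok)
```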