3
Mixing*

It will be shown that, under mild conditions, GARCH processes are geometrically ergodic and β ‐mixing. These properties entail the existence of laws of large numbers and of central limit theorems (see Appendix A), and thus play an important role in the statistical analysis of GARCH processes. This chapter relies on the Markov chain techniques set out, for example, by Meyn and Tweedie (1996).

3.1 Markov Chains with Continuous State Space

Recall that for a Markov chain only the most recent past is of use in obtaining the conditional distribution. More precisely, (X_t) is said to be a homogeneous Markov chain, evolving on a space E (called the state space) equipped with a σ‐field ℰ, if for all x ∈ E, for all B ∈ ℰ, and for all s, t ∈ ℕ,

\[ \mathbb{P}(X_{s+t} \in B \mid X_r,\ r < s;\ X_s = x) =: P^t(x, B). \tag{3.1} \]

In this equation, P^t(x, B) corresponds to the transition probability of moving from the state x to the set B in t steps. The Markov property refers to the fact that P^t(x, B) does not depend on X_r, r < s. The fact that this probability does not depend on s is referred to as time homogeneity. For simplicity, we write P(x, B) = P^1(x, B). The function P : E × ℰ → [0, 1] is called a transition kernel and satisfies:

  1. ∀B ∈ ℰ, the function P(⋅, B) is measurable;
  2. ∀x ∈ E, the function P(x, ⋅) is a probability measure on (E, ℰ).

The law of the process (X_t) is characterised by an initial probability measure μ and a transition kernel P. For all integers t and all (t + 1)‐tuples (B_0, …, B_t) of elements of ℰ, we set

\[ \mathbb{P}_\mu(X_0 \in B_0, \dots, X_t \in B_t) = \int_{x_0 \in B_0} \cdots \int_{x_{t-1} \in B_{t-1}} \mu(dx_0)\, P(x_0, dx_1) \cdots P(x_{t-1}, B_t). \tag{3.2} \]

In what follows, (X_t) denotes a Markov chain on E = ℝ^d and ℰ is the Borel σ‐field.

Irreducibility and Recurrence

The Markov chain (X_t) is said to be φ‐irreducible for a non‐trivial (that is, not identically equal to zero) measure φ on (E, ℰ), if

\[ \forall B \in \mathcal{E}, \quad \varphi(B) > 0 \;\Longrightarrow\; \forall x \in E,\ \exists t > 0,\ P^t(x, B) > 0. \]

If (X_t) is φ‐irreducible, it can be shown that there exists a maximal irreducibility measure, that is, an irreducibility measure M such that all the other irreducibility measures are absolutely continuous with respect to M. If M(B) = 0, then the set of points from which B is accessible is also of zero measure (see Meyn and Tweedie 1996, Proposition 4.2.2). Such a measure M is not unique, but the set

\[ \mathcal{E}^+ = \{ B \in \mathcal{E} \mid M(B) > 0 \} \]

does not depend on the maximal irreducibility measure M . For a particular model, finding a measure that makes the chain irreducible may be a non‐trivial problem (but see Exercise 3.1 for an example of a time series model for which the determination of such a measure is very simple).

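A φ‐irreducible chain is called recurrent if

\[ U(x, B) := \sum_{t=1}^{\infty} P^t(x, B) = +\infty, \quad \forall x \in E,\ \forall B \in \mathcal{E}^+, \]

and is called transient if

\[ \exists (B_j)_j, \quad E = \bigcup_j B_j, \quad U(x, B_j) \le M_j < \infty, \quad \forall x \in E. \]

Note that U(x, B) = E{∑_{t≥1} 𝟙_B(X_t) ∣ X_0 = x} can be interpreted as the average time that the chain spends in B when it starts at x. It can be shown that a φ‐irreducible chain (X_t) is either recurrent or transient (see Meyn and Tweedie 1996, Theorem 8.3.4). It is said that (X_t) is positive recurrent if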

\[ \limsup_{t \to +\infty} P^t(x, B) > 0, \quad \forall x \in E,\ \forall B \in \mathcal{E}^+. \]

If a φ ‐irreducible chain is not positive recurrent, it is called null recurrent. For a φ ‐irreducible chain, positive recurrence is equivalent to the existence of a (unique) invariant probability measure (see Meyn and Tweedie 1996, Theorem 18.2.2), that is, a probability π such that

\[ \forall B \in \mathcal{E}, \quad \pi(B) = \int P(x, B)\, \pi(dx). \]

An important consequence of this equivalence is that, for Markov time series, the issue of finding strict stationarity conditions reduces to that of finding conditions for positive recurrence. Indeed, it can be shown (see Exercise 3.2) that for any chain (X t ) with initial measure μ ,

\[ \text{the chain } (X_t) \text{ is stationary} \;\Longleftrightarrow\; \mu \text{ is invariant}. \tag{3.3} \]

For this reason, the invariant probability is also called the stationary probability.
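The link between invariance and stationarity can be checked by hand on a toy case. The sketch below assumes a Gaussian AR(1) chain X_t = θX_{t−1} + ε_t with |θ| < 1 and ε_t ∼ N(0, 1); this model and the function names are illustrative assumptions, not part of the text. One transition maps the law N(0, v) to N(0, θ²v + 1), so N(0, 1/(1 − θ²)) should be left unchanged by the kernel.

```python
def push_variance(v, theta):
    """Variance of X_t when X_{t-1} ~ N(0, v), X_t = theta * X_{t-1} + eps_t, eps_t ~ N(0, 1)."""
    return theta ** 2 * v + 1.0


def invariant_variance(theta):
    """Variance of the centred Gaussian law left unchanged by one transition."""
    if abs(theta) >= 1.0:
        raise ValueError("the sketch assumes |theta| < 1")
    return 1.0 / (1.0 - theta ** 2)


def iterate_variance(v0, theta, steps):
    """Variance of X_steps when X_0 ~ N(0, v0)."""
    v = v0
    for _ in range(steps):
        v = push_variance(v, theta)
    return v


theta = 0.5
v_star = invariant_variance(theta)
# Starting from the invariant law, the marginal law does not move.
print(v_star, push_variance(v_star, theta))
# Starting elsewhere, the marginal variance converges to the invariant one.
print(iterate_variance(10.0, theta, 100))
```

Started from N(0, v_star) the chain keeps the same marginal law at every date, which is the sense in which the invariant probability is the stationary one.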

Small Sets and Aperiodicity

For a φ‐irreducible chain, there exists a class of sets enjoying properties that are similar to those of the elementary states of a finite state space Markov chain. A set C ∈ ℰ is called a small set if there exist an integer m ≥ 1 and a nontrivial measure ν on ℰ such that

\[ \forall x \in C,\ \forall B \in \mathcal{E}, \quad P^m(x, B) \ge \nu(B). \]

In the AR(1) case, for instance, it is easy to find small sets (see Exercise 3.4). For more sophisticated models, the definition is not sufficient and more explicit criteria are needed. For the so‐called Feller chains, we will see below that it is very easy to find small sets. For a general chain, we have the following criterion (see Nummelin 1984, Proposition 2.11): C ∈ ℰ⁺ is a small set if there exists A ∈ ℰ⁺ such that, for all B ⊂ A, B ∈ ℰ⁺, there exists T > 0 such that

\[ \inf_{x \in C} \sum_{t=0}^{T} P^t(x, B) > 0. \]

If the chain is φ‐irreducible, it can be shown that there exists a countable cover of E by small sets. Moreover, each set B ∈ ℰ⁺ contains a small set C ∈ ℰ⁺. The existence of small sets allows us to define cycles for φ‐irreducible Markov chains with general state space, as in the case of countable space chains. More precisely, the period d is the greatest common divisor (gcd) of the set

\[ \{ n \ge 1 \mid \forall x \in C,\ \forall B \in \mathcal{E},\ P^n(x, B) \ge \delta_n \nu(B) \text{ for some } \delta_n > 0 \}, \]

where C ∈ ℰ⁺ is any small set (the gcd is independent of the choice of C). When d = 1, the chain is said to be aperiodic. Moreover, it can be shown (see Meyn and Tweedie 1996, Theorem 5.4.4) that there exist disjoint sets D_1, …, D_d ∈ ℰ such that (with the convention D_{d+1} = D_1):

  1. ∀i = 1, …, d, ∀x ∈ D_i, P(x, D_{i+1}) = 1;
  2. φ(E − ⋃ D_i) = 0.

A necessary and sufficient condition for the aperiodicity of (X_t) is that there exists A ∈ ℰ⁺ such that for all B ⊂ A, B ∈ ℰ⁺, there exists t > 0 such that

\[ P^t(x, B) > 0 \quad \text{and} \quad P^{t+1}(x, B) > 0, \quad \forall x \in B \tag{3.4} \]

(see Chan 1990, Proposition A1.2).

Geometric Ergodicity and Mixing

In this section, we study the convergence of the probability ℙ_μ(X_t ∈ ⋅) to a probability π(⋅) independent of the initial probability μ, as t → ∞.

It is easy to see that if there exists a probability measure π such that, for an initial measure μ,

\[ \forall B \in \mathcal{E}, \quad \lim_{t \to +\infty} \mathbb{P}_\mu(X_t \in B) = \pi(B), \tag{3.5} \]

where ℙ_μ(X_t ∈ B) is defined in (3.2) (for (B_0, …, B_t) = (E, …, E, B)), then the probability π is invariant (see Exercise 3.3). Note also that (3.5) holds for any measure μ if and only if

\[ \forall B \in \mathcal{E},\ \forall x \in E, \quad \lim_{t \to +\infty} P^t(x, B) = \pi(B). \]

On the other hand, if the chain is irreducible, aperiodic, and admits an invariant probability π, then, for π‐almost all x ∈ E,

\[ \| P^t(x, \cdot) - \pi \| \to 0 \quad \text{as } t \to +\infty, \tag{3.6} \]

where ∥ ⋅ ∥ denotes the total variation norm (see Meyn and Tweedie 1996, Theorem 14.0.1). A chain (X_t) such that the convergence (3.6) holds for all x is said to be ergodic. However, this convergence is not sufficient for mixing. We will define a stronger notion of ergodicity.

The chain (X t ) is called geometrically ergodic if there exists ρ ∈ (0, 1) such that

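\[ \forall x \in E, \quad \rho^{-t}\, \| P^t(x, \cdot) - \pi \| \to 0 \quad \text{as } t \to +\infty. \tag{3.7} \]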
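A numerical illustration of this geometric rate may help. The sketch below assumes a Gaussian AR(1) chain X_t = θX_{t−1} + ε_t with ε_t ∼ N(0, 1), for which P^t(x, ⋅) = N(θ^t x, (1 − θ^{2t})/(1 − θ²)) and π = N(0, 1/(1 − θ²)). It approximates the total variation distance by a Riemann sum of half the L¹ distance between densities; this normalisation is a convention chosen here (other texts omit the factor 1/2), and it does not affect the rate of decay. All names are hypothetical.

```python
import numpy as np


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated on the array y."""
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def tv_to_invariant(x, theta, t, grid):
    """Approximate 0.5 * int |p_t(x, y) - pi(y)| dy for the Gaussian AR(1) chain, t >= 1."""
    var_inf = 1.0 / (1.0 - theta ** 2)           # variance of the invariant law
    mean_t = theta ** t * x                      # mean of X_t given X_0 = x
    var_t = (1.0 - theta ** (2 * t)) * var_inf   # variance of X_t given X_0 = x
    dy = grid[1] - grid[0]
    diff = np.abs(gaussian_pdf(grid, mean_t, var_t) - gaussian_pdf(grid, 0.0, var_inf))
    return 0.5 * np.sum(diff) * dy


grid = np.linspace(-15.0, 15.0, 300001)
distances = [tv_to_invariant(3.0, 0.5, t, grid) for t in range(1, 16)]
ratios = [distances[i + 1] / distances[i] for i in range(len(distances) - 1)]
print(ratios[-1])  # successive ratios settle near |theta|
```

In this example the distance shrinks by a factor close to |θ| at each step, so (3.7) holds for any ρ ∈ (|θ|, 1).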

Geometric ergodicity entails the so‐called α‐ and β‐mixing. The general definition of the α‐ and β‐mixing coefficients is given in Appendix A.3.1. For a stationary Markov process, the definition of the α‐mixing coefficient reduces to

equation

where the first supremum is taken over the set of the measurable functions f and g such that |f| ≤ 1, |g| ≤ 1 (see Bradley 1986, 2005). A general process X = (X_t) is said to be α‐mixing (β‐mixing) if α_X(k) (β_X(k)) converges to 0 as k → ∞. Intuitively, these mixing properties characterise the decrease in dependence when past and future become sufficiently far apart. The α‐mixing is sometimes called strong mixing, but β‐mixing entails strong mixing because α_X(k) ≤ β_X(k) (see Appendix A.3.1).

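Davydov (1973) showed that for an ergodic Markov chain (X_t), of invariant probability measure π,

\[ \beta_X(k) = \int \| P^k(x, \cdot) - \pi \| \, \pi(dx). \]

It follows that β_X(k) = O(ρ^k) if the convergence (3.7) holds. Thus

\[ (X_t) \text{ is stationary and geometrically ergodic} \;\Longrightarrow\; (X_t) \text{ is geometrically } \beta\text{-mixing}. \tag{3.8} \]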

Two Ergodicity Criteria

For particular models, it is generally not easy to directly verify the properties of recurrence, existence of an invariant probability law, and geometric ergodicity. Fortunately, there exist simple criteria on the transition kernel.

We begin by defining the notion of Feller chain. The Markov chain (X_t) is said to be a Feller chain if, for all bounded continuous functions g defined on E, the function of x defined by E(g(X_t) ∣ X_{t−1} = x) is continuous. For instance, for an AR(1) we have, with obvious notation,

\[ E\{ g(X_t) \mid X_{t-1} = x \} = E\{ g(\theta x + \varepsilon_t) \} = \int g(\theta x + y)\, dP_\varepsilon(y). \]

The continuity of the function x → g(θx + y) for all y, and its boundedness, ensure, by the Lebesgue dominated convergence theorem, that (X_t) is a Feller chain. For a Feller chain, the compact sets C ∈ ℰ⁺ are small sets (see Feigin and Tweedie 1985).

The following theorem provides an effective way to show the geometric ergodicity (and thus the β ‐mixing) of numerous Markov processes.

This theorem will be applied to GARCH processes in the next section (see also Exercise 3.5 for a bilinear example). In Eq. (3.10), V can be interpreted as an energy function. When the chain is outside the centre A of the state space, the energy dissipates, on average. When the chain lies inside A , the energy is bounded, by the compactness of A and the continuity of V . Sometimes V is called a test function and (iii) is said to be a drift criterion.

Let us explain why these assumptions imply the existence of an invariant probability measure. For simplicity, assume that the test function V takes its values in [1, +∞), which will be the case for the applications to GARCH models we will present in the next section. Denote by P the operator which, to a measurable function f on E, associates the function Pf defined by

\[ \forall x \in E, \quad Pf(x) = \int_E f(y)\, P(x, dy) = E\{ f(X_t) \mid X_{t-1} = x \}. \]

Let P^t be the tth iteration of P, obtained by replacing P(x, dy) by P^t(x, dy) in the previous integral. By convention P^0 f = f and P^0(x, A) = 𝟙_A(x). Equations (3.9) and (3.10) and the boundedness of V by some M > 0 on A yield an inequality of the form

\[ PV \le (1 - \delta) V + b\, \mathbb{1}_A, \]

where b = M − (1 − δ). Iterating this relation t times, we obtain, for x_0 ∈ A,

\[ \forall t \ge 0, \quad P^{t+1} V(x_0) \le (1 - \delta)\, P^t V(x_0) + b\, P^t(x_0, A). \tag{3.11} \]

It follows (see Exercise 3.6) that there exists a constant κ > 0 such that for n large enough,

\[ Q_n(x_0, A) := \frac{1}{n} \sum_{t=1}^{n} P^t(x_0, A) \ge \kappa. \tag{3.12} \]

The sequence Q_n(x_0, ·) being a sequence of probabilities on (E, ℰ), it admits an accumulation point for vague convergence: there exists a measure π of mass less than 1 and a subsequence (n_k) such that for all continuous functions f with compact support,

\[ \lim_{k \to \infty} \int_E f(y)\, Q_{n_k}(x_0, dy) = \int_E f(y)\, \pi(dy). \tag{3.13} \]

In particular, if we take f = 𝟙_A in this equality, we obtain π(A) ≥ κ, thus π is not equal to zero. Finally, it can be shown that π is a probability and that (3.13) entails that π is an invariant probability for the chain (X_t) (see Exercise 3.7).
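The role of the averages Q_n can be seen on a finite‐state toy chain, which is an illustration chosen here and not an example from the text. The sketch assumes that Q_n(x_0, ⋅) denotes the average of P^1(x_0, ⋅), …, P^n(x_0, ⋅). For the two‐state chain that switches state at every step, P^t(x_0, ⋅) oscillates and has no limit, yet the averages converge to the invariant probability (1/2, 1/2).

```python
import numpy as np


def cesaro_average(P, x0, n):
    """Q_n(x0, .) = (1/n) * sum_{t=1}^{n} P^t(x0, .) for a finite-state transition matrix P."""
    row = np.zeros(P.shape[0])
    row[x0] = 1.0                 # Dirac mass at x0, that is P^0(x0, .)
    total = np.zeros(P.shape[0])
    for _ in range(n):
        row = row @ P             # P^t(x0, .) obtained from P^{t-1}(x0, .)
        total += row
    return total / n


# Periodic two-state chain: P^t(0, .) alternates between (0, 1) and (1, 0).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
q = cesaro_average(P, 0, 1000)
print(q)       # averaged occupation probabilities
print(q @ P)   # applying the kernel leaves q unchanged
```

Averaging removes the oscillation caused by periodicity, which is why the existence argument works with Q_n rather than with P^t directly.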

For some models, the drift criterion (iii) is too restrictive because it relies on transitions in only one step. The following criterion, adapted from Meyn and Tweedie (1996, Theorems 19.1.3, 6.2.9, and 6.2.5), is an interesting alternative relying on the transitions in n steps.

The compact C of condition (iii) can be replaced by a small set, but the function V must be bounded on C . When (X t ) is not a Feller chain, a similar criterion exists, for which it is necessary to consider such small sets (see Meyn and Tweedie 1996, Theorem 19.1.3).

3.2 Mixing Properties of GARCH Processes

We begin with the ARCH(1) process because this is the only case where the process (ε_t) is Markovian.

The ARCH(1) Case

Consider the model

\[ \varepsilon_t = \{ \omega + \alpha \varepsilon_{t-1}^2 \}^{1/2}\, \eta_t, \tag{3.16} \]

where ω > 0, α ≥ 0 and (η t ) is a sequence of iid (0, 1) variables. The following theorem establishes the mixing property of the ARCH(1) process under the necessary and sufficient strict stationarity condition (see Theorem 2.1 and (2.10)). An extra assumption on the distribution of η t is required, but this assumption is mild:

Note that this assumption includes, in particular, the standard case where f is positive over a neighbourhood of 0, possibly over all ℝ. We then have η⁰ = 0. Equality (3.17) implies some (local) symmetry of the law of (η_t). This symmetry facilitates the proof of the following theorem, but it can be omitted (see Exercise 3.8).
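To get a feel for how weak the strict stationarity condition is, the sketch below assumes that it takes the usual ARCH(1) form α < exp(−E log η_t²) and that η_t is standard Gaussian (both are assumptions made for this illustration). For η_t ∼ N(0, 1) one has E log η_t² = −(γ_E + log 2), where γ_E is Euler's constant, so the bound can be computed in closed form and compared with a Monte Carlo estimate. The function names are hypothetical.

```python
import numpy as np


def arch1_alpha_bound_gaussian():
    """Upper bound exp(-E log eta^2) on alpha when eta ~ N(0, 1)."""
    e_log_eta2 = -(np.euler_gamma + np.log(2.0))   # E log(chi-square with 1 d.f.)
    return float(np.exp(-e_log_eta2))


def mc_e_log_eta2(n, seed):
    """Monte Carlo estimate of E log eta^2 for eta ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    return float(np.mean(np.log(eta ** 2)))


bound = arch1_alpha_bound_gaussian()
estimate = mc_e_log_eta2(200_000, seed=0)
print(bound)                # roughly 3.56
print(np.exp(-estimate))    # simulation-based counterpart
```

Under these assumptions the admissible range for α extends well beyond α < 1, the range required for a finite variance.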

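Step (i) We have

\[ E\{ g(\varepsilon_t) \mid \varepsilon_{t-1} = x \} = E[ g\{ \psi(x) \eta_t \} ] = \int g\{ \psi(x) y \}\, dP_\eta(y), \]

with ψ(x) = (ω + αx²)^{1/2}. If g is continuous and bounded, the same is true for the function x → g{ψ(x)y}, for all y. By the Lebesgue dominated convergence theorem, it follows that (ε_t) is a Feller chain.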

Step (ii) To show the φ‐irreducibility of the chain, for some measure φ, assume for the moment that η⁰ = 0 in Assumption A. Suppose, for instance, that f is positive on [0, τ). Let φ be the restriction of the Lebesgue measure to the interval [0, √ω τ). Since ψ(x) ≥ √ω, it can be seen that

\[ \varphi(B) > 0 \;\Longrightarrow\; \lambda\left\{ \tfrac{1}{\psi(x)} B \cap [0, \tau) \right\} > 0 \;\Longrightarrow\; P(x, B) > 0. \]

It follows that the chain (ε_t) is φ‐irreducible. In particular, φ = λ if η_t has a positive density over ℝ.

The proof of the irreducibility in the case η 0 > 0 is more difficult. First note that

equation

Now images by (3.18). Thus we have

equation

Let τ′ ∈ (0, τ) be small enough such that

equation

Iterating the model, we obtain that, for ε0 = x fixed,

equation

It follows that the function

equation

is a diffeomorphism between open subsets of ℝ^t. Moreover, in view of Assumption A, the vector Y_t has a density on ℝ^t. The same is thus true for Z_t, and it follows that, given ε_0 = x,

3.19 equation

We now introduce the event

3.20 equation

Assumption A implies that ℙ(Ξ t ) > 0. Conditional on Ξ t , we have

equation

Since the bounds of the interval I_t are reached, the intermediate value theorem and (3.19) entail that, given ε_0 = x, ε_t² has, conditionally on Ξ_t, a positive density on I_t. It follows that

3.21 equation

where J t  = {x ∈ ℝ ∣ x 2 ∈ I t }. Let

equation

and let λ J be the restriction of the Lebesgue measure to J . We have

equation

The chain (ε_t) is thus φ‐irreducible with φ = λ_J.

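Step (iii) We shall use Lemma 2.2. The variable αη_t² is almost surely positive and satisfies E(αη_t²) = α < ∞ and E log(αη_t²) < 0, in view of assumption (3.18). Thus, there exists s > 0 such that

\[ c := \alpha^s \mu_{2s} < 1, \]

where μ_{2s} = E|η_t|^{2s}. The proof of Lemma 2.2 shows that we can assume s ≤ 1. Let V(x) = 1 + |x|^{2s}. Condition (3.9) is obviously satisfied for all x. Let 0 < δ < 1 − c and let the compact set

\[ A = \{ x \in \mathbb{R} : \omega^s \mu_{2s} + \delta + (c - 1 + \delta)\, |x|^{2s} \ge 0 \}. \]

Since A is a nonempty closed interval with centre 0, we have φ(A) > 0. Moreover, by the inequality (a + b)^s ≤ a^s + b^s for a, b ≥ 0 and s ∈ [0, 1] (see the proof of Corollary 2.3), we have, for x ∉ A,

\[ E\{ V(\varepsilon_t) \mid \varepsilon_{t-1} = x \} \le 1 + (\omega^s + \alpha^s |x|^{2s})\, \mu_{2s} = 1 + \omega^s \mu_{2s} + c\, |x|^{2s} < (1 - \delta)\, V(x), \]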

which proves condition (3.10). It follows that the chain (ε_t) is geometrically ergodic. Therefore, in view of property (3.8), the chain obtained with the invariant law as initial measure is geometrically β‐mixing. The proof of the theorem is complete.
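The drift argument of Step (iii) can be checked numerically. The sketch below is only an illustration under explicit assumptions: η_t is standard Gaussian, so that E|η_t|^p = 2^{p/2} Γ((p + 1)/2)/√π; the ARCH(1) recursion is ε_t = (ω + αε²_{t−1})^{1/2} η_t; the test function is V(x) = 1 + |x|^{2s}; and the drift condition reads E{V(ε_t) ∣ ε_{t−1} = x} ≤ (1 − δ)V(x) outside a compact interval centred at 0. The exponent s is picked on a grid to minimise c(s) = α^s E|η_t|^{2s}, and δ = (1 − c)/2 is one admissible choice among many. All function names are hypothetical.

```python
import math


def abs_moment_gaussian(p):
    """E|eta|^p for eta ~ N(0, 1), via the gamma function."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def drift_setup(omega, alpha, n_grid=1000):
    """Choose s in (0, 1] minimising c(s) = alpha^s E|eta|^(2s); return (s, c, delta, radius)."""
    candidates = [k / n_grid for k in range(1, n_grid + 1)]
    s = min(candidates, key=lambda u: alpha ** u * abs_moment_gaussian(2.0 * u))
    mu = abs_moment_gaussian(2.0 * s)
    c = alpha ** s * mu
    if c >= 1.0:
        raise ValueError("no s on the grid gives c < 1")
    delta = 0.5 * (1.0 - c)
    # The drift inequality is guaranteed for |x| > radius.
    radius = ((omega ** s * mu + delta) / (1.0 - c - delta)) ** (1.0 / (2.0 * s))
    return s, c, delta, radius


def expected_v_next(x, omega, alpha, s):
    """E{V(eps_t) | eps_{t-1} = x} for V(x) = 1 + |x|^(2s), in closed form."""
    return 1.0 + (omega + alpha * x * x) ** s * abs_moment_gaussian(2.0 * s)


def v(x, s):
    """Test function V(x) = 1 + |x|^(2s)."""
    return 1.0 + abs(x) ** (2.0 * s)


omega, alpha = 1.0, 2.0   # alpha > 1: infinite variance, yet the drift condition can hold
s, c, delta, radius = drift_setup(omega, alpha)
checks = [expected_v_next(f * radius, omega, alpha, s) <= (1.0 - delta) * v(f * radius, s)
          for f in (1.01, 2.0, 10.0, 100.0)]
print(s, c, checks)
```

With α = 2 the selected exponent s is well below 1, which shows why a fractional power of |x| is needed as test function when the variance is infinite.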

The GARCH(1, 1) Case

Let us consider the GARCH(1, 1) model

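\[ \varepsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \tag{3.22} \]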

where ω > 0, α ≥ 0, β ≥ 0 and the sequence (η_t) is as in the previous section. In this case (σ_t) is Markovian, but (ε_t) is not Markovian when β > 0. The following result extends Theorem 3.3.

Theorem 3.4 is of interest because it provides a proof of strict stationarity which is completely different from that of Theorem 2.8. A slightly more restrictive assumption on the law of η t has been required, but the result obtained in Theorem 3.4 is stronger.
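The Markov structure of the volatility in the GARCH(1, 1) case can be made explicit by simulation. The sketch below assumes the standard form of the model, ε_t = σ_t η_t with σ_t² = ω + αε²_{t−1} + βσ²_{t−1}, Gaussian innovations, and an arbitrary starting value σ_0² = ω; these choices and the function name are assumptions made for illustration. Substituting ε_{t−1} = σ_{t−1}η_{t−1} gives σ_t² = ω + (αη²_{t−1} + β)σ²_{t−1}, so σ_t² depends on the past only through σ²_{t−1} and an independent innovation.

```python
import numpy as np


def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate n observations of a GARCH(1, 1) with N(0, 1) innovations.

    Returns (eps, sigma2, eta). The start sigma2[0] = omega is an arbitrary choice.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = omega
    eps[0] = np.sqrt(sigma2[0]) * eta[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * eta[t]
    return eps, sigma2, eta


omega, alpha, beta = 0.1, 0.1, 0.85
eps, sigma2, eta = simulate_garch11(omega, alpha, beta, n=1000, seed=1)
# Same recursion written as a first-order Markov chain in sigma2 alone.
markov_form = omega + (alpha * eta[:-1] ** 2 + beta) * sigma2[:-1]
print(np.allclose(sigma2[1:], markov_form))
```

The same check fails for (ε_t) alone when β > 0, since ε_t then depends on σ²_{t−1}, which is not a function of ε_{t−1}.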

The ARCH(q) Case

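The approach developed in the case q = 1 does not extend trivially to the general case because (ε_t) and (σ_t) lose their Markov property when p > 1 or q > 1. Consider the model

\[ \varepsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, \tag{3.25} \]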

where ω > 0, α i  ≥ 0, i = 1, …, q , and (η t ) is defined as in the previous section. We will once again use the Markov representation

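\[ z_t = b_t + A_t z_{t-1}, \tag{3.26} \]

where

\[ b_t = (\omega \eta_t^2, 0, \dots, 0)', \qquad z_t = (\varepsilon_t^2, \dots, \varepsilon_{t-q+1}^2)', \qquad A_t = \begin{pmatrix} \alpha_1 \eta_t^2 \ \cdots \ \alpha_{q-1} \eta_t^2 & \alpha_q \eta_t^2 \\ I_{q-1} & 0 \end{pmatrix}. \]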

Recall that γ denotes the top Lyapunov exponent of the sequence {A t , t ∈ ℤ}.
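The top Lyapunov exponent rarely has a closed form, but it can be approximated by simulation. The sketch below assumes that A_t has the companion form used for ARCH(q) models (first row (α_1η_t², …, α_qη_t²), identity block below) and that η_t is standard Gaussian. It estimates γ as the average logarithmic growth of ‖A_n ⋯ A_1 v‖ for a positive starting vector v, renormalising at each step to avoid underflow. The function name and the choice of v are assumptions made for illustration.

```python
import numpy as np


def top_lyapunov(alphas, n, seed=0):
    """Monte Carlo estimate of the top Lyapunov exponent of the ARCH(q) matrices A_t."""
    rng = np.random.default_rng(seed)
    a = np.asarray(alphas, dtype=float)
    q = a.size
    v = np.ones(q) / np.sqrt(q)            # positive unit starting vector
    log_growth = 0.0
    for eta in rng.standard_normal(n):
        first = eta ** 2 * np.dot(a, v)    # first row of A_t applied to v
        v = np.concatenate(([first], v[:-1]))   # remaining rows shift the coordinates
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)
        v = v / norm                       # renormalise to keep the iteration stable
    return log_growth / n


# For q = 1 the exponent is log(alpha) + E log(eta^2), known in closed form for Gaussian eta.
exact_q1 = -(np.euler_gamma + np.log(2.0))
print(top_lyapunov([1.0], 100_000, seed=1), exact_q1)
print(top_lyapunov([0.3, 0.2], 20_000, seed=2))
```

For q = 1 the estimate can be compared with the closed form, which gives a simple sanity check before using the routine with q > 1, where a negative value indicates strict stationarity.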

3.3 Bibliographical Notes

A major reference on ergodicity and mixing of general Markov chains is Meyn and Tweedie (1996). For a more succinct presentation, see Chan (1990), Tjøstheim (1990), and Tweedie (2001). For survey papers on mixing conditions, see Bradley (1986, 2005). We also mention the book by Doukhan (1994) which proposes definitions and examples of other types of mixing, as well as numerous limit theorems.

For vectorial representations of the form (3.26), the Feller, aperiodicity and irreducibility properties were established by Cline and Pu (1998, Theorem 2.2), under assumptions on the error distribution and on the regularity of the transitions.

The geometric ergodicity and mixing properties of the GARCH(p, q) processes were established in the Ph.D. thesis of Boussama (1998), using results of Mokkadem (1990) on polynomial processes. The proofs use concepts of algebraic geometry to determine a subspace of the states on which the chain is irreducible. For the GARCH(1, 1) and ARCH(q) models we did not need such sophisticated notions. The proofs given here are close to those given in Francq and Zakoïan (2006a), which considers more general GARCH(1, 1) models. Mixing properties were obtained by Carrasco and Chen (2002) for various GARCH‐type models under stronger conditions than the strict stationarity (for example, α + β < 1 for a standard GARCH(1, 1); see their Table 1). Meitz and Saikkonen (2008a,b) showed mixing properties under mild moment assumptions for a general class of first‐order Markov models, and applied their results to the GARCH(1, 1).

The mixing properties of ARCH(∞) models are studied by Fryzlewicz and Rao (2011). They develop a method for establishing geometric ergodicity which, contrary to the approach of this chapter, does not rely on the Markov chain theory. Other approaches, for instance developed by Ango Nze and Doukhan (2004) and Hörmann (2008), aim to establish probability properties (different from mixing) of GARCH‐type sequences, which can be used to establish central limit theorems.

3.4 Exercises

3.1 (Irreducibility Condition for an AR(1) Process)

Given a sequence (ε_t)_{t ∈ ℕ} of iid centred variables of law P_ε which is absolutely continuous with respect to the Lebesgue measure λ on ℝ, let (X_t)_{t ∈ ℕ} be the AR(1) process defined by

\[ X_t = \theta X_{t-1} + \varepsilon_t, \quad t \ge 1, \]

where θ ∈ ℝ.

  (a) Show that if P_ε has a positive density over ℝ, then (X_t) constitutes a λ‐irreducible chain.
  (b) Show that if the density of ε_t is not positive over all ℝ, the existence of an irreducibility measure is not guaranteed.

3.2 (Equivalence between Stationarity and Invariance of the Initial Measure)

Show the equivalence (3.3).

3.3 (Invariance of the Limit Law)

Show that if π is a probability such that for all B, ℙ_μ(X_t ∈ B) → π(B) when t → ∞, then π is invariant.

3.4 (Small Sets for AR(1))

For the AR(1) model of Exercise 3.1, show directly that if the density f of the error term is positive everywhere, then the compacts of the form [−c, c], c > 0, are small sets.

3.5 (From Feigin and Tweedie 1985)

For the bilinear model

\[ X_t = \theta X_{t-1} + b\, \varepsilon_t X_{t-1} + \varepsilon_t, \quad t \ge 1, \]

where (ε_t) is as in Exercise 3.1(a), show that if

\[ E|\theta + b \varepsilon_t| < 1, \]

then there exists a unique strictly stationary solution and this solution is geometrically ergodic.

3.6 (Lower Bound for the Empirical Mean of the P^t(x_0, A))

Show inequality (3.12).

3.7 (Invariant Probability)

Show the invariance of the probability π satisfying condition (3.13).

    Hints: (i) For a function g which is continuous and positive (but not necessarily with compact support), this equality becomes

    equation

    (see Meyn and Tweedie 1996, Lemma D.5.5).

    (ii) For all σ ‐finite measures μ on (ℝ, ℬ(ℝ)) we have

    equation

    (see Meyn and Tweedie 1996, Theorem D.3.2).

3.8 (Mixing of the ARCH(1) Model for an Asymmetric Density)

Show that Theorem 3.3 remains true when Assumption A is replaced by the following:

    The law P η is absolutely continuous, with density f , with respect to λ . There exists τ > 0 such that

    equation

    where images and images .

3.9 (A Result on Decreasing Sequences)

Show that if u_n is a decreasing sequence of positive real numbers such that ∑_n u_n < ∞, we have sup_n n u_n < ∞. Show that this result applies to the proof of Corollary A.3 in Appendix A.

3.10 (Complements to the Proof of Corollary A.3)

Complete the proof of Corollary A.3 by showing that the term d_4 is uniformly bounded in t, h and k.

3.11 (Non‐mixing Chain)

Consider the non‐mixing Markov chain defined in Example A.3. Which of the assumptions (i)–(iii) in Theorem 3.1 does the chain satisfy and which does it not satisfy?
