Chapter 4
INGARCH Models for Count Time Series

The models discussed in Chapter 3 used types of thinning operations to transfer the ARMA model to the count data case. Another popular approach to modeling such stationary processes of counts is provided by the INGARCH models, the definition of which is related to linear regression models (also see Section 5.1). Despite their controversial name, these models are particularly attractive for overdispersed counts with an ARMA-like autocorrelation structure. Results concerning the basic model with a conditional Poisson distribution are presented, but generalizations with, for example, a binomial or negative binomial conditional distribution are also considered.

4.1 Poisson Autoregression

Due to the multiplication problem discussed in Section 2.1, the ARMA models of Definition B.3.5 are not applicable to the count data case. The models presented in Chapters 2 and 3 circumvented this problem by replacing the multiplications with a type of thinning operation that ensures that these modified model recursions always produce integer values. The INGARCH models to be presented in this chapter use another solution to the multiplication problem: a linear regression of the conditional means $c04-math-001$. To construct an AR(1)-like model, for instance, the AR(1) recursion $c04-math-002$ is transferred to the level of conditional means as $c04-math-003$. Then the count at time $c04-math-004$ is generated by using, say, a conditional Poisson distribution (that is, $c04-math-005$), thus guaranteeing that the outcomes are always integer values. This approach is not only related to linear regression; it also shares analogies with the definition of an ARCH(1) model (Definition B.4.1.1), where the autoregression is defined at the level of conditional variances: $c04-math-006$. Such ARCH models are extended beyond pure autoregression by also including past conditional variances in the model recursion; see the GARCH equation (B.20). For example, the GARCH($c04-math-007$) model is determined by $c04-math-008$. Picking up this idea, an INGARCH($c04-math-009$) model is defined by also including a feedback term, now with respect to the previous conditional mean: $c04-math-010$.
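To make the feedback recursion concrete, the following minimal sketch simulates a Poisson INGARCH(1, 1) path. The parameter names `beta0`, `alpha1` and `beta1`, the assumed form M_t = beta0 + alpha1·X_{t-1} + beta1·M_{t-1} of the recursion, the burn-in length and the helper `poisson_draw` are illustrative assumptions, not notation taken from the text.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for moderate lam only (exp(-lam) underflows for very large lam).
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_ingarch11(beta0, alpha1, beta1, length, seed=0, burn_in=500):
    """Simulate a Poisson INGARCH(1,1) path (illustrative sketch).

    Assumed recursion: M_t = beta0 + alpha1 * X_{t-1} + beta1 * M_{t-1},
    with X_t given the past being Poisson(M_t); requires alpha1 + beta1 < 1.
    """
    rng = random.Random(seed)
    mean = beta0 / (1.0 - alpha1 - beta1)    # stationary mean as start value
    m, x = mean, round(mean)
    path = []
    for t in range(burn_in + length):
        m = beta0 + alpha1 * x + beta1 * m   # conditional mean for this step
        x = poisson_draw(m, rng)             # outcome is always an integer
        if t >= burn_in:
            path.append(x)
    return path
```

Calling, say, `simulate_ingarch11(1.0, 0.3, 0.4, 200)` returns a list of 200 non-negative integers; the burn-in is only a pragmatic way to reduce the influence of the arbitrary start values.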

The full INGARCH model was introduced by Rydberg & Shephard (2000), Heinen (2003) and Ferland et al. (2006). The name indicates that, as mentioned before, this model can be understood as an integer-valued counterpart to the conventional GARCH model, but also see the discussion in Remark 4.1.2 below. Conditioned on the past observations, the INGARCH model assumes an ARMA-like recursion for the conditional mean. Depending on the choice of the conditional distribution family, different INGARCH models are obtained. The basic INGARCH model, which is discussed in this section, assumes a conditional Poisson distribution.

If $c04-math-023$ , the model of Definition 4.1.1 is referred to as the INARCH(p) model.

Remark 4.1.2 (Terminology)

There is a lot of confusion in the literature about how to refer to the models given through Definition 4.1.1. In Rydberg & Shephard (2000), they are referred to as BIN models, in Heinen (2003) as ACP models (autoregressive conditional Poisson), and in Fokianos et al. (2009) as (linear) Poisson autoregressive models. The name INGARCH seems to be due to Ferland et al. (2006), and it is motivated by the analogy between condition (ii) in Definition 4.1.1 and the GARCH equation (B.20); see the initial discussion. Certainly, condition (ii) refers to conditional means, while (B.20) refers to conditional variances, which seems to be the main reason why some authors refuse to use the INGARCH terminology. On the other hand, for the particular Poisson INGARCH model, condition (ii) also applies to the conditional variances in view of the equidispersion property of the conditional Poisson distribution. Although the analogy between GARCH and INGARCH models is far from being perfect, we shall use the name INGARCH models in the sequel, as this name seems to be more often used in the literature than any of its competitors.

Although having the equidispersed Poisson distribution as a conditional distribution, the INGARCH model is well suited for overdispersed counts, since it satisfies

provided that these moments exist. In fact, Ferland et al. (2006) showed that for $c04-math-024$ , the INGARCH process exists and is strictly stationary, with finite first- and second-order moments. In this case, the mean equals

4.1

and variance and autocovariances are determined by a set of Yule–Walker-like equations (Weiß, 2009c):

4.2

where $c04-math-027$ and $c04-math-028$ . Note the analogy to equations (B.18) for the conventional ARMA model (Appendix B.3 ), and note the difference to the ARCH case (Appendix B.4.1 ), where the non-squared observations are uncorrelated.

Remark 4.1.3 (INGARCH vs. INARMA)

Equation (4.2) shows that, despite their name, the INGARCH models also form an integer-valued counterpart to the ARMA models, just like the INARMA models discussed in Section 3.1. So one may certainly ask about the possible advantages and disadvantages of these two “competitors”. The INARMA approach allows for counterparts to pure AR and MA models (note that a pure MA-like model is not included in the INGARCH family), but a reasonable definition of a full ARMA-like model is not obvious there, whereas such a model is easily created within the INGARCH framework; also see Example 4.1.4. Even for the purely autoregressive case, where both approaches can be used, the generated sample paths will often differ from each other in their structure; see Remark 4.1.7 for illustration. As we shall also see later, analytical expressions for marginal properties are more difficult to obtain in the INGARCH than in the INARMA case, while the INGARCH model (by definition) has simple conditional distributions. The latter is also useful in the context of conditional ML estimation, because the likelihood function factorizes to

with the conditional probabilities $c04-math-029$ stemming from the Poisson distribution $c04-math-030$ . The computation of such conditional probabilities is more demanding for INARMA models; see (3.12) for the INAR(2) model as an example. On the plus side of INARMA models, in contrast, their interpretability has to be noted. This may allow for a deeper understanding of the data-generating process in some applications. Further pros and cons could be listed here, but the present consideration already makes clear that a general recommendation of one or other of these approaches cannot be given.
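The factorization into conditional Poisson probabilities can be sketched for the INGARCH(1, 1) case as follows. This is an illustrative sketch only: the parameter names, the assumed recursion M_t = beta0 + alpha1·X_{t-1} + beta1·M_{t-1} and the argument `m1` (an assumed value for the conditional mean belonging to the first count) are not taken from the text.

```python
import math


def poisson_logpmf(x, lam):
    """log P(X = x) for X ~ Poisson(lam), lam > 0."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def ingarch11_cond_loglik(counts, beta0, alpha1, beta1, m1):
    """Conditional log-likelihood of a Poisson INGARCH(1,1) model (sketch).

    Conditions on the first count counts[0] and on the assumed initial
    conditional mean m1; each later count contributes one Poisson term.
    """
    m = m1
    loglik = 0.0
    for t in range(1, len(counts)):
        m = beta0 + alpha1 * counts[t - 1] + beta1 * m   # conditional mean M_t
        loglik += poisson_logpmf(counts[t], m)           # log P(X_t = x_t | past)
    return loglik
```

The function can be handed to any numerical optimizer (after a sign change for minimizers); only sums of logarithms of Poisson probabilities are needed, which is the computational advantage referred to above.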

Further results concerning likelihood estimation, especially on the asymptotic properties of the resulting ML estimators, are provided by Ferland et al. (2006), Fokianos et al. (2009) and Cui & Wu (2016).

Example 4.1.4 (INGARCH(1, 1) model)

The particular case of the stationary INGARCH(1, 1) model, where part (ii) of Definition 4.1.1 simplifies to $c04-math-031$ , was further investigated by Heinen (2003), Ferland et al. (2006) and Fokianos et al. (2009), among others. It was shown that all moments exist, where the variance equals

and the ACF is given by

also see (4.2). Furthermore, Theorem 3.1 in Neumann (2011) implies that such a process is $c04-math-032$ -mixing with geometrically decreasing weights (see Definition B.1.5 ).
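For numerical work, the mean, variance and ACF of a stationary INGARCH(1, 1) process can be evaluated with a few lines of code. The sketch below uses the closed forms commonly cited from Ferland et al. (2006), written in the assumed parametrization M_t = beta0 + alpha1·X_{t-1} + beta1·M_{t-1}; that these agree with the displays referred to above is an assumption.

```python
def ingarch11_moments(beta0, alpha1, beta1, max_lag=5):
    """Mean, variance and ACF (lags 1..max_lag) of a Poisson INGARCH(1,1) process.

    Assumed closed forms, with s = alpha1 + beta1 < 1:
      mean     = beta0 / (1 - s)
      variance = mean * (1 - s^2 + alpha1^2) / (1 - s^2)
      rho(k)   = s^(k-1) * alpha1 * (1 - beta1*s) / (1 - s^2 + alpha1^2)
    """
    s = alpha1 + beta1
    if not 0.0 <= s < 1.0:
        raise ValueError("alpha1 + beta1 must lie in [0, 1)")
    mean = beta0 / (1.0 - s)
    variance = mean * (1.0 - s**2 + alpha1**2) / (1.0 - s**2)
    rho1 = alpha1 * (1.0 - beta1 * s) / (1.0 - s**2 + alpha1**2)
    acf = [rho1 * s ** (k - 1) for k in range(1, max_lag + 1)]
    return mean, variance, acf
```

Setting `beta1 = 0` gives the INARCH(1) boundary case, where the variance reduces to mean/(1 − alpha1²) and the ACF to alpha1^k.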

Fokianos (2011) emphasized that the INGARCH(1, 1) process, although defined using the hidden conditional means $c04-math-033$ (feedback mechanism), is observation-driven in the sense of Cox (1981); that is, the serial dependence can be explained by past observations (for a parameter-driven process, in contrast, serial dependence is caused by a latent process; see also Remark 5.2.1 below). In particular, it follows that

that is, the current observation is influenced by all past observations, but with a weight decreasing exponentially with increasing time lag $c04-math-034$ . So the INGARCH(1, 1) model offers a parsimoniously parametrized way of accounting for a long memory.
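The exponentially decreasing weights can be checked numerically. Under the assumed recursion M_t = beta0 + alpha1·X_{t-1} + beta1·M_{t-1} (parameter names are illustrative), repeated substitution gives M_t = beta0/(1 − beta1) + alpha1·Σ_{k≥1} beta1^(k−1)·X_{t−k}, and the following sketch compares this weighted sum with the plain recursion.

```python
def cond_mean_by_recursion(past_counts, beta0, alpha1, beta1, m_start):
    """Iterate M_t = beta0 + alpha1*X_{t-1} + beta1*M_{t-1} over past_counts
    (oldest count first), starting from the arbitrary value m_start."""
    m = m_start
    for x in past_counts:
        m = beta0 + alpha1 * x + beta1 * m
    return m


def cond_mean_by_weights(past_counts, beta0, alpha1, beta1):
    """Truncated infinite-past form of the conditional mean:
    beta0/(1 - beta1) + alpha1 * sum_{k>=1} beta1**(k-1) * X_{t-k}."""
    recent_first = reversed(past_counts)
    weighted = sum(beta1 ** k * x for k, x in enumerate(recent_first))
    return beta0 / (1.0 - beta1) + alpha1 * weighted
```

Both functions agree up to a term of order beta1^n for n past counts, so the influence of the start value `m_start` vanishes geometrically.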

Figure 4.1 Ericsson stocks: (a) transactions counts, (b) sample autocorrelation, (c) marginal frequencies. See Example 4.1.5.

Example 4.1.5 (Transactions counts)

We analyze a part of a dataset that was originally published by Brännäs & Quoreshi (2010). For the working days between 2 and 22 July 2002, it provides the number of transactions of the Ericsson B stock per minute between 9:35 and 17:14. As we shall see below, the data are characterized by a slowly decaying SACF (long memory). In Brännäs & Quoreshi (2010), the data are modeled by INMA(q) models (Section 3.1) with a high model order q. But parts of the data have also been modeled by the more parsimoniously parametrized INGARCH(1, 1) model; see Fokianos et al. (2009), Zhu (2012c), Christou & Fokianos (2014) and Davis & Liu (2016), among others. We shall follow this latter approach and restrict our analyses to the counts observed on 2 July 2002, which constitute a time series of length $c04-math-035$ . A plot of these data is shown in Figure 4.1a.

The SACF in Figure 4.1b takes moderate but slowly decaying values, and the SPACF, showing several significant values, indicates that a purely autoregressive model will not be appropriate for the data. Therefore (see the discussion in Example 4.1.4), it is indeed reasonable to try to fit the INGARCH(1, 1) model to the data. The marginal distribution of the data (Figure 4.1c) has mean $c04-math-036$ and shows a strong degree of overdispersion, $c04-math-037$.

As the initial step of parameter estimation, we compute moment estimates for the parameters $c04-math-038$ , in the following way: the quotient $c04-math-039$ estimates $c04-math-040$ ( $c04-math-041$ , $c04-math-042$ ), which is used to compute $c04-math-043$ from (4.1). The formula for $c04-math-044$ leads to a quadratic equation in $c04-math-045$ (given the value for $c04-math-046$ ), which finally results in $c04-math-047$ and $c04-math-048$ . These estimates can now be used as initial values for CML estimation; see also Remark 4.1.3. This, however, is not a trivial task, because the conditioning requires not only $c04-math-049$ , but also the value $c04-math-050$ of the initial conditional mean $c04-math-051$ :

One solution is to specify $c04-math-052$ as, say, $c04-math-053$ or $c04-math-054$ (Fokianos et al., 2009), but this choice turns out to have a significant effect on the resulting CML estimates:

if using $c04-math-055$ , while $c04-math-056$ leads to

Therefore, we shall follow the suggestion of Ferland et al. (2006) here and treat $c04-math-057$ as a further parameter during estimation. We obtain the figures shown in Table 4.1 (standard errors in parentheses), which are quite close to the estimates obtained by initializing with $c04-math-058$ .
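The moment-based initial values used at the start of this estimation step can be sketched as follows. The sketch assumes the parametrization M_t = beta0 + alpha1·X_{t-1} + beta1·M_{t-1} and the closed-form expression rho(1) = alpha1·(1 − beta1·s)/(1 − s² + alpha1²) with s = alpha1 + beta1; the function name and argument names are hypothetical.

```python
import math


def ingarch11_moment_estimates(mean, rho1, rho2):
    """Moment estimates (beta0, alpha1, beta1) for an INGARCH(1,1) model (sketch).

    Inputs are the sample mean and the sample autocorrelations at lags 1 and 2.
    """
    s = rho2 / rho1                    # estimates alpha1 + beta1
    beta0 = mean * (1.0 - s)           # from mean = beta0 / (1 - s)
    # Substituting beta1 = s - alpha1 into the formula for rho(1) gives
    # (s - rho1)*alpha1^2 + (1 - s^2)*alpha1 - rho1*(1 - s^2) = 0.
    a = s - rho1
    b = 1.0 - s * s
    c = -rho1 * (1.0 - s * s)
    if abs(a) < 1e-12:                 # degenerate case: linear equation
        alpha1 = -c / b
    else:
        alpha1 = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    beta1 = s - alpha1
    return beta0, alpha1, beta1
```

The returned values are only meant as starting points for a numerical likelihood maximization; for real data they may violate the parameter constraints and should be checked before use.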

Table 4.1 Transactions counts: CML estimates for Poisson INGARCH(1,1) model, together with $c04-math-059$

$c04-math-060$	$c04-math-061$	$c04-math-062$	$c04-math-063$	$c04-math-064$
0.292	0.832	0.139	12.058	$c04-math-065$
(0.100)	(0.023)	(0.018)

The mean of the fitted model ( $c04-math-066$ ) is close to the observed one, and also its ACF is slowly decaying ( $c04-math-067$ ). Consequently, the SACF of the Pearson residuals (computed using $c04-math-068$ as the initial conditional mean) indicates an adequate model. However, the dispersion index of the fitted model ( $c04-math-069$ ) is much too low, which goes along with the Pearson residuals having a variance ( $c04-math-070$ ) clearly larger than 1, and with the strongly U-shaped PIT histogram shown in Figure 4.2a. So while the Poisson INGARCH(1, 1) model is able to describe the observed autocorrelation structure, it cannot explain the strong volatility of the data. Therefore, we shall pick up the suggestion by Zhu (2012c) and use modified types of INGARCH models with a conditional non-Poisson distribution; see Example 4.2.4 below.
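The Pearson residuals used in this check standardize each count by its conditional mean and conditional standard deviation, which coincide up to a square root for a conditional Poisson distribution. A minimal sketch, again under the assumed recursion M_t = beta0 + alpha1·X_{t-1} + beta1·M_{t-1} with hypothetical argument names:

```python
import math


def pearson_residuals(counts, beta0, alpha1, beta1, m1):
    """Pearson residuals (X_t - M_t) / sqrt(M_t) of a Poisson INGARCH(1,1) fit.

    m1 is the assumed initial conditional mean belonging to counts[0];
    residuals are returned for the counts from the second one onwards.
    """
    m = m1
    residuals = []
    for t in range(1, len(counts)):
        m = beta0 + alpha1 * counts[t - 1] + beta1 * m
        residuals.append((counts[t] - m) / math.sqrt(m))
    return residuals


def sample_variance(values):
    """Unbiased sample variance; should be close to 1 for an adequate model."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)
```

A residual variance clearly above 1 indicates that the data are more dispersed than the fitted conditional Poisson distribution allows.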

Figure 4.2 PIT histograms based on (a) fitted Poisson INGARCH(1, 1) and (b) GP-INGARCH(1, 1) model. See Examples 4.1.5 and 4.2.4.

It should be mentioned that the slow decay of the SACF of the transactions counts from Example 4.1.5 is not necessarily caused by a long memory, but might also be explained by, for example, change points in the process (Kirch & Kamgaing, 2016, 10.5.2.1). Generally, it is well known that transactions data often exhibit an intraday pattern because of higher trading activity at the beginning and at the end of a trading day; see, for example, Wood et al. (1985). But here, for illustration, we continue with the INGARCH(1, 1) modeling.

An important subfamily of the INGARCH models from Definition 4.1.1 is the class of purely autoregressive (Poisson) INARCH(p) models, where $c04-math-071$ and

4.3

Such INARCH models were discussed by Rydberg & Shephard (2000) and Weiß (2009c) in some detail. The INARCH(p) model constitutes a pth-order Markov model, thus being a competitor to the DL-INAR(p) model (3.5). The Poisson INARCH(p) model can also be understood as a particular GINAR(p) model; see the discussion in Section 3.2. It has simple Poisson transition probabilities,

4.4

which is attractive for CML estimation according to (B.6). It also has a linear conditional mean and variance, both given by $c04-math-074$ (also see (3.8) for the DL-INAR(p) model). In particular, equations (4.2) simplify to

4.5

that is, we have the typical AR(p) autocorrelation structure (see (B.13)). As a consequence, the model order p can be identified by inspecting the (S)PACF.
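The cut-off property of the PACF can be illustrated for p = 2. The sketch assumes that the Yule–Walker equations take the usual AR form rho(k) = alpha1·rho(k−1) + alpha2·rho(k−2) (the coefficient names are illustrative) and computes the PACF from the ACF with the Durbin–Levinson recursion.

```python
def inarch2_acf(alpha1, alpha2, max_lag):
    """ACF rho(0..max_lag) solving rho(k) = alpha1*rho(k-1) + alpha2*rho(k-2)."""
    rho = [1.0, alpha1 / (1.0 - alpha2)]
    for k in range(2, max_lag + 1):
        rho.append(alpha1 * rho[k - 1] + alpha2 * rho[k - 2])
    return rho[: max_lag + 1]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: PACF at lags 1..len(rho)-1, given rho[0] = 1."""
    pacf, phi_prev = [], []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lag-k prediction coefficients from the lag-(k-1) ones.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf
```

For example, `pacf_from_acf(inarch2_acf(0.4, 0.3, 4))` has its second entry equal to 0.3 and vanishing entries at lags 3 and 4, which is the pattern one looks for in the SPACF when choosing p.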

Comparing with the discussion in Section 3.1 , it becomes obvious that the INGARCH approach is easier to use when handling a higher-order ARMA-like autocorrelation structure than the INARMA approach; see also Remark 4.1.3. On the other hand, closed-form expressions for the stationary marginal distribution or for $c04-math-076$ -step ahead forecasting distributions are difficult to find, even in the simplest case of an INARCH(1) model.

Example 4.1.6 (INARCH(1) model)

The INARCH(1) model constitutes a counterpart to the INAR(1) model discussed in Section 2.1 , and it is a boundary case of the INGARCH(1, 1) model discussed in Example 4.1.4. Denoting its model parameters by $c04-math-077$ and $c04-math-078$ , the INARCH(1) model requires $c04-math-079$ to be conditionally Poisson-distributed in the following way:

Hence, the transition probabilities, as required for likelihood computation (see (B.6)), are simply given by

4.6

The conditional variance and mean coincide, and they are both linear in the previous observation, given by $c04-math-081$ . The latter implies that the INARCH(1) model belongs to the class of CLAR(1) models (Grunwald et al., 2000).

An INARCH(1) process is a stationary, ergodic and $c04-math-082$ -mixing Markov chain (Neumann, 2011). All moments of an INARCH(1) process exist (Ferland et al., 2006). The marginal cumulants can be determined according to the recursive scheme provided by Weiß (2009b):

where the coefficients $c04-math-083$ are the Stirling numbers of the first kind, given by

for $c04-math-084$ and $c04-math-085$ . In particular, the marginal mean and variance of an INARCH(1) process are given by

The autocorrelation function equals $c04-math-086$ as in the standard AR(1) case, and closed-form expressions for the joint (central) moments and cumulants up to order 4 are provided by Weiß (2010a).

While the 1-step-ahead conditional properties of the INARCH(1) model are very simple, there is no closed-form formula for the stationary marginal distribution, or for the $c04-math-087$ -step-ahead conditional properties with $c04-math-088$ . To obtain these, at least numerically, the MC approximation of Remarks 2.1.3.4 and 2.6.3 has to be adopted.
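A numerical approximation of this kind can be sketched by truncating the state space at a large value and working with the resulting finite transition matrix. The sketch is in the spirit of, but not necessarily identical to, the approach of the remarks cited above; the parameter names `beta` (intercept) and `alpha` (slope), the truncation bound `upper` and the row renormalization are assumptions.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), lam > 0."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def inarch1_transition_matrix(beta, alpha, upper):
    """P[i][j] = P(X_t = j | X_{t-1} = i) on {0, ..., upper}, rows renormalized."""
    matrix = []
    for i in range(upper + 1):
        row = [poisson_pmf(j, beta + alpha * i) for j in range(upper + 1)]
        total = sum(row)
        matrix.append([p / total for p in row])
    return matrix


def h_step_distribution(matrix, start, h):
    """Distribution of X_{t+h} given X_t = start, by h vector-matrix products."""
    size = len(matrix)
    dist = [0.0] * size
    dist[start] = 1.0
    for _ in range(h):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return dist
```

Because of the ergodicity, `h_step_distribution(matrix, start, h)` with a large `h` approximates the stationary marginal distribution; its mean and variance can be compared with the closed-form values as a plausibility check on the truncation bound.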

Remark 4.1.7 (Comparison to INAR(1) model)

At first glance, the Poisson INAR(1) and INARCH(1) models are very similar; choosing $c04-math-094$ and a common value of $c04-math-095$, they have the same marginal mean and the same ACF (see Section 2.1.2). But while the Poisson INAR(1) model is unconditionally equidispersed, the INARCH(1) model shows increasing overdispersion with increasing $c04-math-096$. Also the whole sample paths generated by these models differ more and more from each other with increasing $c04-math-097$. This can be seen by comparing Figure 2.5b (Example 2.1.2.1) with Figure 4.3, where both sample paths refer to the marginal mean $c04-math-098$ and the strong autocorrelation level $c04-math-099$. In contrast to Figure 2.5b, the INARCH(1) process leads to long runs only for the value 0, while we have vivid fluctuations otherwise (note that the linear term $c04-math-100$ of the conditional variance is much larger than 0 except for $c04-math-101$). Also more extreme counts are observed in the INARCH(1) case, which can be explained to some extent by the strong level of overdispersion, $c04-math-102$.

Figure 4.4 highlights the difference between the conditional variances of the Poisson INAR(1) and INARCH(1) models, given by $c04-math-103$ (see (2.6)) and $c04-math-104$ (see Example 4.1.6), respectively. This difference is much larger for $c04-math-105$ than for $c04-math-106$ . For the Poisson INAR(1) model, the conditional variance shows nearly constant and low values for $c04-math-107$ explaining the overall tendency to produce long runs (see Example 2.1.2.1 ). In contrast, it quickly tends to large values for the Poisson INARCH(1) model, so runs are observed mainly for zero, with vivid fluctuations otherwise. Further information about the relation between Poisson INAR(1) and INARCH(1) models can be found in Weiß (2015a).
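The two conditional variance functions can be compared directly. The sketch assumes the standard binomial-thinning form alpha·(1 − alpha)·x + lambda for the Poisson INAR(1) model and beta + alpha·x for the Poisson INARCH(1) model, with lambda and beta both set to mean·(1 − alpha) so that the marginal means agree; the function names are hypothetical.

```python
def inar1_cond_variance(x, mean, alpha):
    """Poisson INAR(1): Var[X_t | X_{t-1} = x] = alpha*(1-alpha)*x + lambda,
    with innovation mean lambda = mean*(1 - alpha)."""
    return alpha * (1.0 - alpha) * x + mean * (1.0 - alpha)


def inarch1_cond_variance(x, mean, alpha):
    """Poisson INARCH(1): Var[X_t | X_{t-1} = x] = beta + alpha*x,
    with intercept beta = mean*(1 - alpha)."""
    return mean * (1.0 - alpha) + alpha * x
```

Under these assumptions the difference between the two equals alpha²·x, so it grows both with the previous count and with the autocorrelation level.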

Figure 4.4 Conditional variances of Poisson INAR(1) and INARCH(1) process with $c04-math-091$ and (a) $c04-math-092$ , (b) $c04-math-093$ ; see Remark 4.1.7.

Figure 4.5 Strikes: (a) counts, (b) sample autocorrelation, (c) marginal frequencies; see Example 4.1.8.

Example 4.1.8 (Strike counts)

We analyze the monthly number of work stoppages (strikes and lock-outs) of 1000 or more workers, as published by the US Bureau of Labor Statistics.¹ We restrict ourselves to the period 1994–2002, leading to a time series of length $c04-math-108$ as was analyzed by Jung et al. (2005) and Weiß (2010b), among others. The plot in Figure 4.5a shows fluctuations similar to those in the plotted INARCH(1) sample path in Figure 4.3. An analysis of the SPACF (the SACF is shown in Figure 4.5b) indicates an AR(1)-like autocorrelation structure, with $c04-math-109$. The marginal distribution has mean $c04-math-110$ and is significantly overdispersed ($c04-math-111$) according to the test (2.14); also see the plot in Figure 4.5c. So altogether, the INARCH(1) model appears to be a reasonable candidate for the data, but we also consider the Poisson and NB-INAR(1) models as well as the NB-RCINAR(1) model as further candidates (Section 2.5 and Example 3.2.1).

Figure 4.3 Simulated sample path of Poisson INARCH(1) process with $c04-math-089$ and $c04-math-090$ ; see Remark 4.1.7.

The models' parameters are estimated by a full maximum likelihood approach; see Table 4.2 for a summary of the results. The (equidispersed) Poisson INAR(1) model not only performs worst in terms of AIC and BIC, but also in respect of its Pearson residuals (variance $c04-math-116$) and its U-shaped PIT histogram (see Figure 4.6a; only $c04-math-117$ intervals are used since the time series is rather short). So models that are able to reproduce the overdispersion are required. The remaining INARCH(1), NB-INAR(1) and NB-RCINAR(1) models are more adequate in these respects, with marginal dispersion indices of 1.408, 1.522 and 1.704, respectively, with the Pearson residuals having variances of 1.026, 0.998 and 1.000, respectively, and with the PIT histograms being close to uniformity (see the INARCH(1) PIT histogram in Figure 4.6b for illustration). The AIC and BIC point to the parsimoniously parametrized INARCH(1) model as being preferred among the candidate models.
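The information criteria used for this comparison follow the usual definitions, AIC = −2·log L + 2k and BIC = −2·log L + k·log T, where k is the number of estimated parameters and T the number of observations. The sketch below assumes these standard definitions; the function names are illustrative.

```python
import math


def aic(max_loglik, n_params):
    """Akaike information criterion: -2*log L + 2*k."""
    return -2.0 * max_loglik + 2.0 * n_params


def bic(max_loglik, n_params, n_obs):
    """Bayesian information criterion: -2*log L + k*log(T)."""
    return -2.0 * max_loglik + n_params * math.log(n_obs)
```

Since BIC − AIC = k·(log T − 2), the gap between the two criteria depends only on k and T. For nine years of monthly data (T = 108, assuming no observations are dropped) it is about 5.4 for two parameters and about 8.0 for three, which is roughly in line with the rounded gaps in Table 4.2.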

Figure 4.6 PIT histograms based on fitted (a) Poisson INAR(1) and (b) INARCH(1) model; see Example 4.1.8.

The $c04-math-120$-step-ahead conditional distributions (conditioned on $c04-math-121$) and the stationary marginal distribution (corresponding to $c04-math-122$ due to the ergodicity) for the fitted INARCH(1) model are computed by the MC approximation of Remarks 2.1.3.4 and 2.6.3. The convergence of the conditional distributions is illustrated by Figure 4.7 (increasing darkness for increasing probability value), where the gray colors in the last column refer to the marginal distribution.

Table 4.2 Strike counts: ML estimates, AIC and BIC for different models

Model	Parameter			AIC	BIC
	1	2	3
Poisson INARCH(1)	1.723	0.643		470	475
$c04-math-112$	(0.382)	(0.080)
Poisson INAR(1)	2.423	0.503		480	485
$c04-math-113$	(0.297)	(0.056)
Negative-binomial INAR(1)	3.473	0.613	0.548	475	484
$c04-math-114$	(2.065)	(0.129)	(0.057)
Negative-binomial RCINAR(1)	9.331	0.657	0.592	473	481
$c04-math-115$	(4.259)	(0.105)	(0.062)

Figures in parentheses are standard errors.

Figure 4.7 $c04-math-118$ -step-ahead conditional distributions (conditioned on $c04-math-119$ ) and stationary marginal distribution; see Example 4.1.8.

Remark 4.1.9 (Further extensions)

As for the thinning-based models (see Remark 3.1.7), the basic INGARCH approach can be extended in several ways. Extensions of the INARCH(1) model to account for trend and seasonality are discussed by Held et al. (2005, 2006). Conditional linear models as before, but with non-Poisson conditional distributions, are presented in Section 4.2, but the estimating functions approach described by Thavaneswaran & Ravishanker (2016) should also be mentioned in this context. Models where the linear recursion in Definition 4.1.1(ii) is replaced by a log-linear one are discussed by Fokianos & Tjøstheim (2011), while Fokianos & Tjøstheim (2012) consider more general non-linear autoregressions. Some of these models are also briefly discussed in Section 5.1 in the context of regression models.

Keeping the conditionally linear structure but allowing for the inclusion of covariate information, Agosto et al. (2016) proposed extending the INGARCH recursion in part (ii) of Definition 4.1.1 to

where the response function $c04-math-123$ takes only non-negative real values (also see Section 5.1). For the case where the covariates $c04-math-124$ are not deterministic but stem from a Markov chain, Agosto et al. (2016) derive conditions for the existence of a stationary solution and analyze the asymptotic properties of the ML estimator.

Finally, a self-exciting threshold (SET) extension of the Poisson INGARCH model has been proposed by Wang et al. (2014).

4.2 Further Types of INGARCH Models

The standard INGARCH model with its conditional Poisson distribution exhibits unconditional overdispersion, but the degree of overdispersion is determined by the actual autocorrelation structure (say, $c04-math-125$ for the Poisson INARCH(1) model from Example 4.1.6). As a consequence, this model was not able to describe the strong volatility of the transactions counts in Example 4.1.5. To overcome this limitation, Xu et al. (2012) proposed the family of dispersed INARCH models (DINARCH), which again assume a linear relationship for the conditional mean (see (4.3)), but include an additional (constant) scaling factor $c04-math-126$ for the conditional variance:

4.7

So the characteristic feature is a time-invariant conditional dispersion index, being equal to $c04-math-128$ . Obviously, the Poisson INARCH model is an instance of the DINARCH model with $c04-math-129$ .

For the case $c04-math-130$ (see Example 4.1.6), the unconditional mean and variance are given by

4.8

that is, $c04-math-132$ allows control of the (unconditional) degree of dispersion independently of $c04-math-133$ (Xu et al., 2012).
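The structure of these moments follows from the laws of total expectation and total variance. As a sketch in assumed notation (conditional mean $M_t = \beta + \alpha X_{t-1}$, conditional variance $\theta M_t$; these symbols are illustrative and may differ from those of (4.7) and (4.8)), stationarity gives:

```latex
\begin{align*}
\mu &= E[M_t] = \beta + \alpha\,\mu
  \quad\Longrightarrow\quad \mu = \frac{\beta}{1-\alpha},\\[4pt]
\sigma^2 &= E\bigl[\operatorname{Var}[X_t \mid X_{t-1}]\bigr]
          + \operatorname{Var}\bigl[E[X_t \mid X_{t-1}]\bigr]
         = \theta\,\mu + \alpha^2\,\sigma^2
  \quad\Longrightarrow\quad \sigma^2 = \frac{\theta\,\mu}{1-\alpha^2}.
\end{align*}
```

Under these assumptions the unconditional dispersion index equals $\theta/(1-\alpha^2)$, so the scaling factor shifts the degree of dispersion without affecting the mean or the autocorrelation parameter.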

Example 4.2.1 (NB-INGARCH models)

As a particular instance of the DINARCH family, Xu et al. (2012) proposed a conditional negative binomial model (see Example A.1.4 ), where $c04-math-134$ given $c04-math-135$ follows the $c04-math-136$ distribution with $c04-math-137$ and with the conditional mean $c04-math-138$ satisfying (4.3). So the conditional dispersion parameter $c04-math-139$ is given by $c04-math-140$ and fixed over time, while the NB parameter $c04-math-141$ varies according to the observed past.

A different type of NB-INARCH model (even a full NB-INGARCH model) is proposed by Zhu (2011), who assumes the parameter $c04-math-142$ of the conditional NB distribution to be fixed while $c04-math-143$ varies with time: $c04-math-144$ given $c04-math-145$ follows the $c04-math-146$ distribution with $c04-math-147$ , that is,

Note that in the original parametrization of Zhu (2011), the parameters $c04-math-148$ do not directly refer to the conditional mean but to the odds $c04-math-149$ . But to keep it consistent with Definition 4.1.1, the above parametrization is preferred here.

For Zhu's NB-INGARCH model, the conditional dispersion index varies with time according to $c04-math-150$ , so this type of NB-INGARCH model differs from the DINARCH approach (4.7).
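The difference between the two NB constructions can be made explicit by mapping a given conditional mean to the NB parameters in each case. The sketch assumes that NB(n, pi) has mean n·(1 − pi)/pi and variance n·(1 − pi)/pi² (which may or may not match the parametrization of Example A.1.4), and all function names are hypothetical.

```python
import math


def nb_pmf(x, n, pi):
    """P(X = x) for the assumed NB(n, pi) parametrization, n > 0 real."""
    log_coef = math.lgamma(n + x) - math.lgamma(n) - math.lgamma(x + 1)
    return math.exp(log_coef + n * math.log(pi) + x * math.log(1.0 - pi))


def nb_params_fixed_pi(cond_mean, pi):
    """First construction: pi fixed, n_t = M_t*pi/(1-pi); dispersion index 1/pi."""
    return cond_mean * pi / (1.0 - pi), pi


def nb_params_fixed_n(cond_mean, n):
    """Second construction: n fixed, pi_t = n/(n + M_t); dispersion index 1 + M_t/n."""
    return n, n / (n + cond_mean)


def mean_and_dispersion(n, pi, upper=400):
    """Mean and dispersion index computed numerically from the truncated pmf."""
    probs = [nb_pmf(x, n, pi) for x in range(upper + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
    return mean, var / mean
```

For a conditional mean of 4, for instance, fixing pi = 0.5 gives a dispersion index of 2 whatever the mean, whereas fixing n = 2 gives 1 + 4/2 = 3, a value that changes whenever the conditional mean changes.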

Comparing Zhu's NB-INGARCH(1, 1) model with the standard INGARCH(1, 1) model from Example 4.1.4, we have the same unconditional mean and ACF, but the unconditional variance now equals (Zhu, 2011):

The variance corresponding to the INARCH(1) case (Example 4.1.6) follows by setting $c04-math-151$ .

A brief overview of the different INGARCH models is provided in Table 4.3.

**Table 4.3** Specific INGARCH models, where the conditional mean $c04-math-152$ satisfies $c04-math-153$


Both types of NB-INGARCH model (as well as the GP-INGARCH model to be discussed in the next example) are instances of the CP-INGARCH model (see Example A.1.2 about the compound Poisson distribution) introduced by Gonçalves et al. (2015). It is given by

4.9

where $c04-math-168$ denotes the pgf of the compounding distribution (assumed to be normalized to $c04-math-169$ for uniqueness), which is generally allowed to depend on time $c04-math-170$ through past observations. If $c04-math-171$ is constant in time, then the above condition $c04-math-172$ still guarantees the existence of a strictly stationary and ergodic solution for the CP-INGARCH model, having finite first- and second-order moments (Gonçalves et al., 2015). Further restricting to the case $c04-math-173$ , the resulting CP-INARCH model becomes an instance of the DINARCH model, where $c04-math-174$ .

Note that Zhu (2012a) also allows $c04-math-182$ to become negative (conditional underdispersion), but this case has to be considered with caution in view of the problems discussed below (Example A.1.6 ).

The INGARCH approach also allows generation of zero inflation.

Example 4.2.4 (Transactions counts)

Let us continue Example 4.1.5 about the transactions counts. Since zeros are observed quite seldom, the ZIP-INGARCH model is not plausible for the data, but both types of NB- and the GP-INGARCH(1, 1) model (Examples 4.2.1 and 4.2.2) are reasonable candidate models (also see Zhu (2012c)). CML estimation is done by analogy to Example 4.1.5, and the results are summarized in Table 4.4.

Table 4.4 Transactions counts: CML estimates for different models, together with $c04-math-186$

Model	Parameter				$c04-math-187$	$c04-math-188$
	1	2	3	4
Poi. INGARCH(1, 1)	0.292	0.832	0.139		12.058	$c04-math-189$
$c04-math-190$	(0.100)	(0.023)	(0.018)
NB^Xu-INGARCH(1, 1)	0.295	0.836	0.134	0.444	12.939	$c04-math-191$
$c04-math-192$	(0.145)	(0.034)	(0.027)	(0.030)
NB^Zhu-INGARCH(1, 1)	0.270	0.845	0.127	7.861	12.038	$c04-math-193$
$c04-math-194$	(0.142)	(0.034)	(0.026)	(0.959)
GP-INGARCH(1, 1)	0.293	0.838	0.132	0.338	13.099	$c04-math-195$
$c04-math-196$	(0.144)	(0.034)	(0.026)	(0.023)

Figures in parentheses are standard errors.

Note that the estimates for $c04-math-197$ are very similar for all models. Furthermore, any of the INGARCH(1, 1) models with additional dispersion leads to a considerable improvement compared to the Poisson INGARCH(1, 1) model from Example 4.1.5. For instance, the dispersion indices of these models are 2.931, 3.029 and 2.961, respectively, and the variances of their Pearson residuals are 1.035, 1.049 and 1.022, respectively. Furthermore, all PIT histograms are reasonably close to uniformity; see Figure 4.2b as an example. A decision on one of these models is difficult; the maximized log-likelihood suggests the GP-INGARCH(1, 1) model.

Finally, let us have a look at the case of counts having the finite range $c04-math-198$ with some fixed upper limit $c04-math-199$ (see also the discussion in Section 3.3). None of the above models can be used in such a situation, since the respective conditional distribution has an unbounded range.

Example 4.2.5 (Binomial INARCH(1) model)

A version of the INARCH(1) model suitable for finite-valued counts was proposed by Weiß & Pollett (2014). For their binomial INARCH(1) model, they assume

4.10

where $c04-math-201$ has to be satisfied. Analogous to the binomial AR(1) model from Section 3.3 , this gives a stationary, ergodic and $c04-math-202$ -mixing Markov chain, but now with simple binomial 1-step-ahead transition probabilities:

4.11

The conditional mean and variance are obtained from the conditional binomial distribution as

4.12

that is, in contrast to (3.22) for the binomial AR(1) model, the conditional variance is now a quadratic function in $c04-math-205$ . Unconditional mean and variance are given by (Weiß & Pollett, 2014):

4.13

Note that the binomial index of dispersion $c04-math-207$ (see the definition in (2.3)) can only take values in $c04-math-208$. So, analogous to the case of the Poisson INARCH(1) model from Example 4.1.6, but in contrast to the binomial AR(1) model, we observe extra-binomial variation, the degree of which is determined through the autocorrelation parameter $c04-math-209$. The autocorrelation function is given by $c04-math-210$. Note that $c04-math-211$ and hence $c04-math-212$ might also take negative values, which is in contrast to the case of the INARCH(1) models. Another difference to the Poisson INARCH(1) model from Example 4.1.6 is the fact that the conditional variance in (4.12) is not a linear but a quadratic function in $c04-math-213$.

As for the other INARCH(1) models, there are no closed-form expressions available for the stationary marginal distribution or the $c04-math-214$ -step-ahead conditional distributions with $c04-math-215$ . But due to the finite range, and in complete analogy to the case of the beta-binomial AR(1) model (see the discussion in Section 3.3), these can be exactly computed numerically by utilizing the Markov property; see Appendix B.2.1 for details.
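Because the state space is finite, this computation needs no truncation. The following sketch assumes the conditional distribution Bin(n, a + b·X_{t-1}/n) with a and a + b in (0, 1); the parameter names and this exact form of (4.10) are assumptions.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def binarch1_transition_matrix(n, a, b):
    """P[i][j] = P(X_t = j | X_{t-1} = i), assuming X_t | X_{t-1} ~ Bin(n, a + b*X_{t-1}/n)."""
    return [[binomial_pmf(j, n, a + b * i / n) for j in range(n + 1)]
            for i in range(n + 1)]


def stationary_distribution(matrix, iterations=200):
    """Stationary distribution by repeated vector-matrix multiplication."""
    size = len(matrix)
    dist = [1.0 / size] * size
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return dist
```

Under the assumed parametrization, the laws of total expectation and variance give the mean n·a/(1 − b) and the variance mean·(1 − mean/n)/(1 − (1 − 1/n)·b²), which can serve as a check on the numerically computed stationary distribution.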

Example 4.2.6 (Hantavirus infections)

The Robert-Koch-Institut (2016) collects data about cases of notifiable diseases in Germany. With SurvStat@RKI 2.0, Robert-Koch-Institut (2016) offers a web interface that allows retrieval of data from their disease database. Here, we shall follow an application presented by Weiß & Pollett (2014) and analyze some data about infections by the hantavirus, which is mainly carried by rodents.^b According to Heyman et al. (2009), hemorrhagic fever with renal syndrome, caused by the hantavirus and with a mortality rate of up to 12%, affects tens of thousands of individuals each year in Europe, and numbers of human cases are rising, perhaps because of mild winters. As an indicator of the regional spread of hantavirus infections, we consider the weekly number $c04-math-216$ of territorial units (out of $c04-math-217$ territorial units according to the “NUTS Level 2”) with at least one new case of a hantavirus infection. As in Weiß & Pollett (2014), we restrict ourselves to the 2011 data ( $c04-math-218$ weeks). Note, however, that we consider updated data (data status at 7 January 2016: two of the counts have been increased by 1 in the meantime); that is, the later results are slightly different from the ones reported by Weiß & Pollett (2014).

The plot of the time series $c04-math-222$ in Figure 4.8a and the pmf plot in Figure 4.8c show that the counts do not exhaust the full range $c04-math-223$: there are at most 11 territorial units in a week with new cases of hantavirus infections. The mean equals $c04-math-224$, and the dispersion test (3.30) uncovers a significant degree of extra-binomial variation: $c04-math-225$ with P value $c04-math-226$. The SACF in Figure 4.8b exhibits a medium autocorrelation level, with $c04-math-227$. Although the SPACF also shows a significant value at lag 2, we shall first see if an AR(1)-like model suffices to describe the data. So we fit the binomial INARCH(1) model from Example 4.2.5 to the data, and the (beta-)binomial AR(1) model discussed in Section 3.3 for comparison. Full ML estimates and the corresponding information criteria are summarized in Table 4.5.

**Table 4.5** Hanta counts: ML estimates, AIC and BIC for different models

Figures in parentheses are standard errors.

Figure 4.8 Hantavirus reports: (a) counts, (b) sample autocorrelation, (c) marginal frequencies. See Example 4.2.6.

The binomial and the beta-binomial AR(1) models not only perform worst in terms of AIC and BIC; an analysis of the respective Pearson residuals and PIT histograms also shows that these models are not adequate for the data. In contrast, the Pearson residuals of the fitted binomial INARCH(1) model (variance $c04-math-228$, SACF in Figure 4.9a) and its PIT histogram in Figure 4.9b show that this model does rather well. In particular, the residuals' SACF in Figure 4.9a does not suggest a need to use a higher-order model, although the SPACF of the original time series was significant at lag 2. The marginal distribution of the fitted binomial INARCH(1) model has mean 4.297 and binomial dispersion index 2.104, both being close to the empirical values. An important difference between the three fitted models becomes clear by looking at their conditional variances; see Figure 4.10. The binomial and the beta-binomial AR(1) models show increasing variance with increasing $c04-math-229$, whereas the binomial INARCH(1) model has its largest conditional variances in the center of the range $c04-math-230$.

Figure 4.9 Hanta counts, see Example 4.2.6: (a) SACF of Pearson residuals and (b) PIT histogram, both based on fitted binomial INARCH(1) model.

In this context, it is worth looking back to Figure 4.8a: it seems that the counts for $c04-math-231$, having reached a higher level, also show more variation. This phenomenon can be explained by the quadratic conditional variance (4.12); see Figure 4.10 as well as the detailed discussion in Weiß & Pollett (2014). A possible alternative for describing these data could be the SET binomial INARCH model as proposed by Möller (2016).

Figure 4.10 Hanta counts for Example 4.2.6: conditional variances of fitted models.
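The qualitative difference between the conditional variances can be reproduced with a few lines of code. The following is a hedged sketch under stated assumptions: the binomial INARCH(1) model is taken to have the conditional law Bin(n, a + b·x/n), and the binomial AR(1) model is taken in its usual thinning form with beta = pi·(1 − rho) and alpha = beta + rho. The values n = 38, (a, b) = (0.030, 0.734) and (pi, rho) = (0.112, 0.539) are one possible reading of the estimates in Table 4.5 and are used for illustration only; the function names are hypothetical.

```python
def cond_var_binomial_inarch1(x, n, a, b):
    """Var(X_t | X_{t-1} = x), ASSUMING X_t | X_{t-1}=x ~ Bin(n, a + b*x/n).

    With p = a + b*x/n this equals n*p*(1 - p), a quadratic function of x.
    """
    p = a + b * x / n
    return n * p * (1.0 - p)


def cond_var_binomial_ar1(x, n, pi, rho):
    """Var(X_t | X_{t-1} = x) for the binomial AR(1) model in thinning form.

    ASSUMED form: X_t = alpha o X_{t-1} + beta o (n - X_{t-1}) with
    beta = pi*(1 - rho), alpha = beta + rho, which is linear in x.
    """
    beta = pi * (1.0 - rho)
    alpha = beta + rho
    return alpha * (1.0 - alpha) * x + beta * (1.0 - beta) * (n - x)


# Illustrative (assumed) values; see the hedges in the lead-in paragraph.
n = 38
xs = list(range(n + 1))
inarch_var = [cond_var_binomial_inarch1(x, n, a=0.030, b=0.734) for x in xs]
ar1_var = [cond_var_binomial_ar1(x, n, pi=0.112, rho=0.539) for x in xs]

# Location of the largest conditional variance of the INARCH(1) model.
x_peak = max(xs, key=lambda x: inarch_var[x])

for x in (0, 10, 20, 30, n):
    print(x, round(ar1_var[x], 3), round(inarch_var[x], 3))
```

Under these assumptions the AR(1) variance grows linearly in x, while the INARCH(1) variance is a downward-opening parabola whose maximum lies where a + b·x/n is closest to 1/2, that is, in the interior of the range.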

4.3 Multivariate INGARCH Models

While many thinning-based models for multivariate counts have been proposed in the literature (see Section 3.4 for some of these models), little work has been done concerning multivariate extensions of the INGARCH model. A bivariate Poisson INGARCH(1,1) model is presented in Chapter 4 of Liu (2012); also see the works by Heinen & Rengifo (2003) and Andreassen (2013). Analogous to Definition 4.1.1, the bivariate counts $c04-math-232$, conditioned on $c04-math-233$, are assumed to be bivariately Poisson distributed (Example A.3.1) according to $c04-math-234$, where the conditional mean $c04-math-235$, with $c04-math-236$ for $c04-math-237$, satisfies

4.14

where $c04-math-239$, and where $c04-math-240$ are non-negative matrices. Liu (2012) shows that a unique stationary solution for $c04-math-241$ given by (4.14) exists if the largest absolute eigenvalue of $c04-math-242$ is smaller than 1, and if $c04-math-243$ for some $c04-math-244$. Here, $c04-math-245$ denotes the induced norm corresponding to the conventional vector $c04-math-246$-norm. To guarantee ergodicity, $c04-math-247$ for some $c04-math-248$ is also required. The stationary mean of $c04-math-249$ equals $c04-math-250$, and formulae for variance and autocovariance are provided by Heinen & Rengifo (2003). The latter work mainly concentrates on an extension of the Poisson distribution, the so-called double Poisson distribution, and it deals with the general multivariate case. In addition, to allow for more flexible cross-correlation, a copula-based approach is presented; see also Andreassen (2013). A type of multivariate INARCH(1) model (expandable by trend and seasonal components) was proposed by Held et al. (2005).
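A check of this kind of stationarity condition can be sketched for the bivariate case. The sketch below is only an illustration under assumptions that are not confirmed by the text above: the conditional mean is taken to follow a linear recursion of the form M_t = b0 + R·X_{t-1} + F·M_{t-1}, with a regression matrix R and a feedback matrix F, the eigenvalue condition is applied to R + F, and the norm condition is applied to F for p = 1 and p = ∞ only. All names and numerical values are hypothetical.

```python
from math import sqrt


def spectral_radius_2x2(m):
    """Largest absolute eigenvalue of a real 2x2 matrix via its characteristic polynomial."""
    (p, q), (r, s) = m
    trace, det = p + s, p * s - q * r
    disc = trace * trace - 4.0 * det
    if disc >= 0.0:  # two real eigenvalues
        root = sqrt(disc)
        return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))
    return sqrt(det)  # complex conjugate pair: |lambda|^2 equals the determinant


def induced_norms_2x2(m):
    """Induced 1-norm (max column sum) and infinity-norm (max row sum)."""
    one_norm = max(abs(m[0][0]) + abs(m[1][0]), abs(m[0][1]) + abs(m[1][1]))
    inf_norm = max(abs(m[0][0]) + abs(m[0][1]), abs(m[1][0]) + abs(m[1][1]))
    return one_norm, inf_norm


def stationary_mean_2x2(total, intercept):
    """Solve (I - total) * mu = intercept by Cramer's rule."""
    a11, a12 = 1.0 - total[0][0], -total[0][1]
    a21, a22 = -total[1][0], 1.0 - total[1][1]
    det = a11 * a22 - a12 * a21
    mu1 = (a22 * intercept[0] - a12 * intercept[1]) / det
    mu2 = (a11 * intercept[1] - a21 * intercept[0]) / det
    return [mu1, mu2]


# Hypothetical non-negative parameter matrices and intercept (illustration only).
regress = [[0.30, 0.10], [0.05, 0.25]]   # R: weights on the previous counts
feedback = [[0.20, 0.00], [0.10, 0.30]]  # F: weights on the previous conditional means
b0 = [1.0, 2.0]

total = [[regress[i][j] + feedback[i][j] for j in range(2)] for i in range(2)]
rho = spectral_radius_2x2(total)
one_norm, inf_norm = induced_norms_2x2(feedback)

# Sufficient check only: other values of p might also satisfy the norm condition.
stationary = rho < 1.0 and min(one_norm, inf_norm) < 1.0
mu = stationary_mean_2x2(total, b0) if stationary else None
print(stationary, rho, mu)
```

The stationary mean used here follows from taking expectations on both sides of the assumed linear recursion, which gives mu = b0 + (R + F)·mu.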

An INARCH model for bivariate counts with range $c04-math-251$ was proposed by Scotto et al. (2014). Analogous to (4.10), their bivariate binomial INARCH(1) model ($c04-math-252$-INARCH(1)) assumes the bivariate counts $c04-math-253$, conditioned on $c04-math-254$, to be $c04-math-255$-distributed (Example A.3.5) as

4.15

where $c04-math-257$ for $c04-math-258$ , and where

Scotto et al. (2014) showed that the $c04-math-259$-INARCH(1) process constitutes a stationary, ergodic and $c04-math-260$-mixing Markov chain with the transition probabilities being determined by (4.15), where the components $c04-math-261$ for $c04-math-262$ are just univariate binomial INARCH(1) processes with parameters $c04-math-263$. The cross-covariance function has the form (Scotto et al., 2014):

4.16

and may also take negative values, depending on the sign of $c04-math-265$.


Model	Conditional distribution	Conditional dispersion index
Poi. INGARCH	$c04-math-154$	1
NB^Xu-INGARCH	$c04-math-155$ with $c04-math-156$	$c04-math-157$
NB^Zhu-INGARCH	$c04-math-158$ with $c04-math-159$	$c04-math-160$
GP-INGARCH	$c04-math-161$ with $c04-math-162$	$c04-math-163$
ZIP-INGARCH	$c04-math-164$ with $c04-math-165$	$c04-math-166$

Model	Parameter 1	Parameter 2	Parameter 3	AIC	BIC
Binomial AR(1), $c04-math-219$	0.112 (0.013)	0.539 (0.070)		226	230
Beta-binomial AR(1), $c04-math-220$	0.114 (0.017)	0.570 (0.073)	0.027 (0.015)	221	227
Binomial INARCH(1), $c04-math-221$	0.030 (0.011)	0.734 (0.103)		215	219
