Chapter 5
Further Models for Count Time Series

The INARMA and INGARCH approaches described above have become very popular in recent years for the modeling of stationary and ARMA-like count processes. But a large number of other count time series models have also been proposed in the literature. Three of these alternatives are presented in this chapter: regression models in Section 5.1, hidden-Markov models in Section 5.2, and NDARMA models in Section 5.3.

5.1 Regression Models

Regression models are a traditional approach for modeling count data (not just time series data). Their main advantage is the ability to incorporate covariate information (although extensions of, for example, the INARMA models that include covariate information have also been developed; see Remark 3.1.7). Here, we review some of these regression models for count time series. Among others, we consider the observation-driven Markov models proposed by Zeger & Qaqish (1988) in Example 5.1.3, which had a groundbreaking effect on research into count time series, similar to the work of McKenzie (1985) and Al-Osh & Alzaid (1987) on the thinning-based INAR(1) model (Section 2.2). It will become clear that the INGARCH model discussed in Section 4.1 can be understood as an instance of the family of count regression models. A much more detailed discussion of regression models for count time series can be found in Chapter 4 of the book by Kedem & Fokianos (2002); further recent references on this topic are provided by Fokianos (2011) and Tjøstheim (2012).

Let c05-math-001 be a count process, and let c05-math-002 be a vector-valued covariate process (which might also be deterministic). To simplify the discussion, we shall mainly consider the case of a conditional Poisson distribution (Example A.1.1), although non-Poisson distributions like the negative binomial distribution (Example A.1.4) have also been considered in the literature. The conditional mean (as the parameter of the conditional Poisson distribution) is assumed to be “linked” to a linear expression of the available information. Therefore, the considered models are commonly referred to as generalized linear models (GLMs) (Kedem & Fokianos, 2002).

Many of these Poisson regression models for count processes are conditional regression models in the sense of Fahrmeir & Tutz (2001); that is, they are defined by specifying the conditional distribution of the counts, given the available observations and covariates.

Part (i) specifies the random component of the model, while (ii) determines the systematic component (Kedem & Fokianos, 2002). Note that c05-math-017 (or c05-math-018, respectively) and the parameter range for c05-math-019 have to be chosen such that c05-math-020 always leads to a positive value, since the (conditional) mean of a count random variable is necessarily positive. Choosing the identity link c05-math-022, c05-math-023 becomes a linear function of c05-math-024; the INARCH models discussed in Section 4.1 are instances of such conditional regression models with identity link. More generally, models of the form

are referred to as generalized autoregressive moving-average models (GARMA models) of order c05-math-026, where c05-math-027 and c05-math-028 are functions representing the autoregressive and moving-average terms (Benjamin et al., 2003; Kedem & Fokianos, 2002). This approach not only includes the INGARCH model according to Definition 4.1.1, but also many other important models, some of which are briefly presented below.
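To make the identity-link case concrete, the following sketch simulates a conditional Poisson model with a linear INGARCH(1,1)-type recursion for the conditional mean, as in Definition 4.1.1. The function and the parameter names (beta0, alpha1, beta1) are our own illustrative assumptions, not code from the literature.

```python
import numpy as np

def simulate_ingarch11(n, beta0, alpha1, beta1, rng=None):
    """Simulate X_t | past ~ Poisson(M_t) with the linear (identity-link)
    recursion M_t = beta0 + alpha1*X_{t-1} + beta1*M_{t-1}."""
    rng = np.random.default_rng(rng)
    m = beta0 / (1.0 - alpha1 - beta1)   # stationary mean, used as start value
    x_prev = rng.poisson(m)
    x = np.empty(n, dtype=np.int64)
    for t in range(n):
        m = beta0 + alpha1 * x_prev + beta1 * m
        x[t] = rng.poisson(m)
        x_prev = x[t]
    return x

# illustrative parameters; the stationary mean is beta0/(1-alpha1-beta1) = 2
path = simulate_ingarch11(5000, beta0=1.0, alpha1=0.3, beta1=0.2, rng=42)
```

The sample mean of a long path should be close to the stationary mean implied by the recursion.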

Since the canonical link function (also natural link) of the Poisson distribution is the log link c05-math-029, one often considers a log-linear Poisson model of the form

that is, where the conditional mean is determined multiplicatively as c05-math-031. Note that the logarithm is a bijective and strictly monotonically increasing mapping between (0, ∞) and the real numbers; that is, the right-hand side of (5.2) will always produce a positive value, independent of the parameter range for c05-math-034.
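A minimal simulation sketch of such a log-linear conditional model is given below. Feeding log(X_{t-1} + 1) into the linear predictor is one common device (the shift by 1 keeps the predictor well defined when a zero count occurs); the parameter names d and a are illustrative assumptions.

```python
import numpy as np

def simulate_loglinear_ar1(n, d, a, rng=None):
    """Conditional log-linear Poisson model: X_t | past ~ Poisson(lambda_t)
    with log(lambda_t) = d + a*log(X_{t-1} + 1)."""
    rng = np.random.default_rng(rng)
    x = np.empty(n, dtype=np.int64)
    x_prev = 0
    for t in range(n):
        lam = np.exp(d + a * np.log(x_prev + 1.0))  # log link: mean is positive
        x[t] = rng.poisson(lam)
        x_prev = x[t]
    return x

path = simulate_loglinear_ar1(2000, d=0.5, a=0.4, rng=1)
```

Whatever the sign of the coefficients, the conditional mean stays positive, which is the practical appeal of the log link noted above.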

The conditional approach according to Definition 5.1.1 assumes that the count at time c05-math-106 can be explained by the past observations and the covariate information up to time c05-math-107. For a marginal (Poisson) regression model in the sense of Fahrmeir & Tutz (2001), the past observations are without explanatory power provided that the current covariates are given. So the marginal distribution of the counts can be modeled directly. In its basic form, a marginal Poisson regression model requires c05-math-108, conditioned on c05-math-109, to be Poisson distributed according to c05-math-110, where the mean c05-math-111 satisfies

with the design vector c05-math-113 now being a function of only c05-math-114. A typical example is the seasonal log-linear model used by Höhle & Paul (2008) for epidemic counts, defined by

where c05-math-116 with period c05-math-117.
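A sketch of such a marginal seasonal model with a linear trend and a single harmonic is shown below; the precise design used by Höhle & Paul (2008) may differ, and all coefficient names here are illustrative assumptions.

```python
import numpy as np

def seasonal_poisson_means(n, beta0, beta1, gamma1, delta1, period):
    """Marginal log-linear means with a linear trend and one harmonic:
    log mu_t = beta0 + beta1*t + gamma1*sin(omega*t) + delta1*cos(omega*t),
    with omega = 2*pi/period."""
    t = np.arange(n)
    omega = 2.0 * np.pi / period
    log_mu = (beta0 + beta1 * t
              + gamma1 * np.sin(omega * t) + delta1 * np.cos(omega * t))
    return np.exp(log_mu)

# two years of weekly counts with a yearly season (period 52) and no trend
rng = np.random.default_rng(7)
mu = seasonal_poisson_means(104, beta0=1.0, beta1=0.0,
                            gamma1=0.5, delta1=0.3, period=52)
counts = rng.poisson(mu)    # given the design, the counts are serially independent
```

In contrast to the conditional models above, the counts here are drawn independently given the (deterministic) covariates; all serial structure enters through the mean function.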


Figure 5.6 Plot of the Cryptosporidiosis counts together with conditional means; see Example 5.1.7.

An immediate extension of the above marginal models towards parameter-driven models is obtained by assuming an additional latent process, say c05-math-163 (one may also assume that part of the covariate process is unobservable). Then the conditional means defined by c05-math-164 are modeled by the approach in (5.5).

Parameter-driven models for multivariate count processes have been proposed by Jørgensen et al. (1999) and Jung et al. (2011). The state space model by Jørgensen et al. (1999) uses a conditional Poisson distribution and assumes the latent process to be a type of gamma Markov process; the covariate information is embedded with a log-link approach. The dynamic factor model by Jung et al. (2011) generates the components of the c05-math-176-dimensional counts from conditionally independent Poisson distributions. The corresponding c05-math-177-dimensional vectors of Poisson means constitute a latent process and are determined by three latent factors (log-linear model), which themselves are assumed to be independent Gaussian AR(1) processes. More information on these and further models for multivariate count time series can be found in the survey by Karlis (2016).

5.2 Hidden-Markov Models

A very popular type of parameter-driven model for count processes is the hidden-Markov model (HMM); actually, such HMMs can be defined for any kind of range, even for categorical processes; see Section 7.3. According to Ephraim & Merhav (2002), the first paper about HMMs was the one by Baum & Petrie (1966), who referred to them as “probabilistic functions of Markov chains”; they in fact focused on the categorical case discussed in Section 7.3. This section gives an introduction to these models for the count data case, while a much more comprehensive treatment of HMMs is provided by the book by Zucchini & MacDonald (2009); the survey article by Ephraim & Merhav (2002) is also recommended for further reading.

HMMs assume a bivariate process c05-math-178, where the c05-math-179 are the observable random variables, whereas the c05-math-180 are the hidden states (latent states) with range c05-math-181 where c05-math-182. Note that the numbers c05-math-183 just constitute a numerical coding of the hidden states, which are assumed to be of categorical nature (possibly not even ordinal; see Chapter 6 for more details). Possible choices for the observations' range are discussed below. The (categorical) state process c05-math-184 is assumed to be a homogeneous Markov chain (Appendix B.2). Given the state process, the observation process c05-math-185 is serially independent with its pmf being solely determined through the current state c05-math-186 (in this sense, we are concerned with a “probabilistic function” of a Markov chain; see Baum & Petrie (1966)). A common graphical representation of this data-generating mechanism is shown in Figure 5.7.


Figure 5.7 Graphical representation of the data-generating mechanism of an HMM.

Let us return to HMMs. These models are defined by two sets of parameters: one determining the distribution of the state process c05-math-204, and another concerning the conditional distribution of the observation c05-math-205 given the current state c05-math-206 (state-dependent distribution). The state process c05-math-207 is assumed to satisfy the above state equation (5.9) and, in addition, to be a homogeneous Markov chain with the state transition probabilities being given by

(5.11)   a_ij = P(Q_t = j | Q_{t−1} = i)   for all states i, j.

Let c05-math-209 denote the corresponding transition matrix. The initial distribution c05-math-210 of c05-math-211 either leads to additional model parameters, or it is determined by a stationarity assumption; that is, c05-math-212, where c05-math-213 satisfies the invariance equation c05-math-214 (see (B.4)). We shall restrict ourselves to stationary HMMs here; that is, c05-math-215 for all c05-math-216 and all c05-math-217.

Concerning the observations, the observation equation (5.8) has to hold; that is,

P(X_t = x | Q_t = i, X_{t−1}, Q_{t−1}, X_{t−2}, …) = P(X_t = x | Q_t = i).

These state-dependent distributions are also assumed to be time-homogeneous, say c05-math-218 for the states c05-math-219. So c05-math-220 for all c05-math-221.

In applications, parametric distributions are assumed for the c05-math-222. For illustration, we shall mainly focus on the Poisson HMM, but any other count model could be used as well, or even different models for different states. As mentioned before, HMMs might be adapted to any kind of range for the observations (whereas the states are always categorical), for example, to continuous-valued cases like c05-math-223 or to purely categorical cases, as discussed in Section 7.3. The Poisson HMM assumes the distribution of c05-math-224, conditioned on c05-math-225, to be the Poisson distribution c05-math-226; that is,

and consequently c05-math-228. The complete set of model parameters is given by c05-math-229 and c05-math-230, where c05-math-231 has to hold for all c05-math-232.
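The data-generating mechanism of the Poisson HMM can be sketched as follows. The names A (transition matrix), lambdas (state-dependent Poisson means) and states are our own; the stationary initial distribution is obtained from the invariance equation, in line with the stationarity assumption above.

```python
import numpy as np

def simulate_poisson_hmm(n, A, lambdas, rng=None):
    """Simulate a stationary Poisson HMM: the hidden states follow a
    homogeneous Markov chain with transition matrix A (rows sum to 1);
    given state i, the observation is Poisson with mean lambdas[i]."""
    rng = np.random.default_rng(rng)
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    evals, evecs = np.linalg.eig(A.T)       # stationary start: delta = delta A
    delta = np.real(evecs[:, np.argmax(np.real(evals))])
    delta /= delta.sum()
    states = np.empty(n, dtype=int)
    states[0] = rng.choice(m, p=delta)
    for t in range(1, n):
        states[t] = rng.choice(m, p=A[states[t - 1]])
    x = rng.poisson(np.asarray(lambdas, dtype=float)[states])
    return states, x

# two well-separated states: a quiet one (mean 1) and a busy one (mean 8)
states, x = simulate_poisson_hmm(5000, [[0.9, 0.1], [0.2, 0.8]], [1.0, 8.0], rng=3)
```

For this transition matrix the stationary state distribution is (2/3, 1/3), so the marginal mean of the observations is 2/3·1 + 1/3·8 = 10/3.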

Let us look at some stochastic properties of the resulting observation process c05-math-233 (Zucchini & MacDonald, 2009, Section 2.3). Let c05-math-234, as a function of c05-math-235, denote the diagonal matrices c05-math-236. Then the marginal pmf and the bivariate probabilities are computed as

where c05-math-238 denotes the vector of ones. To express mean c05-math-239 and variance c05-math-240, let us introduce the notation c05-math-241 with c05-math-242, and c05-math-243. Then it follows that

The autocovariance c05-math-245 equals

For the limiting behavior of c05-math-247 for c05-math-248, see Remark B.2.2.1 on the Perron–Frobenius theorem.
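Under the stationarity assumption, these moment formulas reduce to a few matrix products and can be evaluated directly. The sketch below follows the expressions in Zucchini & MacDonald (2009, Section 2.3): the marginal pmf is a mixture of the state-dependent Poisson pmfs, and the autocovariance involves powers of the transition matrix; all variable names are our own.

```python
import numpy as np
from scipy.stats import poisson

def hmm_marginal(A, lambdas):
    """Stationary marginal mean, variance, lag-h autocovariance and pmf
    of a Poisson HMM (cf. Zucchini & MacDonald 2009, Section 2.3)."""
    A = np.asarray(A, float)
    lam = np.asarray(lambdas, float)
    evals, evecs = np.linalg.eig(A.T)         # solve the invariance equation
    delta = np.real(evecs[:, np.argmax(np.real(evals))])
    delta /= delta.sum()
    mu = delta @ lam                          # E[X] = sum_i delta_i lambda_i
    var = mu + delta @ lam**2 - mu**2         # law of total variance
    def gamma(h):                             # Cov(X_t, X_{t+h}) for h >= 1
        return delta @ np.diag(lam) @ np.linalg.matrix_power(A, h) @ lam - mu**2
    def pmf(v):                               # P(X_t = v): Poisson mixture
        return delta @ poisson.pmf(v, lam)
    return mu, var, gamma, pmf

mu, var, gamma, pmf = hmm_marginal([[0.9, 0.1], [0.2, 0.8]], [1.0, 8.0])
```

For the two-state example, the second eigenvalue of the transition matrix is 0.7, so the autocovariance decays geometrically at rate 0.7 per lag, consistent with the Perron–Frobenius remark above.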

Next, we turn to the question of parameter estimation. A widely used approach is the Baum–Welch algorithm, which is an instance of the expectation-maximization (EM) algorithm; see Chapter 4 in Zucchini & MacDonald (2009) for a detailed description. Alternatively, a direct (numerical) maximization of the likelihood function can be performed. Provided that accurate starting values have been selected, the latter approach usually converges much faster than the Baum–Welch algorithm; see Bulla & Berzel (2008). Also, MacDonald (2014) concludes that the direct maximization of the likelihood is often advantageous. Therefore, we shall concentrate on this latter approach here.
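A minimal sketch of this direct approach for a two-state Poisson HMM is given below: the scaled forward recursion yields the log-likelihood, and an unconstrained reparameterization (log rates, logit off-diagonal transition probabilities) lets a generic optimizer perform the maximization. Function names, starting values, and the choice of optimizer are illustrative assumptions, not the book's code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def hmm_loglik(x, A, lambdas):
    """Log-likelihood of a stationary Poisson HMM via the scaled
    forward recursion."""
    A = np.asarray(A, float)
    lam = np.asarray(lambdas, float)
    evals, evecs = np.linalg.eig(A.T)        # stationary initial distribution
    phi = np.real(evecs[:, np.argmax(np.real(evals))])
    phi /= phi.sum()
    ll = 0.0
    for xt in x:
        phi = phi * poisson.pmf(xt, lam)     # weight by state-dependent pmfs
        c = phi.sum()
        if c <= 0.0:                         # numerical underflow guard
            return -np.inf
        ll += np.log(c)
        phi = (phi / c) @ A                  # propagate the state distribution
    return ll

def fit2(x):
    """Direct numerical ML fit of a two-state Poisson HMM on
    unconstrained working parameters."""
    def negll(theta):
        lam = np.exp(theta[:2])
        p12 = 1.0 / (1.0 + np.exp(-theta[2]))
        p21 = 1.0 / (1.0 + np.exp(-theta[3]))
        A = np.array([[1.0 - p12, p12], [p21, 1.0 - p21]])
        return -hmm_loglik(x, A, lam)
    return minimize(negll, x0=np.array([0.0, 1.5, -1.0, -1.0]),
                    method="Nelder-Mead", options={"maxiter": 200})

# one-observation value that can be checked by hand: log(0.5e^-1 + 0.5e^-8)
ll1 = hmm_loglik(np.array([0]), [[0.5, 0.5], [0.5, 0.5]], [1.0, 8.0])

rng = np.random.default_rng(5)
s = (rng.random(100) < 0.5).astype(int)      # i.i.d. states, demo data only
res = fit2(rng.poisson(np.where(s == 0, 1.0, 9.0)))
```

The reparameterization is the usual trick for direct maximization: the optimizer works on an unconstrained space, while the model parameters automatically satisfy their range restrictions.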

The forward probabilities defined in Remark 5.2.3 are not only useful for likelihood computation, but also for forecasting future observations. The observations' c05-math-282-step-ahead forecasting distribution, given the observations c05-math-283, is computed as (Zucchini & MacDonald, 2009, Section 5.2):

Note that these probabilities are easily updated for increasing c05-math-285 according to the recursive scheme in (5.16). Such an updating is also required if residuals are to be computed for the fitted model. While the forecast pseudo-residuals (Zucchini & MacDonald, 2009, Section 6.2.3) can be computed exactly using (5.19) with c05-math-286, the standardized Pearson residuals (Section 2.5) need to be approximated by computing c05-math-287, with c05-math-288 being sufficiently large.
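The forecasting recipe can be sketched as follows: run the scaled forward recursion over the observed data, propagate the resulting state distribution h steps with the transition matrix, and mix the state-dependent pmfs accordingly (variable names are our own illustrative assumptions).

```python
import numpy as np
from scipy.stats import poisson

def hmm_forecast_pmf(x, A, lambdas, h, x_grid):
    """h-step-ahead forecast pmf of a stationary Poisson HMM, based on
    the scaled forward recursion and powers of the transition matrix."""
    A = np.asarray(A, float)
    lam = np.asarray(lambdas, float)
    evals, evecs = np.linalg.eig(A.T)           # stationary initial distribution
    phi = np.real(evecs[:, np.argmax(np.real(evals))])
    phi /= phi.sum()
    for xt in x:                                # forward recursion with scaling
        phi = phi * poisson.pmf(xt, lam)
        phi = (phi / phi.sum()) @ A             # state distribution one step ahead
    Ah = np.linalg.matrix_power(A, h - 1)       # one step was already applied
    return np.array([phi @ Ah @ poisson.pmf(v, lam) for v in x_grid])

# tiny demo with equal rows, so the result is a plain Poisson mixture
f1 = hmm_forecast_pmf(np.array([0]), [[0.5, 0.5], [0.5, 0.5]], [1.0, 2.0],
                      h=1, x_grid=np.arange(40))
```

Because the normalized forward vector is carried along, extending the forecast to the next time point only requires one more recursion step, which is the updating property noted above.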

In some applications, it might also be necessary to predict a future state of the HMM; in this case,

(5.20)   P(Q_{T+h} = i | X_T, …, X_1) = φ_T A^h e_i′ / (φ_T 1′)

should be used, where c05-math-290 is the c05-math-291th unit vector (Example A.3.3).

5.3 Discrete ARMA Models

The “new” discrete ARMA (NDARMA) models were proposed by Jacobs & Lewis (1983). They generate an ARMA-like dependence structure through some kind of random mixture. There are several ways of formulating these models, for example through a backshift mechanism, as in Jacobs & Lewis (1983), or by using Pegram's operator, as in Biswas & Song (2009). Here, we follow the approach of Weiß & Göb (2008) to give a representation close to the conventional ARMA recursion.

Note that exactly one out of c05-math-341 becomes 1; all others are equal to 0. Hence the NDARMA recursion (5.21) implies that each observation c05-math-342 chooses either one of the past observations c05-math-343 or one of the past (unobservable) innovations c05-math-344. Because of this mechanism, the stationary marginal distribution of c05-math-345 is identical to that of c05-math-346; that is, c05-math-347, and we always have

equation

The autocorrelations are non-negative and can be determined from the Yule–Walker equations (Jacobs & Lewis, 1983)

22 equation

where the c05-math-349 satisfy

equation

which implies c05-math-350 for c05-math-351, and c05-math-352. While these properties might suggest that the NDARMA models should be very attractive in practice for ARMA-like count processes, they have an important limitation: the sample paths generated by NDARMA processes tend to show long runs (constant segments) of a certain count value. This is illustrated by Figure 5.11, where the plotted sample path differs markedly from the corresponding INAR(1) path in Figure 2.5b and the INARCH(1) path in Figure 4.3. Since these long runs and large jumps between them are a rather uncommon pattern in real count time series, the NDARMA models are rarely used in the count data context, although we shall see in Section 7.2 that they are quite useful when considering categorical time series. An important exception is the modeling of video traffic data (Tanwir & Perros, 2014), as briefly sketched in the following example.


Figure 5.11 Simulated sample path of Poisson DAR(1) process with c05-math-353 and c05-math-354.
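The DAR(1) special case underlying such a sample path can be sketched in a few lines: with some probability phi the process copies its previous value, and otherwise it draws a fresh Poisson innovation, which produces exactly the long constant runs described above. The parameter values below are illustrative, not necessarily those of the figure.

```python
import numpy as np

def simulate_poisson_dar1(n, phi, mu, rng=None):
    """Poisson DAR(1): with probability phi, X_t copies X_{t-1};
    otherwise X_t is a fresh Poisson(mu) innovation.  The marginal
    distribution stays Poisson(mu) and rho(k) = phi**k."""
    rng = np.random.default_rng(rng)
    x = np.empty(n, dtype=np.int64)
    x_prev = rng.poisson(mu)
    for t in range(n):
        x[t] = x_prev if rng.random() < phi else rng.poisson(mu)
        x_prev = x[t]
    return x

# strong dependence produces long constant runs in the sample path
path = simulate_poisson_dar1(10000, phi=0.75, mu=4.0, rng=0)
```

The random-mixture mechanism leaves the Poisson marginal untouched while the autocorrelation function matches that of an AR(1) model, which is precisely the combination of properties discussed above.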
