The INARMA and INGARCH approaches described above have become very popular in recent years for the modeling of stationary and ARMA-like count processes. But a large number of other count time series models have also been proposed in the literature. Three of these alternatives are presented in this chapter: regression models in Section 5.1, hidden-Markov models in Section 5.2, and NDARMA models in Section 5.3.
5.1 Regression Models
A traditional approach for modeling count data (not just time series data) is the use of regression models. The main advantage of regression models is their ability to incorporate covariate information (although extensions of, for example, the INARMA models that include covariate information have also been developed; see Remark 3.1.7). Here, we will review some of these regression models for count time series. Among others, we consider the observation-driven Markov models proposed by Zeger & Qaqish (1988) in Example 5.1.3, which had a groundbreaking effect on research on count time series, similar to the work of McKenzie (1985) and Al-Osh & Alzaid (1987) on the thinning-based INAR(1) model (Section 2.2). It will become clear that the INGARCH model discussed in Section 4.1 can be understood as an instance of the family of count regression models. A much more detailed discussion of regression models for count time series can be found in Chapter 4 of the book by Kedem & Fokianos (2002); further recent references on this topic are provided by Fokianos (2011) and Tjøstheim (2012).
Let $(X_t)_{\mathbb{N}_0}$ be a count process, and let $(\boldsymbol{z}_t)_{\mathbb{N}_0}$ be a vector-valued covariate process (which might also be deterministic). To simplify the discussion, we shall mainly consider the case of a conditional Poisson distribution (Example A.1.1), although non-Poisson distributions like the negative binomial distribution (Example A.1.4) have also been considered in the literature. The conditional mean $\mu_t$ (as the parameter of the conditional Poisson distribution) is assumed to be “linked” to a linear expression of the available information. Therefore, the considered models are commonly referred to as generalized linear models (GLMs) (Kedem & Fokianos, 2002).
Many of these Poisson regression models for count processes are conditional regression models in the sense of Fahrmeir & Tutz (2001); that is, they are defined by specifying the conditional distribution of the counts, given the available observations and covariates.
Part (i) specifies the random component of the model, while (ii) determines the systematic component (Kedem & Fokianos, 2002). Note that the link function $g$ (or its inverse $g^{-1}$, respectively) and the parameter range for the regression coefficients have to be chosen such that the resulting conditional mean $\mu_t$ always leads to a positive value, since the (conditional) mean of a count random variable is necessarily positive. Choosing the identity link $g(\mu) = \mu$, the conditional mean becomes a linear function of the explanatory variables, with the INARCH models as discussed in Section 4.1 being instances of such conditional regression models with identity link. More generally, models of the form
are referred to as generalized autoregressive moving-average models (GARMA models) of order $(p, q)$, where appropriate functions of past observations and past means represent the autoregressive and moving-average terms (Benjamin et al., 2003; Kedem & Fokianos, 2002). This approach not only includes the INGARCH model according to Definition 4.1.1, but many other important models, some of which are briefly presented below.
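To make the identity-link case concrete, here is a minimal simulation sketch of an INARCH(1) recursion, with conditional mean $\beta_0 + \alpha_1 X_{t-1}$; the function name and parameter values are ours, chosen purely for illustration:

```python
import numpy as np

def simulate_inarch1(beta0, alpha1, n, seed=1):
    # INARCH(1): X_t | X_{t-1} ~ Poisson(beta0 + alpha1 * X_{t-1}), identity link
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(beta0 / (1 - alpha1))   # start near the stationary mean
    for t in range(1, n):
        x[t] = rng.poisson(beta0 + alpha1 * x[t - 1])
    return x

path = simulate_inarch1(beta0=2.0, alpha1=0.5, n=5000)
print(round(path.mean(), 2))   # stationary mean is beta0 / (1 - alpha1) = 4
```

Because the identity link is used, the positivity of the conditional mean has to be enforced through the parameter range, here $\beta_0 > 0$ and $\alpha_1 \geq 0$.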
Since the canonical link function (also natural link) of the Poisson distribution is the log link $g(\mu) = \ln \mu$, one often considers a log-linear Poisson model of the form
that is, where the conditional mean is determined multiplicatively as $\mu_t = \exp(\eta_t)$ with linear predictor $\eta_t$. Note that the logarithm is a (bijective and strictly increasing) mapping between $(0, \infty)$ and $\mathbb{R}$; that is, the right-hand side of (5.2) will always produce a positive value, independent of the parameter range for the regression coefficients.
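The positivity guarantee of the log link can be checked directly. The following sketch (with arbitrary, hypothetical coefficients and covariates) draws conditional Poisson counts from a log-linear mean:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([-1.5, 0.8, 2.0])   # hypothetical real-valued coefficients
Z = rng.normal(size=(100, 3))       # hypothetical covariate rows z_t
mu = np.exp(Z @ beta)               # log link: mu_t = exp(z_t' beta)
x = rng.poisson(mu)                 # conditional Poisson counts
print(mu.min() > 0)                 # True: mu_t is positive for ANY real beta
```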
The conditional approach according to Definition 5.1.1 assumes that the count at time $t$ can be explained by the past observations and the covariate information up to time $t$. For a marginal (Poisson) regression model in the sense of Fahrmeir & Tutz (2001), the past observations are without explanatory power provided that the current covariates are given. So the marginal distribution of the counts can be modeled directly. In its basic form, a marginal Poisson regression model requires $X_t$, conditioned on $\boldsymbol{z}_t$, to be Poisson distributed according to $\mathrm{Poi}(\mu_t)$, where the mean satisfies
with the design vector now being a function of $\boldsymbol{z}_t$ only. A typical example is the seasonal log-linear model used by Höhle & Paul (2008) for epidemic counts, defined by
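A simulation sketch in the spirit of such a seasonal log-linear model, with a linear trend and one harmonic component; all coefficient values are hypothetical and not taken from Höhle & Paul (2008):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(104)                     # e.g. two years of weekly counts
omega = 2 * np.pi * t / 52             # one harmonic with period 52 weeks
# hypothetical coefficients: intercept, trend, seasonal sine/cosine terms
log_mu = 1.0 + 0.005 * t + 0.6 * np.sin(omega) + 0.3 * np.cos(omega)
x = rng.poisson(np.exp(log_mu))        # marginal model: no feedback from past counts
print(x.shape)
```

Note that, in contrast to the conditional models above, the mean at time $t$ depends only on the covariates (here trend and season), not on past observations.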
An immediate extension of the above marginal models towards parameter-driven models is obtained by assuming an additional latent process – one may also assume that a part of the covariate process is unobservable – say, $(u_t)_{\mathbb{N}_0}$. Then the conditional means, now defined given both the covariates and the latent variables $u_t$, are modeled by the approach in (5.5).
Parameter-driven models for multivariate count processes have been proposed by Jørgensen et al. (1999) and Jung et al. (2011). The state space model by Jørgensen et al. (1999) uses a conditional Poisson distribution and assumes the latent process to be a type of gamma Markov process; the covariate information is embedded with a log-link approach. The dynamic factor model by Jung et al. (2011) generates the components of the multivariate counts from conditionally independent Poisson distributions. The corresponding vectors of Poisson means constitute a latent process and are determined by three latent factors (log-linear model), which themselves are assumed to be independent Gaussian AR(1) processes. More information on these and further models for multivariate count time series can be found in the survey by Karlis (2016).
5.2 Hidden-Markov Models
A very popular type of parameter-driven model for count processes is the hidden-Markov model (HMM); actually, such HMMs can be defined for any kind of range, even for categorical processes; see Section 7.3. According to Ephraim & Merhav (2002), the first paper about HMMs was the one by Baum & Petrie (1966), who referred to them as “probabilistic functions of Markov chains”; they in fact focused on the categorical case discussed in Section 7.3. This section gives an introduction to these models (for the count data case), while a much more comprehensive treatment of HMMs is provided by the book by Zucchini & MacDonald (2009); also the survey article by Ephraim & Merhav (2002) is recommended for further reading.
HMMs assume a bivariate process $(X_t, Q_t)_{\mathbb{N}_0}$, where the $X_t$ are the observable random variables, whereas the $Q_t$ are the hidden states (latent states) with range $\{1, \ldots, m\}$, where $m \in \mathbb{N}$. Note that the numbers $1, \ldots, m$ just constitute a numerical coding of the hidden states, which are assumed to be of categorical nature (possibly not even ordinal; see Chapter 6 for more details). Possible choices for the observations' range are discussed below. The (categorical) state process $(Q_t)_{\mathbb{N}_0}$ is assumed to be a homogeneous Markov chain (Appendix B.2). Given the state process, the observation process $(X_t)_{\mathbb{N}_0}$ is serially independent with its pmf being solely determined through the current state (in this sense, we are concerned with a “probabilistic function” of a Markov chain; see Baum & Petrie (1966)). A common graphical representation of this data-generating mechanism is shown in Figure 5.7.
Let us return to HMMs. These models are defined by two sets of parameters: one determining the distribution of the state process $(Q_t)_{\mathbb{N}_0}$, and another concerning the conditional distribution of the observation $X_t$ given the current state $Q_t$ (state-dependent distribution). The state process is assumed to satisfy the above state equation (5.9) and, in addition, to be a homogeneous Markov chain with the state transition probabilities being given by

$p_{i|j} = P(Q_t = i \mid Q_{t-1} = j)$ for $i, j \in \{1, \ldots, m\}$. (5.11)
Let $\mathbf{A} = (p_{i|j})_{i,j}$ denote the corresponding transition matrix. The initial distribution of $Q_0$ either leads to additional model parameters, or it is determined by a stationarity assumption; that is, $Q_t \sim \boldsymbol{\pi}$ for all $t$, where $\boldsymbol{\pi}$ satisfies the invariance equation $\boldsymbol{\pi} = \mathbf{A}\,\boldsymbol{\pi}$ (see (B.4)). We shall restrict ourselves to stationary HMMs here; that is, $P(Q_t = i) = \pi_i$ for all $t$ and all $i$.
Concerning the observations, the observation equation (5.8) has to hold; that is, given the current state $Q_t$, the observation $X_t$ is conditionally independent of all past states and observations. These state-dependent distributions are also assumed to be time-homogeneous, say $P(X_t = x \mid Q_t = i) = p_i(x)$ for the states $i \in \{1, \ldots, m\}$ and for all $t$.
In applications, parametric models are assumed for the state-dependent distributions. For illustration, we shall mainly focus on the Poisson HMM, but any other count model could be used as well, or even different models for different states. As mentioned before, HMMs might be adapted to any kind of range for the observations (whereas the states are always categorical), for example, to continuous-valued cases like $\mathbb{R}$ or to purely categorical cases, as discussed in Section 7.3. The Poisson HMM assumes the distribution of $X_t$, conditioned on $Q_t = i$, to be the Poisson distribution $\mathrm{Poi}(\lambda_i)$; that is,
and consequently $E[X_t \mid Q_t = i] = \lambda_i$. The complete set of model parameters is given by $\mathbf{A}$ and $\lambda_1, \ldots, \lambda_m$, where $\lambda_i > 0$ has to hold for all $i$.
Let us look at some stochastic properties of the resulting observation process (Zucchini & MacDonald, 2009, Section 2.3). Let $\mathbf{P}(x)$, as a function of $x$ from the observations' range, denote the diagonal matrices $\mathrm{diag}\big(p_1(x), \ldots, p_m(x)\big)$. Then the marginal pmf and the bivariate probabilities are computed as

$P(X_t = x) = \mathbf{1}^\top\, \mathbf{P}(x)\, \boldsymbol{\pi}$ and $P(X_t = x, X_{t+k} = y) = \mathbf{1}^\top\, \mathbf{P}(y)\, \mathbf{A}^k\, \mathbf{P}(x)\, \boldsymbol{\pi}$.
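These matrix expressions are easy to evaluate numerically. The following sketch computes the stationary distribution and the marginal pmf for a hypothetical two-state Poisson HMM; the column-stochastic convention for the transition matrix and all numeric values are assumptions of this illustration:

```python
import numpy as np
from math import exp, factorial

# hypothetical 2-state Poisson HMM; A is column-stochastic,
# i.e. A[i, j] = P(Q_t = i | Q_{t-1} = j)
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
lam = [1.0, 5.0]                      # state-dependent Poisson means

# stationary distribution: eigenvector of A for eigenvalue 1, normalized to sum 1
eigval, eigvec = np.linalg.eig(A)
pi = np.real(eigvec[:, np.argmin(np.abs(eigval - 1))])
pi = pi / pi.sum()

def P(x):
    # diagonal matrix diag(p_1(x), ..., p_m(x)) of state-dependent pmfs
    return np.diag([exp(-l) * l**x / factorial(x) for l in lam])

ones = np.ones(2)
marginal = np.array([ones @ P(x) @ pi for x in range(50)])
print(round(marginal.sum(), 6))       # the mixture pmf sums to (nearly) 1
```

Note that the marginal distribution is simply the finite mixture of the state-dependent Poisson distributions with the stationary probabilities as mixture weights.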
For the limiting behavior of $\mathbf{A}^k$ for $k \to \infty$, see Remark B.2.2.1 on the Perron–Frobenius theorem.
Next, we turn to the question of parameter estimation. A widely used approach is the Baum–Welch algorithm, which is an instance of the expectation-maximization (EM) algorithm; see Chapter 4 in Zucchini & MacDonald (2009) for a detailed description. Alternatively, a direct (numerical) maximization of the likelihood function can be performed. Provided that accurate starting values have been selected, the latter approach usually converges much faster than the Baum–Welch algorithm; see Bulla & Berzel (2008). Also, MacDonald (2014) concludes that the direct maximization of the likelihood is often advantageous. Therefore, we shall concentrate on this latter approach here.
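For such a direct numerical maximization, the log-likelihood can be evaluated with a scaled forward recursion. A sketch for the Poisson HMM follows; the column-stochastic convention for the transition matrix and all numeric values are assumptions of this illustration:

```python
import numpy as np
from math import exp, factorial, log

def hmm_loglik(x, A, lam, pi):
    # log-likelihood of a Poisson HMM via the scaled forward recursion;
    # convention: A[i, j] = P(Q_t = i | Q_{t-1} = j), pi = initial distribution
    ll = 0.0
    alpha = np.asarray(pi, dtype=float)
    for t, xt in enumerate(x):
        if t > 0:
            alpha = A @ alpha                 # propagate the chain one step
        p = np.array([exp(-l) * l**xt / factorial(xt) for l in lam])
        alpha = p * alpha                     # weight by state-dependent pmfs
        c = alpha.sum()                       # = P(x_t | x_1, ..., x_{t-1})
        ll += log(c)
        alpha /= c                            # rescale to avoid underflow
    return ll

A = np.array([[0.9, 0.2], [0.1, 0.8]])
lam = [1.0, 5.0]
pi = np.array([2/3, 1/3])                     # stationary for this A
print(hmm_loglik([0, 1, 3, 8, 6, 1], A, lam, pi))
```

This function could then be handed to a general-purpose numerical optimizer (with the parameters suitably transformed to respect their ranges) to obtain the maximum likelihood estimates.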
The forward probabilities defined in Remark 5.2.3 are not only useful in view of likelihood computation, but also for forecasting future observations. The observations' $h$-step-ahead forecasting distribution, given the observations $x_1, \ldots, x_T$, is computed as (Zucchini & MacDonald, 2009, Section 5.2):
Note that these probabilities are easily updated for increasing $T$ according to the recursive scheme in (5.16). Such an updating is also required if residuals are to be computed for the fitted model. While the forecast pseudo-residuals (Zucchini & MacDonald, 2009, Section 6.2.3) can be computed exactly using (5.19) with $h = 1$, the standardized Pearson residuals (Section 2.5) require conditional moments, which need to be approximated by truncating the involved infinite sums at a sufficiently large upper limit.
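A sketch of how the forecasting distribution can be evaluated from the (scaled) forward probabilities, again for a hypothetical two-state Poisson HMM with a column-stochastic transition matrix (all values are assumptions of this illustration):

```python
import numpy as np
from math import exp, factorial

# hypothetical 2-state Poisson HMM; A[i, j] = P(Q_t = i | Q_{t-1} = j)
A = np.array([[0.9, 0.2], [0.1, 0.8]])
lam = [1.0, 5.0]
pi = np.array([2/3, 1/3])              # stationary distribution of A

def pmf_vec(x):
    return np.array([exp(-l) * l**x / factorial(x) for l in lam])

def forward(obs):
    # scaled forward probabilities after observing x_1, ..., x_T
    alpha = pi.copy()
    for t, xt in enumerate(obs):
        if t > 0:
            alpha = A @ alpha
        alpha = pmf_vec(xt) * alpha
        alpha /= alpha.sum()           # the normalization cancels in the forecast formula
    return alpha

def forecast_pmf(obs, h, x):
    # h-step-ahead P(X_{T+h} = x | x_1, ..., x_T)
    state = np.linalg.matrix_power(A, h) @ forward(obs)
    return pmf_vec(x) @ state

dist = [forecast_pmf([0, 1, 6, 7], h=2, x=k) for k in range(40)]
print(round(sum(dist), 6))             # a proper pmf: sums to (nearly) 1
```

For large $h$, the forecast distribution converges towards the marginal distribution of the HMM, in accordance with the Perron–Frobenius limit mentioned above.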
In some applications, it might also be necessary to predict a future state of the HMM; in this case,

$P(Q_{T+h} = i \mid X_1 = x_1, \ldots, X_T = x_T) \;=\; \mathbf{e}_i^\top\, \mathbf{A}^h\, \boldsymbol{\alpha}_T \,/\, (\mathbf{1}^\top \boldsymbol{\alpha}_T)$ (5.20)

should be used, where $\mathbf{e}_i$ is the $i$th unit vector (Example A.3.3).
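Numerically, the state forecast is just a matrix power applied to the normalized forward probabilities; a small sketch with hypothetical values (the column-stochastic convention is an assumption of this illustration):

```python
import numpy as np

# sketch of a state forecast; alpha_T would come from the forward recursion,
# here we plug in hypothetical values that are already normalized
A = np.array([[0.9, 0.2], [0.1, 0.8]])   # A[i, j] = P(Q_t = i | Q_{t-1} = j)
alpha_T = np.array([0.25, 0.75])         # alpha_T / (1' alpha_T)

h = 3
state_forecast = np.linalg.matrix_power(A, h) @ alpha_T
# component i equals e_i' A^h alpha_T, i.e. P(Q_{T+h} = i | x_1, ..., x_T)
print(state_forecast.sum())              # the state probabilities sum to 1
```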
5.3 Discrete ARMA Models
The “new” discrete ARMA (NDARMA) models were proposed by Jacobs & Lewis (1983). They generate an ARMA-like dependence structure through some kind of random mixture. There are several ways of formulating these models, for example through a backshift mechanism, as in Jacobs & Lewis (1983), or by using Pegram's operator, as in Biswas & Song (2009). Here, we follow the approach of Weiß & Göb (2008) to give a representation close to the conventional ARMA recursion.
Note that exactly one out of the mixing variables $\alpha_{t,1}, \ldots, \alpha_{t,p}, \beta_{t,0}, \ldots, \beta_{t,q}$ becomes 1; all others are equal to 0. Hence the NDARMA recursion (5.21) implies that each observation $X_t$ chooses either one of the past observations $X_{t-1}, \ldots, X_{t-p}$ or one of the (unobservable) innovations $\epsilon_t, \ldots, \epsilon_{t-q}$. Because of this mechanism, the stationary marginal distribution of $X_t$ is identical to that of $\epsilon_t$, and we always have
The autocorrelations $\rho(k)$ are non-negative and can be determined from the Yule–Walker equations (5.22) of Jacobs & Lewis (1983), together with a corresponding set of linear equations for the cross-correlations between the observations and the innovations. In particular, the autocorrelation function of an NDARMA process has the same structure as that of a conventional ARMA model. While these properties might suggest that the NDARMA models should be very attractive in practice for ARMA-like count processes, they have an important limitation: the sample paths generated by NDARMA processes tend to show long runs (constant segments) of a certain count value. This is illustrated by Figure 5.11, where the plotted sample path differs markedly from the corresponding INAR(1) path in Figure 2.5b and the INARCH(1) path in Figure 4.3. Since these long runs and the large jumps between them are a rather uncommon pattern in real count time series, the NDARMA models are rarely used in the count data context, although we shall see in Section 7.2 that they are quite useful when considering categorical time series. An important exception is the modeling of video traffic data (Tanwir & Perros, 2014), as briefly sketched in the following example.
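The run-length behavior is easy to reproduce by simulation. The following sketch generates a DAR(1) path, i.e. the NDARMA(1,0) special case, where each observation repeats its predecessor with probability $\phi$ and otherwise draws a fresh Poisson innovation; the function name and parameter values are ours:

```python
import numpy as np

def simulate_dar1(phi, n, innov, seed=0):
    # DAR(1) = NDARMA(1,0): with probability phi copy X_{t-1}, otherwise
    # draw a fresh innovation; the marginal law equals the innovation law
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = innov(rng)
    for t in range(1, n):
        x[t] = x[t - 1] if rng.random() < phi else innov(rng)
    return x

path = simulate_dar1(phi=0.8, n=20000, innov=lambda rng: rng.poisson(3.0))
print(round(path.mean(), 2))   # close to the Poisson innovation mean 3
rho1 = np.corrcoef(path[:-1], path[1:])[0, 1]
print(round(rho1, 2))          # lag-1 autocorrelation close to phi = 0.8
```

Plotting such a path makes the long constant segments immediately visible, in line with the discussion above.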