The INARMA and INGARCH approaches described above have become very popular in recent years for the modeling of stationary and ARMA-like count processes. But a large number of other count time series models have also been proposed in the literature. Three of these alternatives are presented in this chapter: regression models in Section 5.1, hidden-Markov models in Section 5.2, and NDARMA models in Section 5.3.
5.1 Regression Models
A traditional approach for modeling count data (not just time series data) is the use of regression models. The main advantage of regression models is their ability to incorporate covariate information (although extensions of, for example, the INARMA models that include covariate information have also been developed; see Remark 3.1.7). Here, we will review some of these regression models for count time series. Among others, we consider the observation-driven Markov models proposed by Zeger & Qaqish (1988) in Example 5.1.3, which had a groundbreaking effect on research on count time series, similar to the work of McKenzie (1985) and Al-Osh & Alzaid (1987) on the thinning-based INAR(1) model (Section 2.2). It will become clear that the INGARCH model discussed in Section 4.1 can be understood as an instance of the family of count regression models. A much more detailed discussion of regression models for count time series can be found in Chapter 4 of the book by Kedem & Fokianos (2002); further recent references on this topic are provided by Fokianos (2011) and Tjøstheim (2012).
Let $(X_t)_{\mathbb{N}_0}$ be a count process, and let $(\boldsymbol{z}_t)_{\mathbb{N}_0}$ be a vector-valued covariate process (which might also be deterministic). To simplify the discussion, we shall mainly consider the case of a conditional Poisson distribution (Example A.1.1), although non-Poisson distributions like the negative binomial distribution (Example A.1.4) have also been considered in the literature. The conditional mean $\mu_t$ (as the parameter of the conditional Poisson distribution) is assumed to be “linked” to a linear expression of the available information. Therefore, the considered models are commonly referred to as generalized linear models (GLMs) (Kedem & Fokianos, 2002).
Many of these Poisson regression models for count processes are conditional regression models in the sense of Fahrmeir & Tutz (2001); that is, they are defined by specifying the conditional distribution of the counts, given the available observations and covariates.
Part (i) specifies the random component of the model, while (ii) determines the systematic component (Kedem & Fokianos, 2002). Note that the link function $g$ (or its inverse $g^{-1}$, respectively) and the parameter range for the regression coefficients have to be chosen such that the resulting conditional mean $\mu_t$ always leads to a positive value, since the (conditional) mean of a count random variable is necessarily positive. Choosing the identity link $g(\mu) = \mu$, the conditional mean becomes a linear function of the explanatory variables, with the INARCH models as discussed in Section 4.1 being instances of such conditional regression models with identity link. More generally, models of the form
are referred to as generalized autoregressive moving-average models (GARMA models) of order $(p, q)$, where appropriate functions of past observations and past means represent the autoregressive and moving-average terms (Benjamin et al., 2003; Kedem & Fokianos, 2002). This approach not only includes the INGARCH model according to Definition 4.1.1, but many other important models, some of which are briefly presented below.
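To make the identity-link case concrete, here is a minimal simulation sketch of an INARCH(1) recursion, with conditional mean $\beta_0 + \alpha_1 X_{t-1}$; the function name and parameter values are ours, chosen purely for illustration:

```python
import numpy as np

def simulate_inarch1(beta0, alpha1, n, seed=1):
    # INARCH(1): X_t | X_{t-1} ~ Poisson(beta0 + alpha1 * X_{t-1}), identity link
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(beta0 / (1 - alpha1))   # start near the stationary mean
    for t in range(1, n):
        x[t] = rng.poisson(beta0 + alpha1 * x[t - 1])
    return x

path = simulate_inarch1(beta0=2.0, alpha1=0.5, n=5000)
print(round(path.mean(), 2))   # stationary mean is beta0 / (1 - alpha1) = 4
```

Because the identity link is used, the positivity of the conditional mean has to be enforced through the parameter range, here $\beta_0 > 0$ and $\alpha_1 \geq 0$.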
Since the canonical link function (also natural link) of the Poisson distribution is the log link $g(\mu) = \ln \mu$, one often considers a log-linear Poisson model of the form
that is, where the conditional mean is determined multiplicatively as $\mu_t = \exp(\eta_t)$ with linear predictor $\eta_t$. Note that the logarithm is a (bijective and strictly increasing) mapping between $(0, \infty)$ and $\mathbb{R}$; that is, the right-hand side of (5.2) will always produce a positive value, independent of the parameter range for the regression coefficients.
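The positivity guarantee of the log link can be checked directly. The following sketch (with arbitrary, hypothetical coefficients and covariates) draws conditional Poisson counts from a log-linear mean:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([-1.5, 0.8, 2.0])   # hypothetical real-valued coefficients
Z = rng.normal(size=(100, 3))       # hypothetical covariate rows z_t
mu = np.exp(Z @ beta)               # log link: mu_t = exp(z_t' beta)
x = rng.poisson(mu)                 # conditional Poisson counts
print(mu.min() > 0)                 # True: mu_t is positive for ANY real beta
```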
The conditional approach according to Definition 5.1.1 assumes that the count at time $t$ can be explained by the past observations and the covariate information up to time $t$. For a marginal (Poisson) regression model in the sense of Fahrmeir & Tutz (2001), the past observations are without explanatory power provided that the current covariates are given. So the marginal distribution of the counts can be modeled directly. In its basic form, a marginal Poisson regression model requires $X_t$, conditioned on $\boldsymbol{z}_t$, to be Poisson distributed according to $\mathrm{Poi}(\mu_t)$, where the mean satisfies
with the design vector now being a function of $\boldsymbol{z}_t$ only. A typical example is the seasonal log-linear model used by Höhle & Paul (2008) for epidemic counts, defined by
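A simulation sketch in the spirit of such a seasonal log-linear model, with a linear trend and one harmonic component; all coefficient values are hypothetical and not taken from Höhle & Paul (2008):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(104)                     # e.g. two years of weekly counts
omega = 2 * np.pi * t / 52             # one harmonic with period 52 weeks
# hypothetical coefficients: intercept, trend, seasonal sine/cosine terms
log_mu = 1.0 + 0.005 * t + 0.6 * np.sin(omega) + 0.3 * np.cos(omega)
x = rng.poisson(np.exp(log_mu))        # marginal model: no feedback from past counts
print(x.shape)
```

Note that, in contrast to the conditional models above, the mean at time $t$ depends only on the covariates (here trend and season), not on past observations.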
An immediate extension of the above marginal models towards parameter-driven models is obtained by assuming an additional latent process – one may also assume that a part of the covariate process is unobservable – say, $(u_t)_{\mathbb{N}_0}$. Then the conditional means, now defined given both the covariates and the latent variables $u_t$, are modeled by the approach in (5.5).
Parameter-driven models for multivariate count processes have been proposed by Jørgensen et al. (1999) and Jung et al. (2011). The state space model by Jørgensen et al. (1999) uses a conditional Poisson distribution and assumes the latent process to be a type of gamma Markov process; the covariate information is embedded with a log-link approach. The dynamic factor model by Jung et al. (2011) generates the components of the multivariate counts from conditionally independent Poisson distributions. The corresponding vectors of Poisson means constitute a latent process and are determined by three latent factors (log-linear model), which themselves are assumed to be independent Gaussian AR(1) processes. More information on these and further models for multivariate count time series can be found in the survey by Karlis (2016).
5.2 Hidden-Markov Models
A very popular type of parameter-driven model for count processes is the hidden-Markov model (HMM); actually, such HMMs can be defined for any kind of range, even for categorical processes; see Section 7.3. According to Ephraim & Merhav (2002), the first paper about HMMs was the one by Baum & Petrie (1966), who referred to them as “probabilistic functions of Markov chains”; they in fact focused on the categorical case discussed in Section 7.3. This section gives an introduction to these models (for the count data case), while a much more comprehensive treatment of HMMs is provided by the book by Zucchini & MacDonald (2009); also the survey article by Ephraim & Merhav (2002) is recommended for further reading.
HMMs assume a bivariate process $(X_t, Q_t)_{\mathbb{N}_0}$, where the $X_t$ are the observable random variables, whereas the $Q_t$ are the hidden states (latent states) with range $\{1, \ldots, m\}$, where $m \in \mathbb{N}$. Note that the numbers $1, \ldots, m$ just constitute a numerical coding of the hidden states, which are assumed to be of categorical nature (possibly not even ordinal; see Chapter 6 for more details). Possible choices for the observations' range are discussed below. The (categorical) state process $(Q_t)_{\mathbb{N}_0}$ is assumed to be a homogeneous Markov chain (Appendix B.2). Given the state process, the observation process $(X_t)_{\mathbb{N}_0}$ is serially independent with its pmf being solely determined through the current state (in this sense, we are concerned with a “probabilistic function” of a Markov chain; see Baum & Petrie (1966)). A common graphical representation of this data-generating mechanism is shown in Figure 5.7.
Let us return to HMMs. These models are defined by two sets of parameters: one determining the distribution of the state process $(Q_t)_{\mathbb{N}_0}$, and another concerning the conditional distribution of the observation $X_t$ given the current state $Q_t$ (state-dependent distribution). The state process is assumed to satisfy the above state equation (5.9) and, in addition, to be a homogeneous Markov chain with the state transition probabilities being given by

$p_{i|j} = P(Q_t = i \mid Q_{t-1} = j)$ for $i, j \in \{1, \ldots, m\}$. (5.11)
Let $\mathbf{A} = (p_{i|j})_{i,j}$ denote the corresponding transition matrix. The initial distribution of $Q_0$ either leads to additional model parameters, or it is determined by a stationarity assumption; that is, $Q_t \sim \boldsymbol{\pi}$ for all $t$, where $\boldsymbol{\pi}$ satisfies the invariance equation $\boldsymbol{\pi} = \mathbf{A}\,\boldsymbol{\pi}$ (see (B.4)). We shall restrict ourselves to stationary HMMs here; that is, $P(Q_t = i) = \pi_i$ for all $t$ and all $i$.
Concerning the observations, the observation equation (5.8) has to hold; that is, given the current state $Q_t$, the observation $X_t$ is conditionally independent of all past states and observations. These state-dependent distributions are also assumed to be time-homogeneous, say $P(X_t = x \mid Q_t = i) = p_i(x)$ for the states $i \in \{1, \ldots, m\}$ and for all $t$.
In applications, parametric models are assumed for the state-dependent distributions. For illustration, we shall mainly focus on the Poisson HMM, but any other count model could be used as well, or even different models for different states. As mentioned before, HMMs might be adapted to any kind of range for the observations (whereas the states are always categorical), for example, to continuous-valued cases like $\mathbb{R}$ or to purely categorical cases, as discussed in Section 7.3. The Poisson HMM assumes the distribution of $X_t$, conditioned on $Q_t = i$, to be the Poisson distribution $\mathrm{Poi}(\lambda_i)$; that is,
and consequently $E[X_t \mid Q_t = i] = \lambda_i$. The complete set of model parameters is given by $\mathbf{A}$ and $\lambda_1, \ldots, \lambda_m$, where $\lambda_i > 0$ has to hold for all $i$.
Let us look at some stochastic properties of the resulting observation process (Zucchini & MacDonald, 2009, Section 2.3). Let $\mathbf{P}(x)$, as a function of $x$ from the observations' range, denote the diagonal matrices $\mathrm{diag}\big(p_1(x), \ldots, p_m(x)\big)$. Then the marginal pmf and the bivariate probabilities are computed as

$P(X_t = x) = \mathbf{1}^\top\, \mathbf{P}(x)\, \boldsymbol{\pi}$ and $P(X_t = x, X_{t+k} = y) = \mathbf{1}^\top\, \mathbf{P}(y)\, \mathbf{A}^k\, \mathbf{P}(x)\, \boldsymbol{\pi}$.
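These matrix expressions are easy to evaluate numerically. The following sketch computes the stationary distribution and the marginal pmf for a hypothetical two-state Poisson HMM; the column-stochastic convention for the transition matrix and all numeric values are assumptions of this illustration:

```python
import numpy as np
from math import exp, factorial

# hypothetical 2-state Poisson HMM; A is column-stochastic,
# i.e. A[i, j] = P(Q_t = i | Q_{t-1} = j)
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
lam = [1.0, 5.0]                      # state-dependent Poisson means

# stationary distribution: eigenvector of A for eigenvalue 1, normalized to sum 1
eigval, eigvec = np.linalg.eig(A)
pi = np.real(eigvec[:, np.argmin(np.abs(eigval - 1))])
pi = pi / pi.sum()

def P(x):
    # diagonal matrix diag(p_1(x), ..., p_m(x)) of state-dependent pmfs
    return np.diag([exp(-l) * l**x / factorial(x) for l in lam])

ones = np.ones(2)
marginal = np.array([ones @ P(x) @ pi for x in range(50)])
print(round(marginal.sum(), 6))       # the mixture pmf sums to (nearly) 1
```

Note that the marginal distribution is simply the finite mixture of the state-dependent Poisson distributions with the stationary probabilities as mixture weights.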
For the limiting behavior of $\mathbf{A}^k$ for $k \to \infty$, see Remark B.2.2.1 on the Perron–Frobenius theorem.
Next, we turn to the question of parameter estimation. A widely used approach is the Baum–Welch algorithm, which is an instance of the expectation-maximization (EM) algorithm; see Chapter 4 in Zucchini & MacDonald (2009) for a detailed description. Alternatively, a direct (numerical) maximization of the likelihood function can be performed. Provided that accurate starting values have been selected, the latter approach usually converges much faster than the Baum–Welch algorithm; see Bulla & Berzel (2008). Also, MacDonald (2014) concludes that the direct maximization of the likelihood is often advantageous. Therefore, we shall concentrate on this latter approach here.
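For such a direct numerical maximization, the log-likelihood can be evaluated with a scaled forward recursion. A sketch for the Poisson HMM follows; the column-stochastic convention for the transition matrix and all numeric values are assumptions of this illustration:

```python
import numpy as np
from math import exp, factorial, log

def hmm_loglik(x, A, lam, pi):
    # log-likelihood of a Poisson HMM via the scaled forward recursion;
    # convention: A[i, j] = P(Q_t = i | Q_{t-1} = j), pi = initial distribution
    ll = 0.0
    alpha = np.asarray(pi, dtype=float)
    for t, xt in enumerate(x):
        if t > 0:
            alpha = A @ alpha                 # propagate the chain one step
        p = np.array([exp(-l) * l**xt / factorial(xt) for l in lam])
        alpha = p * alpha                     # weight by state-dependent pmfs
        c = alpha.sum()                       # = P(x_t | x_1, ..., x_{t-1})
        ll += log(c)
        alpha /= c                            # rescale to avoid underflow
    return ll

A = np.array([[0.9, 0.2], [0.1, 0.8]])
lam = [1.0, 5.0]
pi = np.array([2/3, 1/3])                     # stationary for this A
print(hmm_loglik([0, 1, 3, 8, 6, 1], A, lam, pi))
```

This function could then be handed to a general-purpose numerical optimizer (with the parameters suitably transformed to respect their ranges) to obtain the maximum likelihood estimates.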
The forward probabilities defined in Remark 5.2.3 are not only useful in view of likelihood computation, but also for forecasting future observations. The observations' $h$-step-ahead forecasting distribution, given the observations $x_1, \ldots, x_T$, is computed as (Zucchini & MacDonald, 2009, Section 5.2):
Note that these probabilities are easily updated for increasing $T$ according to the recursive scheme in (5.16). Such an updating is also required if residuals are to be computed for the fitted model. While the forecast pseudo-residuals (Zucchini & MacDonald, 2009, Section 6.2.3) can be computed exactly using (5.19) with $h = 1$, the standardized Pearson residuals (Section 2.5) require conditional moments, which need to be approximated by truncating the involved infinite sums at a sufficiently large upper limit.
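A sketch of how the forecasting distribution can be evaluated from the (scaled) forward probabilities, again for a hypothetical two-state Poisson HMM with a column-stochastic transition matrix (all values are assumptions of this illustration):

```python
import numpy as np
from math import exp, factorial

# hypothetical 2-state Poisson HMM; A[i, j] = P(Q_t = i | Q_{t-1} = j)
A = np.array([[0.9, 0.2], [0.1, 0.8]])
lam = [1.0, 5.0]
pi = np.array([2/3, 1/3])              # stationary distribution of A

def pmf_vec(x):
    return np.array([exp(-l) * l**x / factorial(x) for l in lam])

def forward(obs):
    # scaled forward probabilities after observing x_1, ..., x_T
    alpha = pi.copy()
    for t, xt in enumerate(obs):
        if t > 0:
            alpha = A @ alpha
        alpha = pmf_vec(xt) * alpha
        alpha /= alpha.sum()           # the normalization cancels in the forecast formula
    return alpha

def forecast_pmf(obs, h, x):
    # h-step-ahead P(X_{T+h} = x | x_1, ..., x_T)
    state = np.linalg.matrix_power(A, h) @ forward(obs)
    return pmf_vec(x) @ state

dist = [forecast_pmf([0, 1, 6, 7], h=2, x=k) for k in range(40)]
print(round(sum(dist), 6))             # a proper pmf: sums to (nearly) 1
```

For large $h$, the forecast distribution converges towards the marginal distribution of the HMM, in accordance with the Perron–Frobenius limit mentioned above.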
In some applications, it might also be necessary to predict a future state of the HMM; in this case,

$P(Q_{T+h} = i \mid X_1 = x_1, \ldots, X_T = x_T) \;=\; \mathbf{e}_i^\top\, \mathbf{A}^h\, \boldsymbol{\alpha}_T \,/\, (\mathbf{1}^\top \boldsymbol{\alpha}_T)$ (5.20)

should be used, where $\mathbf{e}_i$ is the $i$th unit vector (Example A.3.3).
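Numerically, the state forecast is just a matrix power applied to the normalized forward probabilities; a small sketch with hypothetical values (the column-stochastic convention is an assumption of this illustration):

```python
import numpy as np

# sketch of a state forecast; alpha_T would come from the forward recursion,
# here we plug in hypothetical values that are already normalized
A = np.array([[0.9, 0.2], [0.1, 0.8]])   # A[i, j] = P(Q_t = i | Q_{t-1} = j)
alpha_T = np.array([0.25, 0.75])         # alpha_T / (1' alpha_T)

h = 3
state_forecast = np.linalg.matrix_power(A, h) @ alpha_T
# component i equals e_i' A^h alpha_T, i.e. P(Q_{T+h} = i | x_1, ..., x_T)
print(state_forecast.sum())              # the state probabilities sum to 1
```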
5.3 Discrete ARMA Models
The “new” discrete ARMA (NDARMA) models were proposed by Jacobs & Lewis (1983). They generate an ARMA-like dependence structure through some kind of random mixture. There are several ways of formulating these models, for example through a backshift mechanism, as in Jacobs & Lewis (1983), or by using Pegram's operator, as in Biswas & Song (2009). Here, we follow the approach of Weiß & Göb (2008) to give a representation close to the conventional ARMA recursion.
Note that exactly one out of the mixing variables $\alpha_{t,1}, \ldots, \alpha_{t,p}, \beta_{t,0}, \ldots, \beta_{t,q}$ becomes 1; all others are equal to 0. Hence the NDARMA recursion (5.21) implies that each observation $X_t$ chooses either one of the past observations $X_{t-1}, \ldots, X_{t-p}$ or one of the (unobservable) innovations $\epsilon_t, \ldots, \epsilon_{t-q}$. Because of this mechanism, the stationary marginal distribution of $X_t$ is identical to that of $\epsilon_t$, and we always have
The autocorrelations $\rho(k)$ are non-negative and can be determined from the Yule–Walker equations (5.22) of Jacobs & Lewis (1983), together with a corresponding set of linear equations for the cross-correlations between the observations and the innovations. In particular, the autocorrelation function of an NDARMA process has the same structure as that of a conventional ARMA model. While these properties might suggest that the NDARMA models should be very attractive in practice for ARMA-like count processes, they have an important limitation: the sample paths generated by NDARMA processes tend to show long runs (constant segments) of a certain count value. This is illustrated by Figure 5.11, where the plotted sample path differs markedly from the corresponding INAR(1) path in Figure 2.5b and the INARCH(1) path in Figure 4.3. Since these long runs and the large jumps between them are a rather uncommon pattern in real count time series, the NDARMA models are rarely used in the count data context, although we shall see in Section 7.2 that they are quite useful when considering categorical time series. An important exception is the modeling of video traffic data (Tanwir & Perros, 2014), as briefly sketched in the following example.
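The run-length behavior is easy to reproduce by simulation. The following sketch generates a DAR(1) path, i.e. the NDARMA(1,0) special case, where each observation repeats its predecessor with probability $\phi$ and otherwise draws a fresh Poisson innovation; the function name and parameter values are ours:

```python
import numpy as np

def simulate_dar1(phi, n, innov, seed=0):
    # DAR(1) = NDARMA(1,0): with probability phi copy X_{t-1}, otherwise
    # draw a fresh innovation; the marginal law equals the innovation law
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = innov(rng)
    for t in range(1, n):
        x[t] = x[t - 1] if rng.random() < phi else innov(rng)
    return x

path = simulate_dar1(phi=0.8, n=20000, innov=lambda rng: rng.poisson(3.0))
print(round(path.mean(), 2))   # close to the Poisson innovation mean 3
rho1 = np.corrcoef(path[:-1], path[1:])[0, 1]
print(round(rho1, 2))          # lag-1 autocorrelation close to phi = 0.8
```

Plotting such a path makes the long constant segments immediately visible, in line with the discussion above.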