
Chapter 2
A First Approach for Modeling Time Series of Counts: The Thinning-based INAR(1) Model

As a first step towards the analysis and modeling of count time series, we consider an integer-valued counterpart to the conventional first-order autoregressive model, the INAR(1) model of McKenzie (1985). This constitutes a rather simple and easily interpretable Markov model for stationary count processes, but it is also quite powerful due to its flexibility and expandability. In particular, it allows us to introduce some basic approaches for parameter estimation, model diagnostics and statistical inference, which carry over analogously to the more advanced models discussed in Chapters 3–5. The presented models and methods are illustrated with a data example in Section 2.5.

To prepare for our discussion of count time series, however, we start in Section 2.0 with a brief introduction to the notation used in this book, and with some remarks on characteristic features of count distributions in general (without a time aspect).

2.0 Preliminaries: Notation and Characteristics of Count Distributions

In contrast to the subsequent sections, here we remove any time aspects and look solely at separate random variables and their distributions. The first aim of this preliminary section is to acquaint the reader with the basic notation used in this book. The second one is to briefly highlight characteristic features of count distributions, which will be useful in identifying appropriate models for a given scenario or dataset. To keep the discussion short and non-technical, detailed definitions and surveys of specific distributions are deferred to Appendix A.

Count data express the number of certain units or events in a specified context. The possible outcomes are contained in the set of non-negative integers, $\mathbb{N}_0 = \{0, 1, \ldots\}$. These outcomes are not just used as labels; they arise from counting and are hence quantitative (ratio scale). Accordingly, we refer to a quantitative random variable $X$ as a count random variable if its range is contained in the set of non-negative integers, $\mathbb{N}_0$. Some examples of random count phenomena are:

the number of emails one gets on a certain day (unlimited range $\mathbb{N}_0$)
the number of occupied rooms in a hotel with $n$ rooms (finite range $\{0, \ldots, n\}$)
the number of trials until a certain event happens (unlimited range $\mathbb{N} = \{1, 2, \ldots\}$).

A common way of expressing location and dispersion of a count random variable $X$ is to use its mean and variance, denoted as

$$\mu := E[X], \qquad \sigma^2 := V[X].$$

The definition and notation of more general types of moments are summarized in Table 2.1; note that $\mu_1$ is the mean of $X$, and $\bar\mu_2$ is the variance of $X$.

Table 2.1 Definition and notation of moments of a count random variable $X$

For $k \in \mathbb{N}$, we refer to

$\mu_k := E[X^k]$ as the $k$th moment of $X$,

$\bar\mu_k := E\big[(X-\mu)^k\big]$ as the $k$th central moment of $X$,

$\mu_{(k)} := E\big[X\,(X-1)\cdots(X-k+1)\big]$ as the $k$th factorial moment of $X$.
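As a quick numerical illustration of Table 2.1, the following Python sketch (a minimal example, not code from this book) computes the $k$th moment $E[X^k]$, the $k$th central moment $E[(X-\mu)^k]$ and the $k$th factorial moment $E[X(X-1)\cdots(X-k+1)]$ from a truncated pmf. The helper names `poisson_pmf` and `moments_from_pmf` and the truncation point `kmax` are illustrative assumptions; the Poisson case is used because its $k$th factorial moment is known to equal $\mu^k$.

```python
import math

def poisson_pmf(mu, kmax):
    # P(X = k) for k = 0..kmax, computed on the log scale to avoid overflow
    return [math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(kmax + 1)]

def moments_from_pmf(pmf, order):
    """Return (kth moment, kth central moment, kth factorial moment) for k = order."""
    mean = sum(k * p for k, p in enumerate(pmf))
    raw = sum(k**order * p for k, p in enumerate(pmf))               # E[X^k]
    central = sum((k - mean)**order * p for k, p in enumerate(pmf))  # E[(X - mu)^k]
    # falling factorial k(k-1)...(k-order+1); the product is 0 whenever k < order
    factorial = sum(math.prod(range(k - order + 1, k + 1)) * p for k, p in enumerate(pmf))
    return raw, central, factorial

pmf = poisson_pmf(1.5, kmax=80)
raw3, central3, fact3 = moments_from_pmf(pmf, 3)
print(round(fact3, 6), round(1.5**3, 6))   # both 3.375: factorial moment equals mu^3
```

Truncating the pmf at `kmax` is harmless here because the Poisson tail beyond 80 is negligible for $\mu = 1.5$; heavier-tailed distributions would need a larger cut-off.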

While such moments give insight into specific features of the distribution of $X$, the complete distribution is uniquely determined by its probability mass function (pmf), which we abbreviate as

$$p_k := P(X = k), \qquad k \in \mathbb{N}_0.$$

Similarly, $P(X \le k)$ denotes the cumulative distribution function (cdf). An alternative way of completely characterizing a count distribution is via an appropriate type of generating function; the most common types are summarized in Table 2.2. The probability generating function (pgf), for instance, encodes the pmf of the distribution, but it also yields the factorial moments: the $k$th derivative satisfies $pgf^{(k)}(1) = \mu_{(k)}$; in particular, $pgf'(1) = \mu$. The coefficients $\kappa_k$ in the expansion $cgf(z) = \sum_{k=1}^{\infty} \kappa_k\, z^k/k!$ are referred to as the cumulants. Particular cumulants are

$$\kappa_1 = \mu, \qquad \kappa_2 = \sigma^2, \qquad \kappa_3 = \bar\mu_3, \qquad \kappa_4 = \bar\mu_4 - 3\sigma^4;$$

that is, $\kappa_3/\sigma^3$ is the skewness and $\kappa_4/\sigma^4$ the excess of the distribution. The coefficients $\kappa_{(k)}$ in the expansion $fcgf(z) = \sum_{k=1}^{\infty} \kappa_{(k)}\,(z-1)^k/k!$ of the factorial-cumulant generating function (fcgf) are referred to as the factorial cumulants.

Table 2.2 Definition and notation of generating functions of a count random variable $X$

| Generating function of $X$ | Definition |
|---|---|
| Probability (pgf) | $pgf(z) := E[z^X] = \sum_{k=0}^{\infty} P(X = k)\, z^k$ |
| Moment (mgf) | $mgf(z) := E[e^{zX}] = pgf(e^z)$ |
| Cumulant (cgf) | $cgf(z) := \ln mgf(z)$ |
| Factorial-cumulant (fcgf) | $fcgf(z) := \ln pgf(z)$ |

A number of parametric models for count distributions are available in the literature; see Appendix A for a brief survey. There, the models are sorted according to the dimension of their ranges (univariate vs. multivariate), and according to the size of the range: in some applications, there exists a fixed upper bound $n \in \mathbb{N}$ that can never be exceeded, so the range is finite, taking the form $\{0, \ldots, n\}$; otherwise, we have the unlimited range $\mathbb{N}_0$.

Distributions for the case of $X$ being a univariate count random variable with the unlimited range $\mathbb{N}_0$ are presented in Appendix A.1. Among them, the Poisson distribution holds a prominent position (similar to the normal distribution in the continuous case) and often serves as the benchmark for the modeling of count data. One of its main characteristics is the equidispersion property: its variance is always equal to its mean. If we define the (Poisson) index of dispersion as

$$I := \frac{\sigma^2}{\mu} \tag{2.1}$$

for a random variable $X$ with mean $\mu$ and variance $\sigma^2$, then the Poisson distribution always satisfies $I = 1$. Values of $I$ deviating from 1, in turn, express a violation of the Poisson model: $I > 1$ indicates an overdispersed distribution, such as the negative binomial distribution from Example A.1.4 or Consul's generalized Poisson distribution from Example A.1.6, whereas $I < 1$ expresses underdispersion, as for the Good distribution from Example A.1.7 or the PL distribution from Example A.1.8.

Figure 2.1 illustrates the difference between the equidispersed Poisson distribution (black) and the overdispersed negative binomial (NB; gray) and generalized Poisson (GP; light gray) distributions. All distributions are calibrated to the same mean $\mu = 1.5$, but the plotted NB and GP models have dispersion index $I = 2$ (that is, 100% overdispersion). Both the NB and the GP model put more probability mass on the values $k \ge 4$, but they also have an increased zero probability (Poi: 0.223, NB: 0.354, GP: 0.346); the latter phenomenon is discussed in more detail below.
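The zero probabilities of Figure 2.1 can be checked with a few lines of Python. The sketch below assumes the common parametrizations NB$(n, \pi)$ with mean $n(1-\pi)/\pi$ and dispersion index $1/\pi$, and Consul's GP$(\lambda, \theta)$ with mean $\lambda/(1-\theta)$ and dispersion index $1/(1-\theta)^2$; the symbols used in Appendix A may differ. Both models are first calibrated to mean 1.5 and dispersion index 2, and then $P(X = 0)$ is evaluated.

```python
import math

def poisson_p0(mu):
    # Poisson: P(X = 0) = exp(-mu)
    return math.exp(-mu)

def nb_p0(mu, disp):
    # NB(n, pi) with mean n(1-pi)/pi and dispersion index 1/pi (assumed parametrization)
    pi = 1.0 / disp
    n = mu * pi / (1.0 - pi)
    return pi ** n

def gp_p0(mu, disp):
    # GP(lam, theta) with mean lam/(1-theta), dispersion index 1/(1-theta)^2 (assumed)
    theta = 1.0 - 1.0 / math.sqrt(disp)
    lam = mu * (1.0 - theta)
    return math.exp(-lam)

mu, disp = 1.5, 2.0   # mean 1.5 and 100% overdispersion, as in Figure 2.1
for name, p0 in [("Poi", poisson_p0(mu)), ("NB", nb_p0(mu, disp)), ("GP", gp_p0(mu, disp))]:
    print(f"{name}: P(X = 0) = {p0:.3f}")
# Poi: 0.223, NB: 0.354, GP: 0.346
```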

**Figure 2.1** Count distributions with $\mu = 1.5$: Poisson in black; NB and GP distributions (both with 100% overdispersion) in gray.

Figure 2.2, in contrast, illustrates the effect of underdispersion compared to the equidispersed Poisson distribution (black) with mean $c02-math-060$ : the plotted Good distribution (gray; Example A.1.7) and the PL $c02-math-061$ distribution (light gray; Example A.1.8) both have mean $c02-math-062$ and dispersion index $I = 0.5$ (that is, 50% underdispersion). These underdispersed models concentrate most of their probability mass on the values 1 and 2. In particular, the zero probability is much lower than in the Poisson case (Poi: $c02-math-064$ , Good: $c02-math-065$ , PL $c02-math-066$ : $c02-math-067$ ).

**Figure 2.2** Count distributions with $c02-math-068$ : Poisson in black; Good and PL $c02-math-069$ distributions (both with 50% underdispersion) in gray.

Figures 2.1 and 2.2 show that another characteristic property of the Poisson distribution is its probability of observing a zero, $P(X = 0) = e^{-\mu}$. Hence, the zero index (Puig & Valero, 2006)

$$Z := 1 + \frac{\ln p_0}{\mu}, \tag{2.2}$$

defined as a function of the mean $\mu$ and the zero probability $p_0 := P(X = 0)$, takes the value 0 for the Poisson distribution but may differ otherwise. Values $Z > 0$ indicate zero inflation (an excess of zeros with respect to a Poisson distribution), while $Z < 0$ refers to zero deflation. A useful approach for modifying a distribution's zero probability is described in Example A.1.9.
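Both indices are easy to evaluate numerically from a pmf. The sketch below is a hypothetical helper rather than code from this book: it computes $I = \sigma^2/\mu$ and $Z = 1 + \ln(p_0)/\mu$ from a truncated pmf and applies them to $\mathrm{Poi}(1.5)$ and to an NB distribution with the same mean and $I = 2$, assuming the NB$(n, \pi)$ parametrization with mean $n(1-\pi)/\pi$ (here $n = 1.5$, $\pi = 0.5$).

```python
import math

def poisson_pmf(mu, kmax):
    return [math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(kmax + 1)]

def nb_pmf(n, pi, kmax):
    # NB(n, pi): mean n(1-pi)/pi, variance n(1-pi)/pi^2 (assumed parametrization)
    return [math.exp(math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
                     + n * math.log(pi) + k * math.log1p(-pi)) for k in range(kmax + 1)]

def dispersion_and_zero_index(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    disp = var / mean                      # index of dispersion I = sigma^2 / mu
    zero = 1.0 + math.log(pmf[0]) / mean   # zero index Z = 1 + ln(p_0) / mu
    return disp, zero

for name, pmf in [("Poi", poisson_pmf(1.5, 100)), ("NB", nb_pmf(1.5, 0.5, 200))]:
    d, z = dispersion_and_zero_index(pmf)
    print(f"{name}: I = {d:.3f}, Z = {z:.3f}")
```

For the Poisson pmf, the indices come out as $I \approx 1$ and $Z \approx 0$ (up to truncation and rounding), while the NB pmf gives $I \approx 2$ and $Z \approx 0.307 > 0$, that is, zero inflation.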

The previous discussion as well as the definitions of the indices (2.1) and (2.2) refer to distributions having the unlimited range $\mathbb{N}_0$. However, as mentioned before, sometimes the range is finite, $\{0, \ldots, n\}$ with fixed upper bound $n \in \mathbb{N}$. In such a case, the binomial distribution (Example A.2.1 in Appendix A.2) plays a central role. If we characterize its dispersion behavior in terms of the index of dispersion (2.1), the binomial distribution is underdispersed. However, since we are concerned with a different type of random phenomenon anyway (one with a finite range), it is more appropriate to evaluate the dispersion behavior in terms of the so-called binomial index of dispersion, defined by

$$I_{\mathrm{Bin}} := \frac{n\,\sigma^2}{\mu\,(n-\mu)} \tag{2.3}$$

for a random variable $X$ with range $\{0, \ldots, n\}$, mean $\mu$ and variance $\sigma^2$. See also Hagmark (2009), and note that $I_{\mathrm{Bin}} \to I$ for $n \to \infty$. In view of this index, the binomial distribution always satisfies $I_{\mathrm{Bin}} = 1$, while a distribution with $I_{\mathrm{Bin}} > 1$ is said to exhibit extra-binomial variation. An example is the beta-binomial distribution from Example A.2.2. For illustration, Figure 2.3 shows a binomial and a beta-binomial distribution with range $c02-math-089$ and the common mean 6, but with the beta-binomial distribution exhibiting a strong degree of extra-binomial variation (420%).
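A small numerical sketch of the binomial index of dispersion, computed as $I_{\mathrm{Bin}} = n\sigma^2 / (\mu(n-\mu))$. The beta-binomial moments use the standard parametrization with shape parameters $a, b > 0$, which is an assumption here; the values $a = b = 1$ are hypothetical and are not the parameters behind Figure 2.3.

```python
def binomial_index_of_dispersion(mean, var, n):
    # I_Bin = n * var / (mean * (n - mean)) for a count variable with range {0, ..., n}
    return n * var / (mean * (n - mean))

def betabinomial_moments(n, a, b):
    # beta-binomial with shape parameters a, b > 0 (standard parametrization, assumed)
    s = a + b
    mean = n * a / s
    var = n * a * b * (s + n) / (s**2 * (s + 1))
    return mean, var

n, p = 12, 0.5
print(binomial_index_of_dispersion(n * p, n * p * (1 - p), n))  # binomial: exactly 1.0
m, v = betabinomial_moments(n, 1.0, 1.0)
print(binomial_index_of_dispersion(m, v, n))                    # 1 + (n-1)/(a+b+1)
```

For the beta-binomial, the index simplifies to $1 + (n-1)/(a+b+1) > 1$, so any finite $a, b$ produce extra-binomial variation. For a fixed mean and variance and growing $n$, the value approaches the ordinary index $I = \sigma^2/\mu$.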

**Figure 2.3** Binomial distribution $c02-math-090$ in black, and corresponding beta-binomial distribution with $c02-math-091$ ( $c02-math-092$ ) in gray.

Although this book, as an introductory course in discrete-valued time series, mainly focuses on the univariate case, a brief account of possible multivariate generalizations is also provided in some places. For this purpose, Appendix A.3 presents multivariate extensions of some basic count models such as the Poisson or negative binomial distribution. These extensions preserve the respective univariate distribution for their marginals, but they induce cross-correlation between the components of the multivariate count vector. An example is plotted in Figure 2.4. The bivariate Poisson (Example A.3.1) and negative binomial distribution (Example A.3.2) shown there are adjusted to give the same mean $c02-math-093$ and the same cross-correlation $c02-math-094$ , but the negative binomial model clearly shows more dispersion in both components: the dispersion indices are 2 and $c02-math-095$ , respectively.

One of the multivariate count distributions, the multinomial distribution from Example A.3.3, will be of importance in Part II's consideration of categorical time series; see also the discussion in the appendix. The connection to compositional data (Remark A.3.4) is briefly mentioned in this context.

2.1 The INAR(1) Model for Time-dependent Counts

In 1985, in issues 4 and 5 of volume 21 of the Water Resources Bulletin (nowadays the Journal of the American Water Resources Association), a series of papers about time series analysis appeared, and were also published separately by the American Water Resources Association as the monograph Time Series Analysis in Water Resources (edited by K.W. Hipel). One of these papers, “Some simple models for discrete variate time series” by McKenzie (1985), introduced a number of AR(1)-like models for count time series. At this point, it is important to note that the conventional AR(1) recursion, $X_t = \alpha \cdot X_{t-1} + \epsilon_t$, cannot be applied to count processes: even if the innovations $\epsilon_t$ are assumed to be integer-valued with range $\mathbb{N}_0$, the observations $X_t$ would in general not be integer-valued, since the multiplication “$\alpha \cdot$” does not preserve the discrete range (the so-called multiplication problem). Therefore, the idea behind McKenzie's new models was to use different mechanisms for “reducing” $X_{t-1}$. One such mechanism is the binomial thinning operator (Steutel & van Harn, 1979), which was used to define the integer-valued AR(1) model, or INAR(1) model for short. The binomial AR(1) model, discussed in Section 3.3, was also introduced in this context.

It seems that McKenzie's paper was overlooked at first, possibly because the Water Resources Bulletin was not a typical outlet for time series papers: two years later, the INAR(1) model was proposed again by Al-Osh & Alzaid (1987), this time in the Journal of Time Series Analysis. Eventually, the papers by McKenzie and by Al-Osh & Alzaid turned out to be groundbreaking for the field of count time series, initiating innumerable research papers about thinning-based time series models (some of which are presented in Chapter 3) and attracting more and more attention to discrete-valued time series.

We shall now examine the important stochastic properties as well as relevant special cases of the INAR(1) model in great detail. This will allow for more compact presentations of many other models in the later chapters of this book.

2.1.1 Definition and Basic Properties

A way of avoiding the multiplication problem, as sketched above, is to use the probabilistic operation of binomial thinning (Steutel & van Harn, 1979), sometimes also referred to as binomial subsampling (Puig & Valero, 2007). If $X$ is a discrete random variable with range $\mathbb{N}_0$ and if $\alpha \in (0;1)$, then the random variable

$$\alpha \circ X := \sum_{i=1}^{X} Z_i$$

is said to arise from $X$ by binomial thinning, and the $Z_i$ are referred to as the counting series. They are i.i.d. binary random variables with $P(Z_i = 1) = \alpha$, which are also independent of $X$. So by construction, $\alpha \circ X$ can only take integer values between 0 and $X$. The boundary values $\alpha = 0$ and $\alpha = 1$ might be included in this definition by setting $0 \circ X := 0$ and $1 \circ X := X$. Since each $Z_i$ satisfies $Z_i \sim \mathrm{Bin}(1, \alpha)$ (see Example A.2.1), and since the binomial distribution is additive, $\alpha \circ X$ has a conditional binomial distribution given the value of $X$; that is, $\alpha \circ X \mid X \sim \mathrm{Bin}(X, \alpha)$. In particular, using the law of total expectation, it follows that

$$E[\alpha \circ X] = E\big[E[\alpha \circ X \mid X]\big] = E[\alpha X] = \alpha\,\mu.$$

So the binomial thinning $\alpha \circ X$ and the multiplication $\alpha \cdot X$ have the same mean, which motivates us to use binomial thinning within a modified AR(1) recursion. However, they differ in many other properties; in particular, the multiplication is not a random operation. As an example, the law of total variance implies that

$$V[\alpha \circ X] = E\big[V[\alpha \circ X \mid X]\big] + V\big[E[\alpha \circ X \mid X]\big] = \alpha(1-\alpha)\,\mu + \alpha^2\sigma^2,$$

so we have $V[\alpha \circ X] > \alpha^2\sigma^2 = V[\alpha \cdot X]$ whenever $\mu > 0$.
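The two moment formulas for binomial thinning, $E[\alpha\circ X] = \alpha\mu$ and $V[\alpha\circ X] = \alpha(1-\alpha)\mu + \alpha^2\sigma^2$, can be checked by Monte Carlo simulation. This sketch assumes NumPy; it draws $\alpha \circ X$ directly from its conditional $\mathrm{Bin}(X, \alpha)$ distribution, and the NB-distributed input counts (mean 3, variance 7.5) are a hypothetical choice that makes the difference to $V[\alpha \cdot X] = \alpha^2\sigma^2$ clearly visible.

```python
import numpy as np

def binomial_thinning(x, alpha, rng):
    """alpha ∘ x: each of the x units is retained independently with probability alpha."""
    # conditionally on x, alpha ∘ x ~ Bin(x, alpha); NumPy broadcasts over an array of counts
    return rng.binomial(x, alpha)

rng = np.random.default_rng(2024)
alpha = 0.4
x = rng.negative_binomial(2, 0.4, size=400_000)   # overdispersed counts: mean 3, variance 7.5
y = binomial_thinning(x, alpha, rng)

mu, sigma2 = x.mean(), x.var()
print(y.mean(), alpha * mu)                                   # E[alpha∘X] = alpha * mu
print(y.var(), alpha * (1 - alpha) * mu + alpha**2 * sigma2)  # law of total variance
print(alpha**2 * sigma2)                                      # V[alpha * X], strictly smaller
```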

For the interpretation of the binomial thinning operation, consider a population of size $X$ at a certain time $t$. If we observe the same population at a later time, $t+1$, then the population may have shrunk, because some of the individuals have died between times $t$ and $t+1$. If the individuals survive independently of each other, and if the probability of surviving from $t$ to $t+1$ is equal to $\alpha$ for all individuals, then the number of survivors is given by $\alpha \circ X$.

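Using the random operator “$\circ$”, McKenzie (1985) and Al-Osh & Alzaid (1987) defined the INAR(1) process in the following way.

Definition 2.1.1.1

Let $(\epsilon_t)_{\mathbb{Z}}$ be an i.i.d. process with range $\mathbb{N}_0$, with mean $\mu_\epsilon := E[\epsilon_t]$ and variance $\sigma_\epsilon^2 := V[\epsilon_t]$, and let $\alpha \in (0;1)$. A process $(X_t)_{\mathbb{Z}}$ that follows the recursion

$$X_t = \alpha \circ X_{t-1} + \epsilon_t$$

is said to be an INAR(1) process if all thinning operations are performed independently of each other and of $(\epsilon_t)_{\mathbb{Z}}$, and if the thinning operations at each time $t$ as well as $\epsilon_t$ are independent of $(X_s)_{s<t}$.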

Note that it would be more precise to write “$\alpha \circ_t X_{t-1}$” in the above recursion to emphasize that the thinning is performed anew at each time $t$. For the sake of readability, however, the time index is omitted.

The INAR(1) recursion of Definition 2.1.1.1 can be interpreted as follows (Al-Osh & Alzaid, 1987):

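$$\underbrace{X_t}_{\text{population at time } t} \;=\; \underbrace{\alpha \circ X_{t-1}}_{\text{survivors of time } t-1} \;+\; \underbrace{\epsilon_t}_{\text{immigration}} \tag{2.4}$$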

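The INAR(1) process is a homogeneous Markov chain with the 1-step transition probabilities given by (McKenzie, 1985; Al-Osh & Alzaid, 1987)

$$p_{k|l} := P(X_t = k \mid X_{t-1} = l) = \sum_{j=0}^{\min\{k,\,l\}} \binom{l}{j}\,\alpha^j (1-\alpha)^{l-j}\; P(\epsilon_t = k-j). \tag{2.5}$$

For the conditional mean and variance, we have (Alzaid & Al-Osh, 1988)

$$E[X_t \mid X_{t-1}] = \alpha\, X_{t-1} + \mu_\epsilon, \qquad V[X_t \mid X_{t-1}] = \alpha(1-\alpha)\, X_{t-1} + \sigma_\epsilon^2, \tag{2.6}$$

which are both linear functions of $X_{t-1}$. For the derivation of (2.5) and (2.6), note that $\alpha \circ X_{t-1}$ and $\epsilon_t$ are independent according to Definition 2.1.1.1, and that $\alpha \circ X_{t-1}$ is $\mathrm{Bin}(l, \alpha)$-distributed given $X_{t-1} = l$. Since the conditional mean is linear in $X_{t-1}$, the INAR(1) model belongs to the class of conditional linear AR(1) models, or CLAR(1) models, as discussed by Grunwald et al. (2000). Note that, in contrast to the AR(1) case, the conditional variance is not constant: it depends on $X_{t-1}$ and hence varies over time (conditional heteroscedasticity; see the discussion before Definition B.4.1.1).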
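The transition probabilities (2.5) are a convolution of the $\mathrm{Bin}(l, \alpha)$ survivors with the innovation pmf. The Python sketch below implements this for an arbitrary innovation pmf and checks the conditional moments $E[X_t \mid X_{t-1} = l] = \alpha l + \mu_\epsilon$ and $V[X_t \mid X_{t-1} = l] = \alpha(1-\alpha) l + \sigma_\epsilon^2$ numerically. Poisson innovations and the parameter values $\alpha = 0.5$, $\lambda = 2$, $l = 5$ are hypothetical choices for illustration.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def inar1_transition(k, l, alpha, innov_pmf):
    """p_{k|l} = P(X_t = k | X_{t-1} = l) for an INAR(1) process."""
    # j survivors out of l (binomial), plus an innovation of size k - j
    return sum(math.comb(l, j) * alpha**j * (1 - alpha)**(l - j) * innov_pmf(k - j)
               for j in range(min(k, l) + 1))

alpha, lam, l = 0.5, 2.0, 5
innov = lambda m: poisson_pmf(m, lam)
probs = [inar1_transition(k, l, alpha, innov) for k in range(60)]
cond_mean = sum(k * p for k, p in enumerate(probs))
cond_var = sum((k - cond_mean) ** 2 * p for k, p in enumerate(probs))
print(cond_mean, alpha * l + lam)                  # both approx. 4.5
print(cond_var, alpha * (1 - alpha) * l + lam)     # both approx. 3.25
```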

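Let us now assume that the INAR(1) process is also stationary (Definition B.1.3); conditions guaranteeing a stationary solution of the INAR(1) recursion are discussed below. If the innovations' distribution is given in terms of its pgf, then the observations' stationary marginal distribution is determined by the equation (Alzaid & Al-Osh, 1988)

$$pgf_X(z) = pgf_X(1-\alpha+\alpha z)\cdot pgf_\epsilon(z). \tag{2.7}$$

See also the discussion in Section 2.1.3 below. Note that (2.7) is again obtained by applying the law of total expectation, as

$$pgf_X(z) = E\big[z^{X_t}\big] = E\Big[E\big[z^{\alpha\circ X_{t-1}} \mid X_{t-1}\big]\Big]\cdot pgf_\epsilon(z) = E\big[(1-\alpha+\alpha z)^{X_{t-1}}\big]\cdot pgf_\epsilon(z),$$

where the last step uses the pgf of the $\mathrm{Bin}(X_{t-1}, \alpha)$ distribution, and stationarity gives $E[(1-\alpha+\alpha z)^{X_{t-1}}] = pgf_X(1-\alpha+\alpha z)$.

Equation (2.7) can be used to determine the marginal moments or cumulants of $X_t$; see Weiß (2013a). In particular, if $\mu_\epsilon$ and $\sigma_\epsilon^2$ are finite, mean and variance are given by

$$\mu = \frac{\mu_\epsilon}{1-\alpha}, \qquad \sigma^2 = \frac{\sigma_\epsilon^2 + \alpha\,\mu_\epsilon}{1-\alpha^2}, \qquad\text{so}\qquad I = \frac{I_\epsilon + \alpha}{1+\alpha}, \tag{2.8}$$

where $I_\epsilon$ refers to the index of dispersion (2.1) of the innovations. Hence $X_t$ is over-/equi-/underdispersed iff $\epsilon_t$ is over-/equi-/underdispersed; that is, the dispersion behavior of the observations is determined by that of the innovations.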
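The stationary moment relations $\mu = \mu_\epsilon/(1-\alpha)$ and $\sigma^2 = (\sigma_\epsilon^2 + \alpha\mu_\epsilon)/(1-\alpha^2)$ translate directly into a small helper. This Python sketch (function name and example values are hypothetical) also confirms that the resulting dispersion index equals $(I_\epsilon + \alpha)/(1+\alpha)$, so it lies strictly between $I_\epsilon$ and 1 whenever $I_\epsilon \neq 1$.

```python
def inar1_marginal_moments(alpha, mu_eps, var_eps):
    """Stationary mean, variance and dispersion index of an INAR(1) process."""
    mu = mu_eps / (1 - alpha)
    var = (var_eps + alpha * mu_eps) / (1 - alpha**2)
    return mu, var, var / mu

# innovations with mean 1 and variance 2 (I_eps = 2), thinning parameter alpha = 0.5
mu, var, disp = inar1_marginal_moments(0.5, 1.0, 2.0)
print(mu, var, disp)   # approx. 2.0, 3.333 and 1.667 = (2 + 0.5) / (1 + 0.5)
```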

The autocorrelation function (ACF; see Definition B.1.1) $\rho(k)$ of a stationary INAR(1) process equals $\rho(k) = \alpha^k$ for $k \in \mathbb{N}_0$ (McKenzie, 1985; Al-Osh & Alzaid, 1987); that is, it is of AR(1) type. Expressions for higher-order joint moments of $(X_t)$ are provided by Schweer & Weiß (2014).

Remark 2.1.1.2 (Branching process with immigration)

A branching process with immigration (BPI) $(X_t)_{\mathbb{N}_0}$, also called a Galton–Watson process with immigration, is defined by the recursion (Venkataraman, 1982)

$$X_t = \sum_{j=1}^{X_{t-1}} Y_{t,j} + \epsilon_t,$$

where $Y_{t,j}$ and $\epsilon_t$ are mutually independent count random variables. The offspring variables $Y_{t,j}$ are i.i.d. with pgf $pgf_Y$, and the immigration variables $\epsilon_t$ are i.i.d. with pgf $pgf_\epsilon$. If the offspring mean satisfies $E[Y_{t,j}] < 1$, then the BPI is said to be subcritical. The term “offspring” refers to the possible interpretation of $Y_{t,j}$ as the reproduction generated by the generation $t-1$. This interpretation is plausible if the $Y_{t,j}$ may also take values larger than 1.

If, however, $Y_{t,j} \in \{0, 1\}$ for all $t, j$ (that is, if the $Y_{t,j}$ are Bernoulli-distributed according to $\mathrm{Bin}(1, \alpha)$), then $\sum_{j=1}^{X_{t-1}} Y_{t,j}$ is nothing else than the binomial thinning $\alpha \circ X_{t-1}$. In this case, the interpretation in terms of survivors (see (2.4)) is more appropriate. In particular, it becomes clear that the INAR(1) model according to Definition 2.1.1.1 can be understood as a special type of subcritical BPI; see Alzaid & Al-Osh (1988) and Kedem & Fokianos (2002, Section 5.1). As a consequence, results for subcritical BPIs can also be adapted to the INAR(1) process.

One such result is due to Heathcote (1966). Any BPI constitutes a homogeneous Markov chain; let $H_k := \sum_{j=1}^{k} 1/j$ denote the $k$th harmonic number. If the BPI is subcritical, if it is an irreducible and aperiodic Markov chain (Appendix B.2.2), and if $\sum_{k=1}^{\infty} H_k\, P(\epsilon_t = k) < \infty$, then there exists a proper stationary marginal distribution for $(X_t)$. Note that this last condition is automatically satisfied if $\epsilon_t$ has a finite mean, since $H_k \le k$. Another noteworthy result is the one by Pakes (1971) about the geometric ergodicity of subcritical BPIs, which can be used to derive mixing properties (see Definition B.1.5) for INAR(1) models; see also Example 2.1.3.3 below.

INAR(1) models are also related to certain queue-length processes with an infinite number of servers. For instance, the Poisson INAR(1) model, which will be discussed in Section 2.1.2, corresponds to an $M/M/\infty$ queue observed at integer times (McKenzie, 2003).

2.1.2 The Poisson INAR(1) Model

The most popular instance of the INAR(1) family is the Poisson INAR(1) model, which was introduced by McKenzie (1985) and Al-Osh & Alzaid (1987). Here, it is assumed that the innovations $\epsilon_t$ are i.i.d. according to the Poisson distribution $\mathrm{Poi}(\lambda)$, such that $\mu_\epsilon = \sigma_\epsilon^2 = \lambda$. Since all probabilities $P(\epsilon_t = k)$ are strictly positive, this also holds for all transition probabilities $p_{k|l}$ from (2.5). Consequently, a Poisson INAR(1) process is an irreducible and aperiodic Markov chain (see Appendix B.2.2), such that Remark 2.1.1.2 implies a unique stationary marginal distribution for $(X_t)$.

It is well known that this stationary marginal distribution is also a Poisson distribution, $\mathrm{Poi}(\mu)$ with $\mu = \lambda/(1-\alpha)$. This follows from two important invariance properties of the Poisson distribution:

the invariance with respect to binomial thinning; that is, if $X \sim \mathrm{Poi}(\mu)$, then $\alpha \circ X \sim \mathrm{Poi}(\alpha\mu)$;
the additivity; that is, if $X_1 \sim \mathrm{Poi}(\mu_1)$ and $X_2 \sim \mathrm{Poi}(\mu_2)$ are independent, then $X_1 + X_2 \sim \mathrm{Poi}(\mu_1 + \mu_2)$; see Example A.1.1.

Knowing both the conditional and the marginal distribution, we can easily simulate a stationary Poisson INAR(1) process (just initialize with $X_0 \sim \mathrm{Poi}(\lambda/(1-\alpha))$), and the full likelihood function is also directly available; see Remark B.2.1.2 and Example 2.2.2.1 below. Furthermore, having both the observations and the innovations within the same distribution family is analogous to the case of a Gaussian AR(1) model. Another similarity between Poisson INAR(1) and Gaussian AR(1) processes, which distinguishes these special instances from other INAR(1) or AR(1) processes, respectively, is time reversibility; see Schweer (2015).
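The simulation recipe just described takes only a few lines. This sketch assumes NumPy and uses hypothetical parameter values ($\alpha = 0.5$, $\lambda = 1.5$, hence $\mu = 3$); it starts in the stationary $\mathrm{Poi}(\lambda/(1-\alpha))$ law and then applies the INAR(1) recursion with a fresh binomial thinning at every step, so the sample mean, dispersion index and ACF should come out close to $\mu$, 1 and $\alpha^k$.

```python
import numpy as np

def simulate_poisson_inar1(T, alpha, lam, rng):
    """Simulate a stationary Poisson INAR(1) path of length T."""
    x = np.empty(T, dtype=np.int64)
    x[0] = rng.poisson(lam / (1 - alpha))          # start in the stationary Poi(mu) law
    for t in range(1, T):
        # survivors alpha∘X_{t-1} ~ Bin(X_{t-1}, alpha) plus a Poi(lam) innovation
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x

def sample_acf(x, k):
    xc = x - x.mean()
    return np.sum(xc[k:] * xc[:-k]) / np.sum(xc * xc)

rng = np.random.default_rng(7)
x = simulate_poisson_inar1(50_000, alpha=0.5, lam=1.5, rng=rng)
print(x.mean(), x.var() / x.mean())          # approx. mu = 3 and I = 1
print(sample_acf(x, 1), sample_acf(x, 2))    # approx. alpha = 0.5 and alpha^2 = 0.25
```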

Example 2.1.2.1 (Sample paths)

Figure 2.5 shows two sample paths of simulated Poisson INAR(1) processes. Both models were calibrated to give the same observational mean, $c02-math-202$ , but the autocorrelation parameter $\alpha$ differs, and hence so does the innovations' mean $\lambda = \mu(1-\alpha)$. In Figure 2.5a, we have $c02-math-205$ and $c02-math-206$ , and this moderate level of autocorrelation becomes visible through the short-term up and down movements. The situation in Figure 2.5b is much more extreme: $c02-math-207$ implies that a strictly positive innovation (and hence an upward movement) is generated only rarely. $c02-math-208$ leads to $c02-math-209$ being equal to $c02-math-210$ most of the time, hence the constant segments, and otherwise to a slowly descending behavior (gradual extinction). The constant segments also go along with a very small and nearly constant conditional variance according to (2.6); note that the linear coefficient $\alpha(1-\alpha)$ tends to 0 for either $\alpha \to 0$ or $\alpha \to 1$. This piecewise constant and slowly descending behavior is a characteristic feature of many binomial-thinning-based models (with large thinning probabilities). Other models, such as the INARCH(1) model discussed in Example 4.1.6, behave differently when highly correlated; see Remark 4.1.7.

2.1.3 INAR(1) Models with More General Innovations

The INAR(1) model becomes particularly simple if the innovations are chosen to be Poisson distributed; see Section 2.1.2. But much more flexibility in terms of marginal distributions is possible. One option is to select an appropriate model for the observations $X_t$ and then compute the corresponding innovations' distribution from (2.7); see McKenzie (1985) and the details below. Another option is to choose the distribution of the innovations $\epsilon_t$ in order to obtain certain properties for the observations' distribution (Al-Osh & Alzaid, 1987; Alzaid & Al-Osh, 1988); this approach usually simplifies the computation of the transition probabilities (2.5). In general, see (2.8), the dispersion behavior of the observations is easily controlled by that of the innovations. The probability of observing a zero is influenced by the innovations as well: since the zero probability just equals $P(X_t = 0) = pgf_X(0)$, iterating (2.7) at $z = 0$ implies that

$$P(X_t = 0) = pgf_X(0) = \prod_{j=0}^{\infty} pgf_\epsilon\big(1-\alpha^j\big); \tag{2.9}$$

see Jazi et al. (2012). So we are able to generate not only over- or underdispersion, but also zero inflation or deflation (see (2.2)).
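The infinite product for the zero probability, $P(X_t = 0) = \prod_{j \ge 0} pgf_\epsilon(1-\alpha^j)$, converges quickly because $\alpha^j \to 0$, so a truncated product is a practical approximation. In the Python sketch below, the truncation level `J = 200` and the Poisson test case are illustrative assumptions. For $\mathrm{Poi}(\lambda)$ innovations, the product collapses to $\exp(-\lambda\sum_j \alpha^j) = \exp(-\lambda/(1-\alpha))$, the zero probability of $\mathrm{Poi}(\lambda/(1-\alpha))$.

```python
import math

def inar1_zero_prob(alpha, pgf_eps, J=200):
    """Truncated product: P(X_t = 0) ≈ prod_{j=0}^{J} pgf_eps(1 - alpha**j)."""
    return math.prod(pgf_eps(1.0 - alpha**j) for j in range(J + 1))

alpha, lam = 0.5, 1.5
pgf_poisson = lambda z: math.exp(lam * (z - 1.0))   # pgf of Poi(lam) innovations
print(inar1_zero_prob(alpha, pgf_poisson))           # approx. exp(-lam / (1 - alpha))
print(math.exp(-lam / (1 - alpha)))
```

Any other innovation pgf (for instance, from Appendix A.1) can be passed in place of `pgf_poisson`.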

Let us now look at special instances. The most natural extension beyond Poisson distributions is to consider the family of discrete self-decomposable (DSD) distributions for $X_t$ (Steutel & van Harn, 1979), which includes, for example, the negative binomial (NB) distribution (see Example A.1.4) as well as the generalized Poisson (GP) distribution (Example A.1.6); see Zhu & Joe (2003) and Weiß (2008a) for more details. Here, a distribution is said to be DSD if its pgf satisfies the following condition:

$$\frac{pgf_X(z)}{pgf_X(1-\alpha+\alpha z)} \ \text{ is a pgf for every } \alpha \in (0;1), \tag{2.10}$$

that is, the coefficients of its power series expansion must be non-negative and add up to 1. In view of (2.7), (2.10) implies that DSD distributions are the marginal distributions of an INAR(1) process that can be preserved for any choice of $\alpha$; the corresponding innovations' pgf is then given by the ratio in (2.10). Note that any DSD distribution is also infinitely divisible (while the reverse statement does not hold); for count distributions, this means that it is a particular type of compound Poisson (CP) distribution according to Example A.1.2. As a result, if it is not the Poisson distribution, a DSD distribution is overdispersed and zero-inflated.

If we do not insist on having the same marginal distribution for all $c02-math-232$ , we can simply select a distribution for the innovations, thus controlling the dispersion or zero behavior of the observations; see the above discussion. Following this strategy, the most straightforward extension is to choose $\epsilon_t$ to be CP-distributed, since then, as in the Poisson case, the observations are also CP-distributed, a property that follows from the invariance properties described next (Schweer & Weiß, 2014).

In fact, Puig & Valero (2007) showed that a count model parametrized by its first $\nu$ factorial cumulants $\kappa_{(1)}, \ldots, \kappa_{(\nu)}$ is closed under addition and under binomial thinning iff it has a $\mathrm{CP}_\nu$ distribution. These invariance properties lead to the definition of the compound Poisson INAR(1) or CP-INAR(1) model.

Example 2.1.3.3 (CP-INAR(1) model)

An INAR(1) process $(X_t)_{\mathbb{Z}}$ according to Definition 2.1.1.1 is referred to as a CP-INAR(1) process if the innovations $\epsilon_t$ are i.i.d. according to the $c02-math-247$ distribution from Example A.1.2 (possibly $c02-math-248$ ). Since $c02-math-249$ for $c02-math-250$ , it also follows from (2.8) that $X_t$ is overdispersed.

In addition, the innovations $\epsilon_t$ have a finite mean provided that $c02-math-253$ . Since a CP-INAR(1) process is also irreducible and aperiodic (Schweer & Weiß, 2014), we conclude that a CP-INAR(1) process with $c02-math-254$ possesses a unique stationary marginal distribution. According to Lemma 2.1.3.2, this unique stationary marginal distribution is a compound Poisson one, having the same compounding order $\nu$ as the innovations (Schweer & Weiß, 2014, Theorem 3.2.1). Hence the observations' distribution is not only overdispersed but also zero-inflated; see Equation (A.1). Formulae for the stationary marginal distribution and the $c02-math-256$ -step-ahead conditional distributions are provided by Schweer & Weiß (2014).

The relation to BPIs according to Remark 2.1.1.2 and, hence, the result by Pakes (1971) can be utilized to prove that a CP-INAR(1) process with $c02-math-257$ is $c02-math-258$ -mixing (see Definition B.1.5) with geometrically decreasing weights (Schweer & Weiß, 2014, Theorem 3.4.1), a property that is useful for central limit theorems applied to CP-INAR(1) processes.

A widely used special instance of the CP-INAR(1) model is the NB-INAR(1) model, in which the innovations follow a negative binomial distribution (Example A.1.4). Note that the marginal distribution of $X_t$ is not an NB distribution, but just another type of $c02-math-260$ distribution.

As mentioned above, a (non-Poisson) CP-INAR(1) model always has an overdispersed and zero-inflated marginal distribution. If, however, underdispersion or zero deflation is required, then we have to choose the innovations from outside the CP family. Models with underdispersed innovations $\epsilon_t$ (following the Good distribution in Example A.1.7 or the PL distribution in Example A.1.8), and therefore with underdispersed observations $X_t$ according to (2.8), are discussed by Weiß (2013a). Jazi et al. (2012) consider zero-modified innovations (Example A.1.9).

Remark 2.1.3.4 (MC approximation)

If the INAR(1) model is defined by specifying the innovations' distribution (as is commonly done in practice), one often does not obtain a closed-form expression for the observations' marginal distribution; a few exceptions are discussed in Schweer & Weiß (2014). If one is only interested in the zero probability, and if the innovations' pgf is available (as for the distributions discussed in Appendix A.1), one can approximate this probability via (2.9), that is, by computing $\prod_{j=0}^{J} pgf_\epsilon(1-\alpha^j)$ with $J$ sufficiently large.

If the complete marginal distribution is required (for example, to compute the full likelihood function), then one may utilize the Markov chain (MC) property. For $M$ sufficiently large, define the truncated transition matrix $\mathbf{P} := (p_{k|l})_{k,l=0,\ldots,M}$ with the transition probabilities (2.5). Then the marginal probabilities $P(X_t = k)$, $k = 0, \ldots, M$, are approximated by the entries of the solution $\mathbf{p}$ of the eigenvalue problem $\mathbf{P}\,\mathbf{p} = \mathbf{p}$, normalized such that its entries sum to 1; see the invariance equation (B.4). An alternative approach for approximation is described in Remark 2.6.3.
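A minimal sketch of this MC approximation in Python, assuming NumPy: the truncation level `M`, the convention `P[k, l]` $= P(X_t = k \mid X_{t-1} = l)$ (columns indexed by the previous state) and the use of `numpy.linalg.eig` are illustrative choices, and the invariance equation (B.4) in the appendix may be written in transposed form. As a check, it uses Poisson innovations, for which the exact stationary law $\mathrm{Poi}(\lambda/(1-\alpha))$ is known.

```python
import math
import numpy as np

def inar1_transition_matrix(alpha, innov_pmf, M):
    """Truncated matrix P[k, l] = P(X_t = k | X_{t-1} = l) for k, l = 0..M."""
    P = np.zeros((M + 1, M + 1))
    for l in range(M + 1):
        surv = [math.comb(l, j) * alpha**j * (1 - alpha)**(l - j) for j in range(l + 1)]
        for k in range(M + 1):
            P[k, l] = sum(surv[j] * innov_pmf[k - j] for j in range(min(k, l) + 1))
    return P

def stationary_pmf(P):
    """Approximate solution of P p = p: eigenvector for the eigenvalue closest to 1."""
    eigvals, eigvecs = np.linalg.eig(P)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()                       # normalize so the probabilities sum to 1

M, alpha, lam = 40, 0.5, 1.5
innov = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(M + 1)]
p = stationary_pmf(inar1_transition_matrix(alpha, innov, M))
mu = lam / (1 - alpha)
poi = [math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(M + 1)]
print(np.max(np.abs(p - np.array(poi))))   # tiny: the stationary law is Poi(lam / (1 - alpha))
```

Because of the truncation, the leading eigenvalue of `P` is slightly below 1; selecting the eigenvalue closest to 1 picks the right eigenvector as long as `M` is large enough for the stationary tail beyond `M` to be negligible.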

Example 2.1.3.5 (NB-INAR(1) model)

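Let us consider the NB-INAR(1) model to illustrate the approximations discussed in Remark 2.1.3.4; that is, $\epsilon_t \sim \mathrm{NB}(n, \pi)$ according to Example A.1.4. The innovations have dispersion index $I_\epsilon = 1/\pi$, so (2.8) implies for the observations: $I = (1/\pi + \alpha)/(1+\alpha)$. Since $pgf_\epsilon(z) = \big(\pi/(1-(1-\pi)z)\big)^n$ and $1-(1-\pi)(1-\alpha^j) = \pi + (1-\pi)\,\alpha^j$, the zero probability is approximated, for $J$ sufficiently large, by

$$P(X_t = 0) \approx \prod_{j=0}^{J} \left(\frac{\pi}{\pi + (1-\pi)\,\alpha^j}\right)^{n};$$

see (2.9). The transition probabilities (2.5) are computed using

$$P(\epsilon_t = k) = \frac{\Gamma(n+k)}{\Gamma(n)\,k!}\;\pi^n (1-\pi)^k, \qquad k \in \mathbb{N}_0.$$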

They can be used to apply the MC approximation for the pmf of the observations (with $c02-math-274$ in all the examples below).
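The two approximations of this example can be cross-checked against each other. The Python sketch below assumes NumPy, the NB$(n, \pi)$ pmf $\Gamma(n+k)/(\Gamma(n)\,k!)\,\pi^n(1-\pi)^k$, truncation levels `J = 200` and `M = 60`, and hypothetical parameters $\alpha = 0.5$, $n = 1$, $\pi = 0.5$ (so that $\mu = 2$ and $I = 5/3$); these are not the settings used for Figure 2.6.

```python
import math
import numpy as np

def nb_pmf(k, n, pi):
    # NB(n, pi) with mean n(1-pi)/pi and dispersion index 1/pi (assumed parametrization)
    return math.exp(math.lgamma(k + n) - math.lgamma(n) - math.lgamma(k + 1)
                    + n * math.log(pi) + k * math.log1p(-pi))

def nb_inar1_zero_prob(alpha, n, pi, J=200):
    # truncated product form of the zero probability for NB innovations
    return math.prod((pi / (pi + (1 - pi) * alpha**j)) ** n for j in range(J + 1))

def nb_inar1_marginal_pmf(alpha, n, pi, M=60):
    # MC approximation: stationary vector of the truncated transition matrix
    innov = [nb_pmf(k, n, pi) for k in range(M + 1)]
    P = np.zeros((M + 1, M + 1))
    for l in range(M + 1):
        surv = [math.comb(l, j) * alpha**j * (1 - alpha)**(l - j) for j in range(l + 1)]
        for k in range(M + 1):
            P[k, l] = sum(surv[j] * innov[k - j] for j in range(min(k, l) + 1))
    eigvals, eigvecs = np.linalg.eig(P)
    p = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return p / p.sum()

alpha, n, pi = 0.5, 1.0, 0.5          # hypothetical: mu_eps = 1, so mu = 2 and I = 5/3
p = nb_inar1_marginal_pmf(alpha, n, pi)
k = np.arange(len(p))
mean = np.sum(k * p)
print(nb_inar1_zero_prob(alpha, n, pi), p[0])     # the two approximations of P(X_t = 0)
print(mean, np.sum((k - mean) ** 2 * p) / mean)   # approx. 2 and (1/pi + alpha)/(1 + alpha)
```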

For the NB-INAR(1) model with marginal mean $c02-math-275$ and $c02-math-276$ , one obtains $\mu_\epsilon = \mu\,(1-\alpha)$ according to (2.8), and this equals $n(1-\pi)/\pi$ because of the NB assumption. For increasing $c02-math-279$ (note that $c02-math-280$ corresponds to the Poisson case), we compute

All of these models have the same marginal mean and the same ACF, but, among other things, their dispersion index and zero probability differ. Figure 2.6 compares the pmf of the equidispersed Poisson INAR(1) model (gray) with that of the NB-INAR(1) model with $c02-math-281$ , so $c02-math-282$ . The latter pmf has a higher zero probability as well as larger probabilities $P(X_t = k)$ for $c02-math-284$ (overdispersion).

**Figure 2.6** Marginal distribution ( $c02-math-285$ ) of NB-INAR(1) model $c02-math-286$ in black, and of Poisson INAR(1) model $c02-math-287$ in gray.

2.2 Approaches for Parameter Estimation

The INAR(1) model is determined by the thinning parameter $c02-math-288$ on the one hand, and by further parameters characterizing the marginal distribution of the observations or innovations, respectively, on the other hand. Given the time series data $c02-math-289$ , the task is to estimate the values of these parameters.

2.2.1 Method of Moments

Let $c02-math-290$ be a time series stemming from a stationary INAR(1) process. A quite pragmatic approach for parameter estimation is the method of moments (MM). Here, the idea is to select appropriate moment relations such that the true model parameters can be obtained by solving the resulting system of equations. For parameter estimation, the true moments are replaced by the corresponding sample moments (see Definition B.1.4), thus leading to the MM estimates.

For an INAR(1) model, one usually selects at least the marginal mean $c02-math-291$ (to be estimated by the sample mean $c02-math-292$ ) as well as the first-order autocorrelation $c02-math-293$ ; the latter immediately leads to an MM estimator of $c02-math-294$ , defined as $c02-math-295$ with $c02-math-296$ for $c02-math-297$ (Definition B.1.4).

If we have to fit a Poisson INAR(1) model according to Section 2.1.2, then we only have one additional parameter besides $c02-math-298$ , which is either the observations' mean $c02-math-299$ or the innovations' mean $c02-math-300$ (depending on the chosen parametrization). So, applying (2.8), we define the required MM estimator by either $c02-math-301$ or $c02-math-302$ . If we have to fit an INAR(1) model with more general innovations, as in Section 2.1.3, then further moment relations are required. For instance, for the NB-INAR(1) model, we could additionally use the sample variance $c02-math-303$ (Definition B.1.4). Since the innovations' index of dispersion just equals $c02-math-305$ , relation (2.8) offers a simple way to estimate the NB parameter $c02-math-304$ .
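The moment relations just described can be sketched as follows. The sketch assumes that (2.8) provides $\mu = \mu_\epsilon/(1-\alpha)$ and $I = (I_\epsilon+\alpha)/(1+\alpha)$. It also assumes NB$(n,\pi)$ innovations with mean $n(1-\pi)/\pi$ and dispersion index $1/\pi$. The divisors ($T$ for the autocovariances, $T-1$ for the variance) and all names are our own choices. Definition B.1.4 may use different conventions.

```python
from typing import List, Tuple

def sample_moments(x: List[int]) -> Tuple[float, float, float]:
    """Sample mean, sample variance (divisor T - 1) and lag-1 sample ACF."""
    T = len(x)
    mean = sum(x) / T
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / (T - 1)
    acf1 = sum(dev[t] * dev[t - 1] for t in range(1, T)) / sum(d * d for d in dev)
    return mean, var, acf1

def mm_poisson_inar1(x: List[int]) -> Tuple[float, float]:
    """MM estimates (alpha, lambda) of the Poisson INAR(1) model."""
    mean, _, alpha = sample_moments(x)
    return alpha, mean * (1.0 - alpha)

def mm_nb_inar1(x: List[int]) -> Tuple[float, float, float]:
    """MM estimates (alpha, n, pi) of the NB-INAR(1) model with NB(n, pi) innovations."""
    mean, var, alpha = sample_moments(x)
    disp_eps = (1.0 + alpha) * var / mean - alpha   # solve I = (I_eps + alpha)/(1 + alpha)
    if not (0.0 < alpha < 1.0) or disp_eps <= 1.0:
        raise ValueError("moment estimates fall outside the NB-INAR(1) parameter space")
    pi = 1.0 / disp_eps                             # NB innovations: I_eps = 1 / pi
    mu_eps = mean * (1.0 - alpha)                   # innovations' mean
    n = mu_eps * pi / (1.0 - pi)                    # from mu_eps = n (1 - pi) / pi
    return alpha, n, pi

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
print(mm_poisson_inar1(x), mm_nb_inar1(x))
```

The explicit `ValueError` reflects a practical point. Moment estimates need not lie in the parameter space, for example when the sample is underdispersed.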

The estimators $c02-math-307$ and $c02-math-308$ are appropriate not only for the Poisson INAR(1) model, but more generally for any CLAR(1) model (Grunwald et al., 2000) that is parametrized with $c02-math-309$ defined by $c02-math-310$ . So the MM estimators do not rely on the particular distribution of a Poisson INAR(1) process, as the ML estimators from Section 2.2.2 do, but only on this particular moment relation. Hence one may classify such an MM estimator as semi-parametric, and one may expect it to be robust to mild violations of the model assumptions; see also Jung et al. (2005). Of course, the (asymptotic) distribution of the MM estimators depends on the specific underlying model.

Remark 2.2.1.2 (Conditional least squares estimation)

For the case of the Poisson INAR(1) model with parameters $c02-math-311$ and $c02-math-312$ (innovations' mean), the conditional least squares (CLS) approach can also be used for parameter estimation (Al-Osh & Alzaid, 1987; Freeland & McCabe, 2005). Here, the idea is to accumulate the squared deviations between $c02-math-313$ and $c02-math-314$ (with the latter being understood as the conditional mean forecast of $c02-math-315$ ), and to choose $c02-math-316$ and $c02-math-317$ such that this conditional sum of squares ( $c02-math-318$ ) is minimized:

As shown by Klimko & Nelson (1978) and Al-Osh & Alzaid (1987), explicit expressions for the CLS estimators are given by

Their asymptotic distribution (assuming a Poisson INAR(1) process) was shown to be the same as that of the MM estimators $c02-math-319$ given in Example 2.2.1.1 (Klimko & Nelson, 1978; Al-Osh & Alzaid, 1987; Freeland & McCabe, 2005). In contrast to the MM approach, however, it is more difficult to find CLS estimators for other types of INAR(1) processes, as some parameters might not be identifiable from the conditional mean.
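For the Poisson INAR(1) model, $E[X_t \mid X_{t-1}] = \alpha X_{t-1} + \lambda$ is linear in $X_{t-1}$. Minimizing the conditional sum of squares therefore amounts to an ordinary least-squares regression of $x_t$ on $x_{t-1}$, and the sketch below uses that regression form of the explicit CLS solution. The function names and the toy series are hypothetical.

```python
from typing import List, Tuple

def cls_poisson_inar1(x: List[int]) -> Tuple[float, float]:
    """CLS estimates (alpha, lambda): least-squares regression of x_t on x_{t-1}."""
    prev, curr = x[:-1], x[1:]
    n = len(curr)
    s_prev, s_curr = sum(prev), sum(curr)
    s_cross = sum(a * b for a, b in zip(prev, curr))
    s_prev2 = sum(a * a for a in prev)
    alpha = (s_cross - s_prev * s_curr / n) / (s_prev2 - s_prev ** 2 / n)
    lam = (s_curr - alpha * s_prev) / n
    return alpha, lam

def css(x: List[int], alpha: float, lam: float) -> float:
    """Conditional sum of squares of the one-step conditional mean forecasts."""
    return sum((x[t] - alpha * x[t - 1] - lam) ** 2 for t in range(1, len(x)))

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
a_hat, l_hat = cls_poisson_inar1(x)
print(a_hat, l_hat, css(x, a_hat, l_hat))
```

As with the MM estimator, nothing forces $\hat\alpha$ into $(0,1)$. A negative slope would signal that the INAR(1) model is inappropriate for the data.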

The main advantages of the MM estimators are their simplicity (closed-form formulae) and robustness. But for the Poisson INAR(1) model, Al-Osh & Alzaid (1987), Jung et al. (2005) and Weiß & Schweer (2016) recommend using ML estimators instead, because they are less biased for small sample sizes.

2.2.2 Maximum Likelihood Estimation

Like the method of moments, the maximum likelihood (ML) approach also relies on a universal principle: one chooses the parameter values such that the observed sample becomes most “plausible”. As shown in Remark B.2.1.2, the required (log-)likelihood function is easily computed for Markov processes. In the INAR(1) case with parameter vector $c02-math-320$ (for example $c02-math-321$ in the Poisson case or $c02-math-322$ in the NB case), the (full) log-likelihood function becomes

$$\ell(\boldsymbol{\theta}) = \ln P(X_1 = x_1) + \sum_{t=2}^{T} \ln p_{x_t|x_{t-1}}, \qquad (2.11)$$

where the transition probabilities are computed according to (2.5). Sometimes, it is difficult to compute $c02-math-324$ . While this is just a simple Poisson probability in the case of a Poisson INAR(1) model, no closed-form formula is available for, for example, an NB-INAR(1) model. Then, one may use the MC approximation from Remark 2.1.3.4 to obtain $c02-math-325$ , or one may simply use the conditional log-likelihood function, which ignores the initial observation:

$$\ell(\boldsymbol{\theta} \mid x_1) = \sum_{t=2}^{T} \ln p_{x_t|x_{t-1}}. \qquad (2.12)$$

The (conditional) ML estimates are now computed as

respectively. In contrast to the CLS approach from Remark 2.2.1.2, it is difficult to find a closed-form solution to this optimization problem. Instead, a numerical optimization is typically applied, where, for example, the MM estimates described in Section 2.2.1 can be used as initial values for the optimization routine. If the optimization routine is able to compute the Hessian of $c02-math-327$ at the maximum, standard errors can also be approximated; see Remark B.2.1.2 for details.
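To make the numerical step concrete, the sketch below maximizes the full log-likelihood (or, with `full=False`, the conditional one) of a Poisson INAR(1) model. Here $P(X_1=x_1)$ is the Poisson$(\lambda/(1-\alpha))$ probability. A simple compass (pattern) search stands in for a proper optimizer such as a quasi-Newton routine. It does not return a Hessian, so standard errors would still require a numerical Hessian. All names and tuning constants are illustrative assumptions.

```python
import math
from typing import List, Tuple

def log_trans_prob(k: int, l: int, alpha: float, lam: float) -> float:
    """log p_{k|l} of a Poisson INAR(1) model (binomial thinning plus Poisson innovations)."""
    s = sum(math.comb(l, j) * alpha ** j * (1.0 - alpha) ** (l - j)
            * math.exp((k - j) * math.log(lam) - lam - math.lgamma(k - j + 1))
            for j in range(min(k, l) + 1))
    return math.log(s)

def loglik(x: List[int], alpha: float, lam: float, full: bool = True) -> float:
    """Full log-likelihood (full=True) or conditional log-likelihood given x_1."""
    ll = sum(log_trans_prob(x[t], x[t - 1], alpha, lam) for t in range(1, len(x)))
    if full:
        mu = lam / (1.0 - alpha)                      # stationary Poisson(mu) for X_1
        ll += x[0] * math.log(mu) - mu - math.lgamma(x[0] + 1)
    return ll

def fit_ml(x: List[int], start: Tuple[float, float], full: bool = True,
           step: float = 0.1, tol: float = 1e-8) -> Tuple[Tuple[float, float], float]:
    """Maximize loglik over (alpha, lam) by a simple compass search from `start`."""
    best = list(start)
    best_ll = loglik(x, *best, full=full)
    while step > tol:
        improved = False
        for i in (0, 1):
            for sign in (1.0, -1.0):
                cand = list(best)
                cand[i] += sign * step
                if not (0.0 < cand[0] < 1.0 and cand[1] > 0.0):
                    continue                          # stay inside the parameter space
                ll = loglik(x, *cand, full=full)
                if ll > best_ll:
                    best, best_ll, improved = cand, ll, True
        if not improved:
            step /= 2.0                               # refine the search mesh
    return (best[0], best[1]), best_ll

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
(alpha_hat, lam_hat), ll_max = fit_ml(x, start=(0.6, 0.7))   # e.g. MM values as start
print(alpha_hat, lam_hat, ll_max)
```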

A semi-parametric ML approach for INAR(1) processes, where the innovations' distribution is not further specified, was investigated by Drost et al. (2009).

2.3 Model Identification

Section 2.2 presented some standard approaches for fitting an INAR(1) model to a given count time series $c02-math-329$ . The obtained estimates are meaningful only if the data indeed stem from an INAR(1) model. So an obvious question is how to identify an appropriate model class for the given data.

First, we look at the serial dependence structure. As for any CLAR(1) model, the ACF of the INAR(1) model is of AR(1) type, given by $c02-math-330$ (Section 2.1.1). This, in turn, implies that the partial ACF (PACF) satisfies $c02-math-331$ and $c02-math-332$ for $c02-math-333$ ; see Theorem B.3.4. Hence, to check whether an INAR(1) model might be appropriate at all for the given time series data, we should compute the sample PACF (SPACF) and analyze whether $c02-math-334$ deviates significantly from 0 while $c02-math-335$ does not for any $c02-math-336$ .

Remark 2.3.1 (Sample PACF)

At this point, the (asymptotic) distribution of $c02-math-337$ becomes important. For stationary linear processes (Background B.3.1) with existing fourth-order moments, the asymptotic behavior of the sample ACF (SACF) is described by the well-known Bartlett's formula, and for an AR(1) process, it is known that the $c02-math-338$ for $c02-math-339$ are asymptotically independent and normally distributed with mean 0 and variance $c02-math-340$ (Brockwell & Davis, 1991). However, for non-linear processes, Bartlett's formula may be misleading, so one should use the more general result by Romano & Thombs (1996). For the case of a Poisson INAR(1) model, the asymptotic distribution of the $c02-math-341$ was derived by Mills & Seneta (1991) (see Remark 2.1.1.2). Although the asymptotic variances are slightly larger than $c02-math-342$ , asymptotic independence still holds between the $c02-math-343$ with $c02-math-344$ . So the autocorrelation structure can be identified in a completely analogous way to the AR(1) case.
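The SACF and SPACF can be computed as sketched below. The sketch applies the Durbin–Levinson recursion to the sample autocorrelations (autocovariances with divisor $T$, our choice). As a rough guide, it flags lags outside the band $\pm 1.96/\sqrt{T}$, which corresponds to the AR(1) variance $1/T$ mentioned above. Since the asymptotic variances are slightly larger in the Poisson INAR(1) case, this band is only approximate. The function names are hypothetical.

```python
import math
from typing import List

def sample_acf(x: List[int], max_lag: int) -> List[float]:
    """Sample ACF rho_hat(1), ..., rho_hat(max_lag), autocovariances with divisor T."""
    T = len(x)
    m = sum(x) / T
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - h] for t in range(h, T)) / c0 for h in range(1, max_lag + 1)]

def sample_pacf(x: List[int], max_lag: int) -> List[float]:
    """Sample PACF via the Durbin-Levinson recursion applied to the sample ACF."""
    rho = sample_acf(x, max_lag)          # rho[i] = rho_hat(i + 1)
    pacf: List[float] = []
    phi: List[float] = []                 # phi[j] = phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def flag_lags(values: List[float], T: int) -> List[int]:
    """Lags whose estimate lies outside the rough +-1.96/sqrt(T) band."""
    bound = 1.96 / math.sqrt(T)
    return [h for h, v in enumerate(values, start=1) if abs(v) > bound]

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
print(sample_acf(x, 3), sample_pacf(x, 3), flag_lags(sample_pacf(x, 3), len(x)))
```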

Further tests for serial dependence in count time series are discussed by Jung & Tremayne (2003).

Once we have identified the AR(1)-like autocorrelation structure, we should next analyze the marginal distribution. Here, an important question is whether the simple Poisson model is adequate, or whether the observed marginal distribution deviates significantly from a Poisson distribution. In the latter case, the type of deviation (overdispersion, zero inflation, and so on) may help us to identify an appropriate model.

A rather general approach for detecting diverse violations of the Poisson INAR(1) model is given by the pgf-based tests proposed by Meintanis & Karlis (2014). These tests compare the conjectured bivariate pgf – that is, $c02-math-345$ – with its sample counterpart. For the null of a Poisson INAR(1) model, $c02-math-346$ are bivariately Poisson distributed (Example A.3.1), with the bivariate pgf being given by (Alzaid & Al-Osh, 1988):

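$$\mathrm{pgf}_{X_t,X_{t-1}}(u_1, u_2) = \exp\big(\mu(1-\alpha)(u_1 + u_2 - 2) + \mu\alpha\,(u_1 u_2 - 1)\big), \qquad (2.13)$$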

which is symmetric in $c02-math-348$ in accordance with the time reversibility; note that (2.13) holds for any time lag $c02-math-349$ if replacing $c02-math-350$ by $c02-math-351$ . Since the (asymptotic) distributions of the proposed test statistics are intractable, Meintanis & Karlis (2014) recommend a bootstrap implementation of the tests.

Simpler diagnostic tests can be obtained by focusing on a particular type of violation of the Poisson model. Often, such violations are accompanied by a violation of the equidispersion property,¹ and overdispersion in particular is commonly observed in practice (Weiß, 2009c). An obvious test statistic for uncovering over- or underdispersion is the sample counterpart to the dispersion index (2.1); that is, $c02-math-352$ (see Definition B.1.4). Under the null of a Poisson INAR(1) model, this test statistic is asymptotically normally distributed with

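$$E[\hat{I}] \approx 1 - \frac{1}{T}\,\frac{1+\alpha}{1-\alpha}, \qquad V[\hat{I}] \approx \frac{2}{T}\,\frac{1+\alpha^2}{1-\alpha^2}; \qquad (2.14)$$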

see Schweer & Weiß (2014) and Weiß & Schweer (2015). Plugging in $c02-math-354$ instead of $c02-math-355$ , the resulting normal approximation can be used for determining critical values or for computing P values.
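A sketch of this dispersion test follows. It assumes that (2.14) gives the approximate null mean $1 - \frac{1}{T}\frac{1+\alpha}{1-\alpha}$ and variance $\frac{2}{T}\frac{1+\alpha^2}{1-\alpha^2}$. It computes an upper-tailed P value, which is appropriate when testing against overdispersion. The sample variance divisor $T-1$, the function names and the illustrative inputs are our own.

```python
import math
from typing import List, Tuple

def dispersion_index(x: List[int]) -> float:
    """Sample dispersion index s^2 / mean (sample variance with divisor T - 1)."""
    T = len(x)
    m = sum(x) / T
    return sum((v - m) ** 2 for v in x) / (T - 1) / m

def dispersion_test(i_hat: float, alpha: float, T: int) -> Tuple[float, float, float]:
    """Approximate null mean, standard deviation and upper-tailed P value of the
    dispersion index under a Poisson INAR(1) null with thinning parameter alpha."""
    mean = 1.0 - (1.0 + alpha) / (T * (1.0 - alpha))
    sd = math.sqrt(2.0 * (1.0 + alpha ** 2) / (T * (1.0 - alpha ** 2)))
    z = (i_hat - mean) / sd
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z > z) for standard normal Z
    return mean, sd, p_upper

mean, sd, p = dispersion_test(1.2, alpha=0.24, T=267)  # illustrative inputs
print(round(mean, 3), round(sd, 3), p)
```

In practice, `alpha` would be replaced by its estimate, exactly as described in the text. With the illustrative inputs $\alpha=0.24$ and $T=267$, the null mean and standard deviation round to 0.994 and 0.092.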

If several candidate models have been identified as being relevant for the given data, a popular way to select a final model is to consider information criteria such as the AIC and BIC (see Remark B.2.1.1, Equation (B.7), for the definitions), which are computed along with the ML estimates (Section 2.2.2). While the idea behind such information criteria is plausible, namely balancing goodness-of-fit against model size, they should be used with some caution in practice; see Emiliano et al. (2014). They may serve as guides for identifying a relevant model, but a decision to adopt a specific model should take into account further aspects; see Section 2.4. Other selection criteria include the conditional sum of squares, $c02-math-361$ , as computed during CLS estimation (Remark 2.2.1.2), or criteria related to forecasting (for example, realized coverage rates of prediction intervals); the topic of forecasting is discussed in Section 2.6. More generally, scoring rules, such as the ones discussed by Czado et al. (2009) and Jung & Tremayne (2011b), can be used for this purpose. Since some of these are closely related to tools for checking model adequacy, we discuss them further in Section 2.4; see Remark 2.4.1.

2.4 Checking for Model Adequacy

After having identified the best of the candidate models, it remains to check whether it is really adequate for the analyzed data; that is, whether the given time series constitutes a typical realization of the considered model. An obvious approach for checking model adequacy is to compare some features of the fitted model with their sample counterparts, as computed from the available time series. Such a comparison should include the autocorrelation structure as well as marginal characteristics such as the mean, the dispersion index or the zero probability (see the corresponding formulae in Section 2.1). Besides merely comparing the respective numerical values, one may follow the idea of Tsay (1992) (see also Jung & Tremayne (2011a)) and compute acceptance envelopes for, for example, the ACF or the pmf, where the envelope is based on quantiles obtained from a parametric bootstrap for the fitted model.

More sophisticated tools relying on conditional distributions, which hence check the predictive performance, are presented in Jung & Tremayne (2011b) and Christou & Fokianos (2015). As a first approach, the standardized Pearson residuals (Harvey & Fernandes, 1989) should be analyzed; that is, the series

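$$R_t = \frac{x_t - E[X_t \mid X_{t-1} = x_{t-1}]}{\sqrt{V[X_t \mid X_{t-1} = x_{t-1}]}}, \qquad t = 2, \ldots, T, \qquad (2.16)$$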

where the conditional moments are given by (2.6). For models that are not Markov chains, the definition of $c02-math-363$ has to be adapted accordingly. For an adequate model, we expect these residuals to be uncorrelated, with a mean of about 0 and a variance of about 1. A variance larger/smaller than 1 indicates that the data show more/less dispersion than is accounted for by the model (Harvey & Fernandes, 1989). The variance of the Pearson residuals or their mean sum of squares ( $c02-math-364$ “normalized squared error score”) are also sometimes used as a scoring rule for predictive model assessment (Czado et al., 2009). Instead of Pearson residuals, forecast (mid-)pseudo-residuals might also be used for checking model adequacy (Zucchini & MacDonald, 2009, Section 6.2.3). But since these forecast pseudo-residuals are closely related to the PIT (described below), we do not discuss this type of residual further here.
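A minimal sketch of the Pearson residuals for a fitted INAR(1) model follows. It assumes the standard INAR(1) conditional moments $E[X_t \mid x_{t-1}] = \alpha x_{t-1} + \mu_\epsilon$ and $V[X_t \mid x_{t-1}] = \alpha(1-\alpha)x_{t-1} + \sigma_\epsilon^2$. The parameter values in the usage line are arbitrary illustrations, not fitted estimates.

```python
import math
from typing import List, Tuple

def pearson_residuals(x: List[int], alpha: float, mu_eps: float, var_eps: float) -> List[float]:
    """Pearson residuals R_2, ..., R_T of an INAR(1) model with innovation mean mu_eps
    and innovation variance var_eps."""
    res = []
    for t in range(1, len(x)):
        cond_mean = alpha * x[t - 1] + mu_eps
        cond_var = alpha * (1.0 - alpha) * x[t - 1] + var_eps
        res.append((x[t] - cond_mean) / math.sqrt(cond_var))
    return res

def mean_and_variance(values: List[float]) -> Tuple[float, float]:
    """Sample mean and variance (divisor n - 1) of the residuals."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

# Poisson innovations: mu_eps = var_eps = lambda.
# NB(n, pi) innovations: mu_eps = n(1 - pi)/pi and var_eps = n(1 - pi)/pi**2.
x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
r = pearson_residuals(x, alpha=0.6, mu_eps=0.7, var_eps=0.7)
print(mean_and_variance(r))
```

A residual variance clearly above 1 would point to underestimated dispersion, as discussed above.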

An approach that considers not only conditional moments, but the complete conditional distribution, is the (non-randomized) probability integral transform (PIT) (Czado et al., 2009; Jung & Tremayne, 2011b). Let $c02-math-365$ with $c02-math-366$ denote the conditional cdf, conditioned on the last observation being $c02-math-367$ , where the $c02-math-368$ are computed from (2.5). Then the mean PIT is defined as (Czado et al., 2009; Jung & Tremayne, 2011b):

2.17

Here, we define $c02-math-370$ for any $c02-math-371$ ; note that the $c02-math-372$ only needs to be computed for $c02-math-373$ . The mean PIT now allows us to construct a histogram in the following way: dividing $c02-math-374$ into the $c02-math-375$ subintervals $c02-math-376$ for $c02-math-377$ (say, $c02-math-378$ ), the $c02-math-379$ th rectangle is drawn with height $c02-math-380$ . If the fitted model is adequate, we expect the PIT histogram to look like that of a uniform distribution. Common deviations from uniformity are U-shaped histograms indicating that the fitted conditional distribution is underdispersed with respect to the data, while inverse-U shaped histograms indicate overdispersion (Czado et al., 2009), analogous to the variance of the Pearson residuals, as discussed above.
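The non-randomized PIT histogram can be sketched as follows for a Poisson INAR(1) model. Following Czado et al. (2009), each observation contributes the piecewise linear function $F(u \mid x_t)$, which is 0 below $P(X_t \le x_t-1 \mid x_{t-1})$ and 1 above $P(X_t \le x_t \mid x_{t-1})$, and these contributions are averaged. The Poisson INAR(1) choice, the parameter values and the function names are illustrative assumptions.

```python
import math
from typing import List

def poisson_inar1_cdf(k: int, l: int, alpha: float, lam: float) -> float:
    """P(X_t <= k | X_{t-1} = l) for a Poisson INAR(1) model; equals 0 for k < 0."""
    total = 0.0
    for y in range(k + 1):
        for j in range(min(y, l) + 1):
            binom = math.comb(l, j) * alpha ** j * (1.0 - alpha) ** (l - j)
            total += binom * math.exp(-lam) * lam ** (y - j) / math.factorial(y - j)
    return total

def pit_histogram(x: List[int], alpha: float, lam: float, J: int = 10) -> List[float]:
    """Bar heights of the non-randomized PIT histogram with J equal bins."""
    bounds = [(poisson_inar1_cdf(x[t] - 1, x[t - 1], alpha, lam),
               poisson_inar1_cdf(x[t], x[t - 1], alpha, lam)) for t in range(1, len(x))]

    def mean_pit(u: float) -> float:
        acc = 0.0
        for lo, hi in bounds:
            if u >= hi:
                acc += 1.0
            elif u > lo:
                acc += (u - lo) / (hi - lo)   # linear interpolation between the cdf jumps
        return acc / len(bounds)

    return [mean_pit(j / J) - mean_pit((j - 1) / J) for j in range(1, J + 1)]

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
h = pit_histogram(x, alpha=0.6, lam=0.7)
print([round(v, 3) for v in h])
```

For an adequate model, the $J$ heights should all be close to $1/J$.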

A related visual tool is the marginal calibration diagram (Czado et al., 2009), which compares the marginal frequencies of the time series, $c02-math-381$ where $c02-math-382$ , with the aggregated conditional distributions, $c02-math-383$ , for example by plotting the differences $c02-math-384$ against $c02-math-385$ . Here, $c02-math-386$ denotes the indicator function. Analogously, one can compare the respective cumulative distributions with each other; that is, $c02-math-387$ and $c02-math-388$ .
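The numbers behind a marginal calibration diagram can be sketched as follows for a Poisson INAR(1) model. The sketch averages the conditional pmfs over $t=2,\ldots,T$ and compares them with the observed frequencies over the same range. The restriction to $t \ge 2$, the sign convention (observed minus fitted) and the names are our own choices.

```python
import math
from typing import List

def poisson_inar1_pmf(k: int, l: int, alpha: float, lam: float) -> float:
    """Transition probability p_{k|l} of a Poisson INAR(1) model."""
    return sum(math.comb(l, j) * alpha ** j * (1.0 - alpha) ** (l - j)
               * math.exp(-lam) * lam ** (k - j) / math.factorial(k - j)
               for j in range(min(k, l) + 1))

def marginal_calibration(x: List[int], alpha: float, lam: float, k_max: int) -> List[float]:
    """Observed frequency minus average conditional probability, for k = 0..k_max."""
    n = len(x) - 1
    diffs = []
    for k in range(k_max + 1):
        observed = sum(1 for v in x[1:] if v == k) / n
        fitted = sum(poisson_inar1_pmf(k, x[t - 1], alpha, lam) for t in range(1, len(x))) / n
        diffs.append(observed - fitted)
    return diffs

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
d = marginal_calibration(x, alpha=0.6, lam=0.7, k_max=40)
print([round(v, 3) for v in d[:8]])
```

Plotting `d` against $k$ gives the diagram; systematic positive values at $k=0$, for instance, would indicate unmodeled zero inflation.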

Remark 2.4.1 (Scoring rules)

As already mentioned, a number of scoring rules to assess the quality of predictive distributions have been proposed in the literature; see Czado et al. (2009) and Jung & Tremayne (2011b) for a detailed discussion. Typical scoring rules are of the form $c02-math-389$ , to compare the observation $c02-math-390$ realized at time $c02-math-391$ with the conditional distribution $c02-math-392$ based on the previous observation, where smaller score values express better agreement. The overall predictive performance of the model with respect to the time series $c02-math-393$ is evaluated by the mean score $c02-math-394$ .

A scoring rule that is closely related to the marginal calibration diagram is the ranked probability score (Czado et al., 2009; Jung & Tremayne, 2011b), which is defined as (the mean of) the squared deviations

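$$\mathrm{rps}(P_{\cdot|x_{t-1}}, x_t) = \sum_{k=0}^{\infty} \Big(P(X_t \le k \mid X_{t-1} = x_{t-1}) - \mathbb{1}(x_t \le k)\Big)^2 \qquad (2.18)$$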

between the conditional distribution and the actual observation. Other commonly used scoring rules are the logarithmic score

$$\mathrm{logs}(P_{\cdot|x_{t-1}}, x_t) = -\ln p_{x_t|x_{t-1}}, \qquad (2.19)$$

which goes along with the conditional log-likelihood computation (2.12), and the quadratic score

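$$\mathrm{qs}(P_{\cdot|x_{t-1}}, x_t) = -2\,p_{x_t|x_{t-1}} + \sum_{k=0}^{\infty} p_{k|x_{t-1}}^2. \qquad (2.20)$$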

If the mean score is computed for each of the fitted candidate models, a scoring rule can also be used for model selection; see the discussion at the end of Section 2.3.
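The three mean scores can be sketched as follows for a Poisson INAR(1) model. The sketch assumes the usual definitions of Czado et al. (2009): rps as the sum over $k$ of squared cdf deviations, logs as $-\ln p_{x_t|x_{t-1}}$ and qs as $-2p_{x_t|x_{t-1}} + \sum_k p_{k|x_{t-1}}^2$. The infinite sums are truncated at `k_max`, which is our assumption and must exceed the largest observation. Smaller mean scores indicate better predictive performance.

```python
import math
from typing import List, Tuple

def poisson_inar1_pmf_vector(l: int, alpha: float, lam: float, k_max: int) -> List[float]:
    """[p_{0|l}, ..., p_{k_max|l}] for a Poisson INAR(1) model (k_max <= 170)."""
    return [sum(math.comb(l, j) * alpha ** j * (1.0 - alpha) ** (l - j)
                * math.exp(-lam) * lam ** (k - j) / math.factorial(k - j)
                for j in range(min(k, l) + 1))
            for k in range(k_max + 1)]

def mean_scores(x: List[int], alpha: float, lam: float, k_max: int = 60) -> Tuple[float, float, float]:
    """Mean ranked probability, logarithmic and quadratic scores over t = 2..T."""
    rps = logs = qs = 0.0
    for t in range(1, len(x)):
        p = poisson_inar1_pmf_vector(x[t - 1], alpha, lam, k_max)
        cdf = 0.0
        for k, pk in enumerate(p):
            cdf += pk
            rps += (cdf - (1.0 if x[t] <= k else 0.0)) ** 2
        logs -= math.log(p[x[t]])
        qs += -2.0 * p[x[t]] + sum(pk * pk for pk in p)
    n = len(x) - 1
    return rps / n, logs / n, qs / n

x = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 6, 3, 1, 0]   # toy series, not from the text
print(mean_scores(x, alpha=0.6, lam=0.7))
```

Note that the mean logarithmic score is just the negative conditional log-likelihood (2.12) divided by $T-1$.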

2.5 A Real-data Example

To illustrate the models and methods discussed up until now, let us consider the dataset presented by Weiß (2008a). This is a time series of the daily number of downloads of an editor for the period from June 2006 to February 2007 ( $c02-math-398$ ). The plot in Figure 2.7 shows that these daily counts vary between 0 and 14, without any visible trend or seasonality. The up and down movements indicate a moderate autocorrelation level, which is confirmed by the SACF plot in Figure 2.8a. After further inspecting the SPACF, where only $c02-math-399$ deviates significantly from 0, we conclude that an AR(1)-like model might be appropriate for describing the time series.

**Figure 2.7** Plot of the download counts; see Section 2.5.

**Figure 2.8** Sample autocorrelation (a) and marginal frequencies (b) of the download counts; see Section 2.5.

The observed marginal distribution is plotted in Figure 2.8b. The mean $c02-math-400$ is clearly smaller than the variance $c02-math-401$ , so, at least empirically, we are dealing with a strong degree of overdispersion. This goes along with a high zero probability, $c02-math-402$ , which is much larger than the corresponding Poisson value $c02-math-403$ (zero inflation). In summary, an INAR(1) model appears to be plausible for the data, possibly with an overdispersed (and zero-inflated) marginal distribution. As pointed out by Weiß (2008a), an INAR(1) model also seems plausible in view of interpretation (2.4): some downloads at day $c02-math-404$ might be initiated on the recommendation of users from the previous day $c02-math-405$ (“survivors”), the remaining downloads being due to users who became interested in the program on their own initiative (“immigrants”).

To test for overdispersion within the INAR(1) model, we apply the dispersion test described in Section 2.3, plugging $c02-math-406$ instead of $c02-math-407$ into (2.14). Comparing the observed value $c02-math-408$ with the approximate mean and standard deviation under the null of a Poisson INAR(1) model, given by about 0.994 and 0.092, respectively, it becomes clear that the overdispersion is indeed significant (P value $c02-math-409$ ). Therefore, we fit the NB-INAR(1) model to the data and, for illustration, also the Poisson INAR(1) model and the corresponding i.i.d. models. The estimated mean and dispersion index of the innovations are given by $c02-math-410$ and $c02-math-411$ , respectively.

Parameter estimation is done by a full likelihood approach, using the MC approximation for the initial probability in the case of the NB-INAR(1) model (see Example 2.1.3.5). As initial values for the numerical optimization routine, simple moment estimates are used (Section 2.2.1):

$c02-math-412$ and $c02-math-413$ for the Poisson INAR(1)
$c02-math-414$ , $c02-math-415$ for the NB-INAR(1).

The ML estimates $c02-math-416$ are now obtained by maximizing the respective full log-likelihood function, and the corresponding standard errors are approximated from the computed Hessian $c02-math-417$ as the square roots of the diagonal elements from the inverse $c02-math-418$ (Remark B.2.1.2). The obtained results are summarized in Table 2.3 together with the (rounded) values of the AIC and BIC from (B.7).

Table 2.3 Download counts: ML estimates and AIC and BIC values for different models

| Model | Parameters | Parameter 1 | Parameter 2 | Parameter 3 | AIC | BIC |
|---|---|---|---|---|---|---|
| i.i.d. Poisson | $c02-math-419$ | 2.401 (0.095) | | | 1323 | 1327 |
| Poisson INAR(1) | $c02-math-420$ | 1.991 (0.110) | 0.174 (0.033) | | 1293 | 1300 |
| i.i.d. NB | $c02-math-421$ | 1.108 (0.158) | 0.316 (0.034) | | 1103 | 1111 |
| NB-INAR(1) | $c02-math-422$ | 0.835 (0.145) | 0.291 (0.036) | 0.154 (0.042) | 1092 | 1103 |

Figures in parentheses are standard errors. AIC and BIC values rounded.

From the AIC and BIC values shown in Table 2.3, it becomes clear that the INAR(1) structure is always preferable to the respective i.i.d. model. In particular, the estimates for $c02-math-423$ are always significantly different from 0. Comparing the two INAR(1) models, the NB-INAR(1) model is clearly superior, as should be expected in view of the strong degree of overdispersion (and zero inflation). This decision is also supported by all of the scoring rules from Remark 2.4.1 ( $c02-math-424$ : 1.399 vs. 1.309; $c02-math-425$ : 2.384 vs. 2.022; $c02-math-426$ : $c02-math-427$ vs. $c02-math-428$ ). Note that the parameter $c02-math-429$ of the fitted NB models is always close to 1; that is, these NB distributions are close to a geometric distribution (see Example A.1.5). While the NB-INAR(1) model is the best of the considered candidate models, it remains to check whether it is also adequate for the data (Section 2.4).

For illustration, we also include the Poisson INAR(1) model in the remaining analyses. We start by computing the marginal properties of the fitted INAR(1) models. The means of both INAR(1) models – 2.411 (Poisson) and 2.407 (NB), according to (2.8) – are close to $c02-math-430$ . The observed index of dispersion $c02-math-431$ , however, is much better reproduced by the NB-INAR(1) model (3.111) than by the equidispersed Poisson INAR(1) model. The same applies to the zero probability, where $c02-math-432$ compared to 0.258 (NB model; see (2.9) and Example 2.1.3.5) and 0.090 (Poisson model). An analysis of the respective Pearson residuals (neither series shows significant autocorrelation) also supports the use of the NB-INAR(1) model. The variance of the NB model's residuals, at 0.931, is close to 1. For the Poisson model, the residual variance of 2.871 is much too large, which indicates that the data show more dispersion than described by the Poisson model.
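The marginal properties quoted above can be recomputed from the rounded estimates in Table 2.3, as the sketch below shows. It assumes the column order $(\lambda, \alpha)$ for the Poisson INAR(1) model and $(n, \pi, \alpha)$ for the NB-INAR(1) model with NB$(n,\pi)$ innovations. This reading of the table is ours, chosen because it reproduces the reported values. The sketch also assumes that (2.8) gives $\mu = \mu_\epsilon/(1-\alpha)$ and $I = (I_\epsilon+\alpha)/(1+\alpha)$, and that (2.9) is the product formula for $P(X=0)$.

```python
import math
from typing import Callable

def inar1_mean(mu_eps: float, alpha: float) -> float:
    """Marginal mean mu_eps / (1 - alpha)."""
    return mu_eps / (1.0 - alpha)

def inar1_dispersion(disp_eps: float, alpha: float) -> float:
    """Marginal dispersion index (I_eps + alpha) / (1 + alpha)."""
    return (disp_eps + alpha) / (1.0 + alpha)

def inar1_zero_prob(innov_pgf: Callable[[float], float], alpha: float, M: int = 200) -> float:
    """Truncated product of innov_pgf(1 - alpha**j), j = 0..M."""
    p0 = 1.0
    for j in range(M + 1):
        p0 *= innov_pgf(1.0 - alpha ** j)
    return p0

# Rounded ML estimates from Table 2.3.
lam_p, alpha_p = 1.991, 0.174                 # Poisson INAR(1)
n_nb, pi_nb, alpha_nb = 0.835, 0.291, 0.154   # NB-INAR(1)

mean_p = inar1_mean(lam_p, alpha_p)
mean_nb = inar1_mean(n_nb * (1 - pi_nb) / pi_nb, alpha_nb)
disp_nb = inar1_dispersion(1 / pi_nb, alpha_nb)   # NB innovations: I_eps = 1 / pi
zero_p = math.exp(-mean_p)                        # Poisson marginal
zero_nb = inar1_zero_prob(lambda z: (pi_nb / (1 - (1 - pi_nb) * z)) ** n_nb, alpha_nb)
print(f"means {mean_p:.3f} / {mean_nb:.3f}, NB dispersion {disp_nb:.3f}, "
      f"zero probabilities {zero_p:.3f} / {zero_nb:.3f}")
```

Small discrepancies in the third decimal are to be expected, because the table entries themselves are rounded.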

Finally, let us have a look at the PIT histogram in Figure 2.9. The PIT histogram of the NB-INAR(1) model in (b) is close to uniformity, while the one of the Poisson INAR(1) model in (a) is strongly U-shaped (and also asymmetric). This U-shape indicates that the Poisson model does not show sufficient dispersion, confirming our previous analyses.

**Figure 2.9** PIT histograms based on fitted Poisson and NB-INAR(1) model; see Section 2.5.

2.6 Forecasting of INAR(1) Processes

Given the model for the observed INAR(1) process, one of the main applications² of this model is to forecast future outcomes of the process. In other words, having observed $c02-math-433$ , we want to predict $c02-math-434$ for some $c02-math-435$ . For real-valued processes, the most common type of point forecast is the conditional mean, as this is known to be optimal in the sense of the mean squared error. Applying the law of total expectation iteratively together with (2.6), it follows that the $c02-math-436$ -step-ahead conditional mean is given by

Note that, due to the Markov property (Appendix B.2.1), this conditional mean depends only on $c02-math-438$ and not on earlier observations. Conditional mean forecasting for INAR(1) processes was further investigated by Sutradhar (2008), and mean-based forecast horizon aggregation – that is, the forecasting of the sum $c02-math-439$ given $c02-math-440$ – was discussed by Mohammadipour & Boylan (2012), who also covered other members of the INARMA family (these are discussed in Section 3.1).
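The $h$-step-ahead conditional mean can be illustrated as below. The sketch assumes that (2.6) gives $E[X_{t+1} \mid X_t] = \alpha X_t + \mu_\epsilon$ with marginal mean $\mu = \mu_\epsilon/(1-\alpha)$. Iterating this one-step relation $h$ times yields the closed form $\alpha^h x_T + \mu(1-\alpha^h)$, and the code computes both versions to make that equivalence visible. The names are hypothetical.

```python
def cond_mean_forecast(x_T: int, alpha: float, mu: float, h: int) -> float:
    """Closed-form h-step-ahead conditional mean alpha**h * x_T + mu * (1 - alpha**h)."""
    return alpha ** h * x_T + mu * (1.0 - alpha ** h)

def cond_mean_forecast_iterated(x_T: int, alpha: float, mu_eps: float, h: int) -> float:
    """Same quantity, obtained by applying E[X_{t+1} | X_t] = alpha * X_t + mu_eps h times."""
    m = float(x_T)
    for _ in range(h):
        m = alpha * m + mu_eps
    return m

print(cond_mean_forecast(5, 0.4, 2.0, 3), cond_mean_forecast_iterated(5, 0.4, 2.0 * 0.6, 3))
```

As $h$ grows, the forecast approaches the marginal mean $\mu$. It is generally not an integer, which motivates the coherent forecasts discussed next.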

The main disadvantage of the mean forecast is that it will usually lead to a non-integer value, while $c02-math-441$ will certainly take an integer value from $c02-math-442$ . Therefore, coherent forecasting techniques (which only produce forecasts in $c02-math-443$ ) are required for count processes (Freeland & McCabe, 2004b). For this purpose, the $c02-math-444$ -step-ahead conditional distribution of $c02-math-445$ given the past $c02-math-446$ needs to be computed for the INAR(1) model; that is, the $c02-math-447$ -step-ahead transition probabilities $c02-math-448$ (again only depending on $c02-math-449$ thanks to the Markov property). Once this distribution is available, the corresponding conditional median or mode can be used as a coherent point forecast. In fact, the conditional median also satisfies an optimality property, as it minimizes the mean absolute error.

So the essential question is how to compute the $c02-math-450$ . First note that Al-Osh & Alzaid (1987) have shown the following equality in distribution:

So once the distribution of $c02-math-451$ is available, the $c02-math-452$ can be computed by adapting (2.5). Unfortunately, this distribution is generally not easily obtained. For the case of a CP-INAR(1) model, as introduced in Example 2.1.3.3 (the CP distribution is invariant with respect to binomial thinning according to Lemma 2.1.3.2), Schweer & Weiß (2014) showed that $c02-math-453$ is CP-distributed, and they provided a closed-form expression for the pgf of $c02-math-454$ . After having done a numerical series expansion for $c02-math-455$ , the $c02-math-456$ -step-ahead transition probabilities $c02-math-457$ are computed via (2.5) (replacing $c02-math-458$ by $c02-math-459$ ).

The $c02-math-463$ -step-ahead conditional distribution can also be used to construct a prediction interval on level $c02-math-464$ , based on the $c02-math-465$ - and $c02-math-466$ -quantiles of this distribution in the case of a two-sided interval, or based on the $c02-math-467$ -quantile for an upper-sided interval (“worst-case prediction”).
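For the Poisson INAR(1) model, these coherent forecasts can be sketched directly. Given $X_T = x$, the term $\alpha^h \circ x$ is Bin$(x, \alpha^h)$. The thinned Poisson innovations add up to a Poisson$(\mu(1-\alpha^h))$ variable, so the $h$-step-ahead distribution is the convolution of these two. The code derives median, mode and prediction limits from this pmf. The truncation point `k_max`, the quantile convention (smallest $k$ with cdf at least $p$) and all names are our own assumptions.

```python
import math
from typing import Dict, List

def poisson_inar1_forecast_pmf(x_T: int, alpha: float, mu: float, h: int,
                               k_max: int = 100) -> List[float]:
    """h-step-ahead pmf given X_T = x_T: Bin(x_T, alpha**h) convolved with
    Poisson(mu * (1 - alpha**h)); k_max must not exceed 170."""
    beta = alpha ** h
    m = mu * (1.0 - beta)
    return [sum(math.comb(x_T, j) * beta ** j * (1.0 - beta) ** (x_T - j)
                * math.exp(-m) * m ** (k - j) / math.factorial(k - j)
                for j in range(min(k, x_T) + 1))
            for k in range(k_max + 1)]

def quantile(pmf: List[float], p: float) -> int:
    """Smallest k with P(X <= k) >= p."""
    cdf = 0.0
    for k, pk in enumerate(pmf):
        cdf += pk
        if cdf >= p:
            return k
    return len(pmf) - 1

def coherent_forecasts(pmf: List[float], level: float = 0.95) -> Dict[str, object]:
    """Median, mode, two-sided and upper-sided prediction limits on the given level."""
    g = 1.0 - level
    return {
        "median": quantile(pmf, 0.5),
        "mode": max(range(len(pmf)), key=pmf.__getitem__),
        "two_sided": (quantile(pmf, g / 2), quantile(pmf, 1 - g / 2)),
        "upper": quantile(pmf, level),
    }

print(coherent_forecasts(poisson_inar1_forecast_pmf(x_T=0, alpha=0.9, mu=1.0, h=1)))
```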

Example 2.6.2 (Rig counts)

We analyze a time series of weekly counts of active rotary drilling rigs, where each count expresses the number of active offshore drilling rigs in Alaska for the period 1990–1997 (length $c02-math-468$ ). The data are available from Baker Hughes.³ These rig counts have been published for the USA and Canada since 1944, and international rig counts since 1975. They serve as an indicator of demand for products from the drilling industry.

A plot of the time series $c02-math-469$ is shown in Figure 2.10a. Clearly, we are dealing with low counts ( $c02-math-470$ ), and the long runs of values indicate strong serial dependence. Indeed, the SACF shown in Figure 2.10b reveals a high and slowly decreasing autocorrelation level ( $c02-math-471$ ). An inspection of the SPACF reveals an approximately AR(1)-like structure, so an INAR(1) model appears to be reasonable for the data. Applying the dispersion test (2.14), it turns out that the observed (slight) degree of overdispersion ( $c02-math-472$ ) is not significant (P value $c02-math-473$ ). The histogram in Figure 2.10c, where the pmf of the $c02-math-474$ distribution is shown in gray, confirms that a Poisson model might serve well for the data.

**Figure 2.10** Plot of the rig counts in (a), their sample autocorrelation in (b), and marginal frequencies (black) together with a Poisson fit (gray) in (c); see Example 2.6.2.

So we fit a Poisson INAR(1) model to the data, leading to the ML estimates $c02-math-475$ (std. err. 0.018) and $c02-math-476$ (std. err. 0.011). An analysis of the Pearson residuals and the PIT histogram confirms that the fitted Poisson INAR(1) model works reasonably well for the data. Using this fitted model, the $c02-math-477$ -step-ahead forecasting distributions, conditioned on the last observation $c02-math-478$ , are easily computed using the results from Example 2.6.1. Due to the strong dependence structure, these distributions show little dispersion (see the term $c02-math-479$ in the formula for $c02-math-480$ ), which is certainly attractive for forecasting, and they converge only slowly to the marginal Poisson distribution for increasing $c02-math-481$ . This is illustrated by Figure 2.11, where in (b), the distributions are represented by gray colors, with increasing darkness for increasing probability value, and with the gray colors in the last column referring to the marginal distribution ( $c02-math-482$ ). The median forecast for increasing $c02-math-483$ is equal to 0 for $c02-math-484$ , and equal to 1 for $c02-math-485$ . The 95%-quantile (as some kind of worst/best-case scenario) varies from 1 (lags 1–3) to 2 (lags 4–8) to 3 (lags 9–25) to 4 (lags $c02-math-486$ ).

**Figure 2.11** $c02-math-487$ -step-ahead forecasting distributions for horizons $c02-math-488$ (dark to light) in (a), and $c02-math-489$ in (b); see Example 2.6.2.

Remark 2.6.3 (Approximate forecasting distribution)

If closed-form expressions for $c02-math-490$ are not available, one can make use of the Markov property. The MC approximation described in Remark 2.1.3.4 is easily modified for forecasting. If again $c02-math-491$ is sufficiently large, and if $c02-math-492$ with the transition probabilities (2.5), then the $c02-math-493$ -step-ahead transition probabilities $c02-math-494$ are approximated by the matrix $c02-math-495$ ; see formula (B.3). Due to the ergodicity of the INAR(1) process (Remark 2.1.1.2), the columns of $c02-math-496$ converge to the approximate stationary marginal distribution $c02-math-497$ from Remark 2.1.3.4, thus offering an alternative way of numerically computing $c02-math-498$ . Concerning the speed of convergence, see the Perron–Frobenius theorem, as described in Remark B.2.2.1.
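The matrix-power approximation can be sketched as follows. The code builds the truncated transition matrix with entries $P[k][l] = p_{k|l}$ (so column $l$ is the conditional pmf given $l$) and raises it to the power $h$ by repeated multiplication. It assumes the binomial-thinning form of (2.5). Poisson innovations are used only so that the result can be checked against the closed form. The names and the truncation $M=40$ are illustrative choices.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson(lam) probability of k (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def transition_matrix(alpha: float, innov_pmf: Callable[[int], float], M: int) -> Matrix:
    """Truncated matrix with P[k][l] = p_{k|l}, k, l = 0..M."""
    return [[sum(math.comb(l, j) * alpha ** j * (1 - alpha) ** (l - j) * innov_pmf(k - j)
                 for j in range(min(k, l) + 1))
             for l in range(M + 1)] for k in range(M + 1)]

def mat_mult(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[i][r] * B[r][c] for r in range(n)) for c in range(n)] for i in range(n)]

def h_step_matrix(P: Matrix, h: int) -> Matrix:
    """P**h for h >= 1; entry [k][l] approximates the h-step probability p^(h)_{k|l}."""
    result = P
    for _ in range(h - 1):
        result = mat_mult(result, P)
    return result

P = transition_matrix(0.5, lambda k: poisson_pmf(k, 1.0), M=40)
P3 = h_step_matrix(P, 3)
print([round(P3[k][4], 4) for k in range(6)])   # 3-step pmf given last count 4
```

For large $h$, every column of the matrix power approaches the same vector, the approximate stationary marginal distribution.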

For the fitted NB-INAR(1) model from Section 2.5, where the last download count equals $c02-math-499$ , the $c02-math-500$ -step-ahead forecasting distributions $c02-math-501$ converge very quickly to the stationary marginal distribution as $c02-math-502$ (the quick convergence is not surprising in view of the weak autocorrelation level). This is illustrated by Figure 2.12, where the distributions for $c02-math-503$ , $c02-math-504$ and $c02-math-505$ (marginal distribution) are shown. The median forecast equals 2 for all forecasting horizons $c02-math-506$ , while other quantiles change slightly: the lower quartile from 1 ( $c02-math-507$ ) to 0 ( $c02-math-508$ ), the upper quartile from 4 ( $c02-math-509$ ) to 3 ( $c02-math-510$ ), and the 95% quantile from 9 ( $c02-math-511$ ) to 8 ( $c02-math-512$ ). The latter could be used as the limit of an upper-sided 95% prediction interval. A two-sided interval is not possible since the zero probability is much larger than 2.5% for all $c02-math-513$ . The mode, another option for coherent point forecasting, equals 1 for $c02-math-514$ , and 0 otherwise.

**Figure 2.12** $c02-math-515$ -step-ahead forecasting distribution of fitted NB-INAR(1) model from Section 2.5, for forecasting horizon $c02-math-516$ in black, $c02-math-517$ in dark gray, and $c02-math-518$ (marginal distribution) in light gray.

Up to now, we have assumed the INAR(1) model and its parameters, say $c02-math-519$ , to be known. In practice, however, one has to estimate the parameters; that is, the forecasting distribution depends on the estimate $c02-math-520$ . This causes uncertainty in the computed forecasting distribution. The case of a Poisson INAR(1) model, as in Example 2.6.2, is discussed by Freeland & McCabe (2004b). Here, the asymptotic distribution of, say, the ML estimator is known (see Section 2.2); since it is a normal distribution, the asymptotic distribution of $c02-math-521$ can be determined by applying the Delta method. A closed-form expression for the asymptotic variance of $c02-math-522$ was derived by Freeland & McCabe (2004b), and this can be used for computing a confidence interval for $c02-math-523$ . Jung & Tremayne (2006) extend this work to more general INAR models and investigate bootstrap-based methods for coherent forecasting under estimation uncertainty.

We conclude this chapter with a brief remark about how to simulate a stationary INAR(1) process.

Remark 2.6.4 (Simulation of INAR(1) process)

Since an INAR(1) process constitutes a Markov chain, the essential point in simulating a stationary INAR(1) process is its correct initialization. Because of the Markov property discussed in Appendix B.2.1, we have to ensure that the initial count stems from the stationary marginal distribution. If the remaining counts are then generated by using the one-step-ahead conditional distributions (2.5), that is, by implementing the model recursion from Definition 2.1.1.1, the whole process is stationary.

So how can we simulate the initial count according to the stationary marginal distribution? For the Poisson INAR(1) model, the stationary marginal distribution is explicitly known: it is a Poisson distribution (Section 2.1.2). So we simply draw the initial count from this Poisson distribution and generate the remaining counts from the conditional distributions.
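A minimal sketch of this recipe, assuming the parametrization with thinning probability `alpha` and Poisson innovations with mean `lam`, so that the stationary marginal is Poi(`lam`/(1 − `alpha`)). The helper functions are illustrative; a small inversion sampler for the Poisson distribution is written out so that only the standard library is needed.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson(lam) variate by inversion of the cdf (adequate for moderate lam)."""
    u = rng.random()
    k, p = 0, math.exp(-lam)
    cum = p
    while u >= cum and p > 0.0:  # the guard stops the loop if p underflows to 0
        k += 1
        p *= lam / k
        cum += p
    return k

def binomial_thinning(x, alpha, rng):
    """alpha o x: each of the x units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_poisson_inar1(alpha, lam, length, seed=None):
    """Stationary Poisson INAR(1) path of the given length with Poi(lam) innovations."""
    rng = random.Random(seed)
    # initial count from the stationary marginal Poi(lam / (1 - alpha))
    x = poisson_draw(lam / (1 - alpha), rng)
    path = [x]
    for _ in range(length - 1):
        # model recursion X_t = alpha o X_{t-1} + eps_t
        x = binomial_thinning(x, alpha, rng) + poisson_draw(lam, rng)
        path.append(x)
    return path

if __name__ == "__main__":
    print(simulate_poisson_inar1(alpha=0.5, lam=2.0, length=20, seed=42))
```

Because the first count is already drawn from the stationary marginal, no observations need to be discarded.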

If the stationary marginal distribution is not explicitly available, two approximate solutions are possible. First, one can compute the MC approximation $c02-math-524$ with sufficiently large $c02-math-525$, as described in Remark 2.1.3.4 (see also Remark 2.6.3 for an alternative approach). The initial count is then generated according to the distribution $c02-math-526$ with finite support $c02-math-527$. Second, one may utilize the ergodicity of the INAR(1) process (Remark 2.1.1.2). The idea is to generate a prerun $c02-math-528$, say by initializing $c02-math-529$ and then generating $c02-math-530$ from the one-step-ahead conditional distributions. The corresponding marginal distributions then converge towards the stationary marginal distribution; see the discussion in Appendix B.2.2. If the length $c02-math-531$ of the prerun is sufficiently large (the values for $c02-math-532$ reported in the literature typically vary between 200 and 500), then the distribution of $c02-math-533$ is close to the required stationary marginal distribution.
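Both approximate initializations can be sketched for an INAR(1) model with negative binomial innovations, whose stationary marginal is not available in closed form. The sketch assumes that the MC approximation of Remark 2.1.3.4 amounts to the invariant distribution of the transition matrix truncated at an upper bound (here `upper`); it is computed by power iteration, which is one possible solver and not necessarily the one used in the book. The prerun option starts at 0 and keeps only the last value of a burn-in of length `burn_in`. All function names, the NB parametrization and the parameter values are hypothetical illustrations.

```python
import math
import random

def nb_pmf(k, n, pi):
    """NB(n, pi) probability of k (assumed parametrization, mean n * (1 - pi) / pi)."""
    return math.exp(math.lgamma(n + k) - math.lgamma(n) - math.lgamma(k + 1)
                    + n * math.log(pi) + k * math.log(1 - pi))

def transition_matrix(alpha, eps, upper):
    """Truncated INAR(1) transition matrix P[i][k] = P(X_t = k | X_{t-1} = i)."""
    return [[sum(math.comb(i, j) * alpha**j * (1 - alpha)**(i - j) * eps[k - j]
                 for j in range(min(k, i) + 1))
             for k in range(upper + 1)]
            for i in range(upper + 1)]

def stationary_approx(P, tol=1e-12, max_iter=10_000):
    """Invariant distribution of the truncated chain, computed by power iteration."""
    size = len(P)
    dist = [1.0 / size] * size
    for _ in range(max_iter):
        new = [sum(dist[i] * P[i][k] for i in range(size)) for k in range(size)]
        total = sum(new)
        new = [v / total for v in new]  # renormalize: truncation loses a little mass
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist

def draw_from_finite(probs, rng):
    """Inversion sampling from a distribution on {0, ..., len(probs) - 1}."""
    u, cum = rng.random(), 0.0
    for x, p in enumerate(probs):
        cum += p
        if u < cum:
            return x
    return len(probs) - 1

def step(x, alpha, eps, rng):
    """One INAR(1) transition: binomial thinning alpha o x plus an innovation drawn from eps."""
    survivors = sum(1 for _ in range(x) if rng.random() < alpha)
    return survivors + draw_from_finite(eps, rng)

def init_from_mc(alpha, eps, upper, rng):
    """Option 1: draw the initial count from the truncated Markov-chain approximation."""
    return draw_from_finite(stationary_approx(transition_matrix(alpha, eps, upper)), rng)

def init_from_prerun(alpha, eps, burn_in, rng):
    """Option 2: start at 0, run the recursion burn_in times, keep only the last value."""
    x = 0
    for _ in range(burn_in):
        x = step(x, alpha, eps, rng)
    return x

def simulate(alpha, eps, length, x_init, rng):
    """Generate `length` counts starting from x_init via the model recursion."""
    path = [x_init]
    while len(path) < length:
        path.append(step(path[-1], alpha, eps, rng))
    return path

if __name__ == "__main__":
    rng = random.Random(1)
    alpha, n, pi, upper = 0.25, 1.2, 0.4, 60  # hypothetical parameter values
    eps = [nb_pmf(k, n, pi) for k in range(upper + 1)]
    total = sum(eps)
    eps = [p / total for p in eps]  # truncated, renormalized NB innovation distribution
    series_a = simulate(alpha, eps, 100, init_from_mc(alpha, eps, upper, rng), rng)
    series_b = simulate(alpha, eps, 100, init_from_prerun(alpha, eps, 300, rng), rng)
    print(series_a[:10], series_b[:10])
```

Option 1 costs one solve of the truncated chain but no discarded observations; option 2 needs no transition matrix at all, which makes it easier to carry over to other count processes.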

The approach described in Remark 2.6.4 is easily adapted to other types of count processes, for example higher-order Markov processes, by considering the multivariate representation described after Definition B.1.7.
