Appendix A

Examples of Count Distributions

This appendix provides a brief survey of common models for count data and their stochastic properties. In particular, we discuss dispersion properties and types of generating functions, the latter being useful tools when analyzing count models. The presented models are divided into univariate and multivariate models, and into models for an infinite range like c0A-math-001 or a finite range like c0A-math-002.

A.1 Count Models for an Infinite Range

The most famous model for a count random variable having the range c0A-math-003 (full set of non-negative integers) is the Poisson distribution.

The equidispersion property of the Poisson distribution can be stated equivalently in terms of the index of dispersion c0A-math-012 according to (2.1), by saying that the Poisson distribution always satisfies c0A-math-013 (a related approach focussing on the probability for observing a zero, based on the zero index (2.2), is considered below). Values for c0A-math-014 deviating from 1, in turn, express a violation of the Poisson model.
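The equidispersion property can be checked numerically. The following is a minimal sketch, assuming that the index of dispersion in (2.1) is the variance-to-mean ratio; the function names `poisson_pmf` and `index_of_dispersion`, the parameter value `lam = 2.5` and the truncation point 60 are illustrative choices, not taken from the text.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def index_of_dispersion(probs):
    """Variance-to-mean ratio of a pmf given as a list with probs[k] = P(X = k)."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return var / mean

lam = 2.5  # illustrative parameter value
# Truncate the infinite range at 60; the neglected tail mass is negligible for lam = 2.5.
probs = [poisson_pmf(k, lam) for k in range(61)]
i_poisson = index_of_dispersion(probs)
print(i_poisson)  # should be numerically indistinguishable from 1
```

Applying the same function to the pmf of any other count model gives a quick check for over- or underdispersion relative to the Poisson case.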

Note that in the literature, there are several other names for the compound Poisson distribution: for example, it is sometimes referred to as the Poisson-stopped sum distribution because the summation c0A-math-047 is stopped by the Poisson random variable c0A-math-048.
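The name "Poisson-stopped sum" suggests a direct way of simulating such counts: draw a Poisson number of summands and add them up. The sketch below assumes independent summands that are uniform on {1, 2, 3}; this compounding distribution, the helper names and the parameter values are hypothetical and serve only as illustration. The Poisson variates are generated with Knuth's multiplication method, since the Python standard library has no Poisson sampler.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_compound_poisson(lam, summand_values, rng):
    """Sum of N i.i.d. summands, where N ~ Poisson(lam) stops the summation."""
    n = sample_poisson(lam, rng)
    return sum(rng.choice(summand_values) for _ in range(n))

rng = random.Random(1)  # fixed seed for reproducibility
draws = [sample_compound_poisson(2.0, [1, 2, 3], rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_index = statistics.pvariance(draws) / sample_mean
print(sample_mean, sample_index)
```

For this choice of summands, the theoretical mean is 2 · 2 = 4 and the variance-to-mean ratio is E[Y²]/E[Y] = 7/3, so the simulated counts are clearly overdispersed.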

The family of c0A-math-049-distributions includes a number of important special cases. Besides c0A-math-050, the distributions of Examples A.1.3–A.1.6 also belong to the CP-family.

The negative binomial distribution belongs to the c0A-math-061-family.

All distributions in Examples A.1.2–A.1.6 are overdispersed. The opposite phenomenon, c0A-math-094 – that is, a variance smaller than the mean – is referred to as underdispersion. The c0A-math-095-distribution of Example A.1.6 might be extended to also cover underdispersion, by allowing c0A-math-096 to be negative. But then it is necessary to truncate the range of the GP-distribution, where the degree of truncation depends on the actual values of the model parameters. The truncation of the range also causes the problem that the probabilities according to the above pmf no longer sum to 1, and otherwise exact properties become only approximate (Johnson et al., 2005, p. 336). An analogous criticism – that is, that essential properties of the distribution are known only approximately – also applies to the families of Efron's double Poisson (DP) distributions (Johnson et al., 2005, Section 11.1.8) and of Conway–Maxwell (COM) Poisson distributions (Shmueli et al., 2005).

Therefore, in Weiß (2013a), two other distribution families are recommended for underdispersed counts: the Good distribution and the power-law weighted Poisson (PL) distribution.

Another characteristic property of the Poisson distribution (Example A.1.1) is the probability of observing a zero: the zero index c0A-math-114 according to (2.2) equals 0 for the Poisson distribution, but may differ otherwise. For a compound Poisson distribution c0A-math-115, for instance, we obtain (Example A.1.2)

which equals 0 iff c0A-math-117, and which satisfies c0A-math-118 otherwise (zero inflation; note that c0A-math-119 is the mean of the compounding distribution). A more flexible approach towards zero modification is summarized in the following example.
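The zero index can be evaluated numerically, as in the following sketch. It rests on two assumptions that are not visible in the text above: that the zero index in (2.2) has the form 1 + ln(p0)/μ, with p0 the zero probability and μ the mean, and that the compounding distribution only takes values in {1, 2, ...}, so that the compound Poisson count is zero exactly when the Poisson number of summands is zero. The compounding mean of 2 (for example, summands uniform on {1, 2, 3}) is a hypothetical choice.

```python
import math

def zero_index(p0, mean):
    """Zero index in the assumed form 1 + ln(p0) / mean."""
    return 1.0 + math.log(p0) / mean

lam = 2.0  # illustrative Poisson parameter

# Poisson case: p0 = exp(-lam) and mean = lam, so the index vanishes.
z_poisson = zero_index(math.exp(-lam), lam)

# Compound Poisson case with summands >= 1 (assumption):
# p0 is still exp(-lam), but the mean becomes lam times the compounding mean.
summand_mean = 2.0  # hypothetical compounding mean
z_cp = zero_index(math.exp(-lam), lam * summand_mean)

print(z_poisson, z_cp)
```

Under these assumptions, the compound Poisson value is 1 - 1/2 = 0.5, that is, a positive zero index.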

If the parent distribution in Example A.1.9 is the Poisson distribution c0A-math-127, then the distribution obtained for c0A-math-128 is said to be the zero-inflated Poisson distribution, c0A-math-129.
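A zero-inflated Poisson pmf can be sketched as follows. The construction assumes the common mixture form of zero inflation, in which a point mass at zero with weight `omega` is mixed with the parent Poisson pmf; the symbols `omega` and `lam`, their values and the truncation point are assumptions made for illustration, and the zero index is again taken to be 1 + ln(p0)/μ.

```python
import math

def zip_pmf(k, lam, omega):
    """Assumed ZIP pmf: weight omega on zero, weight (1 - omega) on Poisson(lam)."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    extra_zero_mass = omega if k == 0 else 0.0
    return extra_zero_mass + (1.0 - omega) * poisson

lam, omega = 2.0, 0.3  # illustrative parameter values
# Truncate the infinite range at 60; the neglected tail mass is negligible here.
probs = [zip_pmf(k, lam, omega) for k in range(61)]

total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

dispersion_index = var / mean                # 1 + omega * lam under this construction
zero_idx = 1.0 + math.log(probs[0]) / mean   # positive, i.e. zero inflation
print(total, mean, dispersion_index, zero_idx)
```

With these values the mean is (1 - 0.3) · 2 = 1.4 and the variance-to-mean ratio is 1 + 0.3 · 2 = 1.6, so the inflated zero probability goes along with overdispersion.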

We conclude this section with a brief look at a distribution for a slightly different type of infinite and integer-valued range: the full range of integers c0A-math-130.

A.2 Count Models for a Finite Range

The distributions surveyed so far all have unbounded ranges. In some applications, however, it is known in advance that an upper bound c0A-math-145 exists that can never be exceeded. Then the distributions discussed in the sequel become relevant. These have a finite range of the form c0A-math-146.

We have seen that in a Poisson sense, the binomial distribution is underdispersed. However, since we are concerned with a different type of random phenomenon anyway – one with a finite range – it is more appropriate to evaluate the dispersion behavior in terms of the binomial index of dispersion c0A-math-168 according to (2.3), with the binomial distribution satisfying c0A-math-169.
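The two notions of dispersion can be compared numerically. The sketch below assumes that the binomial index of dispersion in (2.3) has the form n·σ²/(μ·(n − μ)), with n the upper bound of the range; the function names and the parameter values `n = 10`, `pi = 0.3` are illustrative.

```python
import math

def binomial_pmf(k, n, pi):
    """P(X = k) for a binomial distribution with n trials and success probability pi."""
    return math.comb(n, k) * pi**k * (1.0 - pi) ** (n - k)

def mean_and_variance(probs):
    """Mean and variance of a pmf given as a list with probs[k] = P(X = k)."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

n, pi = 10, 0.3  # illustrative parameter values
probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]
mean, var = mean_and_variance(probs)

poisson_index = var / mean                       # equals 1 - pi: underdispersed
binomial_index = n * var / (mean * (n - mean))   # equals 1 for the binomial
print(poisson_index, binomial_index)
```

The variance-to-mean ratio is 1 − 0.3 = 0.7 here, whereas the binomial index equals 1, which makes the latter the more natural benchmark for counts with a finite range.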

A distribution with c0A-math-170 shows “overdispersion with respect to a binomial model”, a phenomenon that we refer to as extra-binomial variation.

Another distribution with extra-binomial variation is the Markov binomial distribution, which is briefly discussed in the context of Equations 9.1 and 9.2.

The probability of observing a zero equals c0A-math-185 in the binomial case. Increasing this probability according to the approach in Example A.1.9, with c0A-math-186, leads to the zero-inflated binomial distribution, c0A-math-187. However, the c0A-math-188 distribution according to Example A.2.2 is also easily seen to be zero-inflated with respect to a binomial distribution, since the zero probability is obtained as

equation

A.3 Multivariate Count Models

Although this book is primarily concerned with univariate count data, a few models for multivariate count data are sketched here. Much more information can be found in the book by Johnson et al. (1997). For the particular case of bivariate distributions, the book by Kocherlakota & Kocherlakota (1992) is also recommended.

The approach of Example A.3.1 for constructing a c0A-math-205-variate distribution could be applied to other univariate additive count models as well.

Extending the binomial distribution (Example A.2.1), with its finite range, is more demanding. The most famous multivariate binomial distribution is the multinomial distribution.

The use of the multinomial distribution as a multivariate extension of the binomial distribution is limited by the fact that each binomial component c0A-math-240 has the same population size c0A-math-241, and that the sum of the components has to be equal to this value c0A-math-242. The importance of the multinomial distribution lies in the fact that the binary random vectors c0A-math-243 can be understood as a binarization of a categorical random variable c0A-math-244 with range c0A-math-245 by defining c0A-math-246 if c0A-math-247. Then c0A-math-248 represents the realized absolute frequencies of c0A-math-249 independent replications of c0A-math-250, and c0A-math-251 gives the corresponding relative frequencies (proportions).
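The binarization argument can be illustrated by simulation. The sketch below assumes a categorical variable with range {0, ..., d}, coded as integers, and a binary vector with a single 1 at the position of the observed category; the names `binarize`, `category_probs`, the value `d = 2` and the probabilities are hypothetical.

```python
import random

def binarize(x, d):
    """Binary vector of length d + 1 with a 1 at position x and 0 elsewhere."""
    return [1 if x == i else 0 for i in range(d + 1)]

rng = random.Random(7)            # fixed seed for reproducibility
d = 2                             # categories 0, 1, ..., d (assumed coding)
category_probs = [0.5, 0.3, 0.2]  # hypothetical category probabilities
n = 1000                          # number of independent replications

# Draw n independent replications of the categorical variable.
xs = rng.choices(range(d + 1), weights=category_probs, k=n)

# Summing the binarizations componentwise yields the absolute frequencies,
# i.e. one realization of a multinomial random vector.
counts = [sum(column) for column in zip(*(binarize(x, d) for x in xs))]
proportions = [c / n for c in counts]
print(counts, proportions)
```

By construction, the components of `counts` add up to `n`, which is exactly the restriction on the multinomial distribution noted above.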

Another type of generalized multinomial distribution is the Markov multinomial distribution, which is briefly discussed in Example 9.1.2.3.

More details on these and other multivariate discrete distributions are provided by Johnson et al. (1997) and Kocherlakota & Kocherlakota (1992).
