This appendix provides a brief survey of common models for count data and their stochastic properties. In particular, we discuss dispersion properties and types of generating functions, the latter being useful tools when analyzing count models. The presented models are divided into univariate and multivariate models, and into models for an infinite range like $\mathbb{N}_0$ or a finite range like $\{0,\ldots,n\}$.
The most famous model for a count random variable having the range $\mathbb{N}_0$ (full set of non-negative integers) is the Poisson distribution.
The equidispersion property of the Poisson distribution can be stated equivalently in terms of the index of dispersion according to (2.1), by saying that the Poisson distribution always has an index of dispersion equal to 1 (a related approach focussing on the probability of observing a zero, based on the zero index (2.2), is considered below). Values of the index of dispersion deviating from 1, in turn, express a violation of the Poisson model.
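The equidispersion statement can be checked numerically. The following is a minimal sketch, assuming that the index of dispersion in (2.1) is the usual variance-to-mean ratio; the helper names `poisson_pmf` and `dispersion_index` are hypothetical, and the Poisson pmf is truncated at a large upper limit so that the neglected tail mass is negligible.

```python
import math


def poisson_pmf(lam, upper):
    """Poisson probabilities P(X = 0), ..., P(X = upper), computed recursively."""
    probs = [math.exp(-lam)]
    for x in range(1, upper + 1):
        probs.append(probs[-1] * lam / x)
    return probs


def dispersion_index(probs):
    """Variance-to-mean ratio of a pmf given as a list over 0, 1, 2, ...

    Assumption: this ratio is the index of dispersion referred to as (2.1).
    """
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return var / mean


# Poisson case: the ratio is 1 up to truncation and rounding error.
poi = poisson_pmf(2.5, 100)
index_poisson = dispersion_index(poi)

# Equal mixture of two Poisson distributions: same mean 2.5, but overdispersed.
mix = [0.5 * a + 0.5 * b for a, b in zip(poisson_pmf(1.0, 100), poisson_pmf(4.0, 100))]
index_mixture = dispersion_index(mix)

print(index_poisson, index_mixture)
```

The mixture is included only to show a value deviating from 1; it is not one of the models discussed in this appendix.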
Note that in the literature, there are several other names for the compound Poisson distribution: for example, it is sometimes referred to as the Poisson-stopped sum distribution because the summation is stopped by a Poisson random variable.
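The "Poisson-stopped sum" reading suggests a direct way to simulate such counts: draw a Poisson number of summands, then add up that many independent draws from the compounding distribution. The sketch below assumes a compounding distribution on the positive integers; the geometric summand is chosen purely for illustration, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_shifted_geometric(p, rng):
    """Draw from the geometric distribution on {1, 2, ...} with success probability p."""
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials


def sample_compound_poisson(lam, draw_summand, rng):
    """Sum a Poisson(lam) number of independent summands (empty sum = 0)."""
    n_terms = sample_poisson(lam, rng)
    return sum(draw_summand(rng) for _ in range(n_terms))


rng = random.Random(1)
draws = [
    sample_compound_poisson(2.0, lambda r: sample_shifted_geometric(0.5, r), rng)
    for _ in range(20000)
]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var / mean)  # the variance-to-mean ratio should clearly exceed 1
```

With these illustrative parameters, the theoretical mean is 4 and the theoretical variance-to-mean ratio is 3, so the simulated counts are overdispersed.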
The family of CP-distributions includes a number of important special cases. Besides the Poisson distribution itself, the distributions of Examples A.1.3–A.1.6 also belong to the CP-family.
The negative binomial distribution belongs to the CP-family.
All distributions in Examples A.1.2–A.1.6 are overdispersed. The opposite phenomenon, that is, a variance smaller than the mean, is referred to as underdispersion. The GP-distribution of Example A.1.6 might be extended to also cover underdispersion, by allowing one of its parameters to be negative. But then it is necessary to truncate the range of the GP-distribution, where the degree of truncation depends on the actual values of the model parameters. The truncation of the range also causes the problem that the probabilities according to the above pmf no longer sum to 1, and otherwise exact properties become only approximate (Johnson et al., 2005, p. 336). An analogous criticism, namely that essential properties of the distribution are known only approximately, also applies to the families of Efron's double Poisson (DP) distributions (Johnson et al., 2005, Section 11.1.8) and of Conway–Maxwell (COM) Poisson distributions (Shmueli et al., 2005).
Therefore, in Weiß (2013a), two other distribution families are recommended for underdispersed counts: the Good distribution and the power-law weighted Poisson (PL) distribution.
Another characteristic property of the Poisson distribution (Example A.1.1) is the probability of observing a zero: the zero index according to (2.2) equals 0 for the Poisson distribution, but may differ otherwise. For a compound Poisson distribution, for instance, we obtain (Example A.1.2)
$$1 + \frac{\ln p_0}{\mu} = 1 - \frac{1}{H'(1)},$$
where $p_0$ is the zero probability, $\mu$ is the mean, and $H$ denotes the probability generating function of the compounding distribution. This expression equals 0 iff $H'(1) = 1$, and it is positive otherwise (zero inflation; note that $H'(1)$ is the mean of the compounding distribution). A more flexible approach towards zero modification is summarized in the following example.
If the parent distribution in Example A.1.9 is the Poisson distribution, then the distribution obtained for a positive inflation parameter is said to be the zero-inflated Poisson (ZIP) distribution.
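To make the effect of zero inflation concrete, the following sketch evaluates the zero index for a Poisson and for a zero-inflated Poisson distribution. Two assumptions are made here that are not spelled out in the text above: the zero index (2.2) is taken to be $1 + \ln(p_0)/\mu$, and zero inflation is taken to be a mixture of a point mass at 0 (weight `omega`) with the parent distribution (weight `1 - omega`). The names `zip_pmf`, `zero_index`, `lam` and `omega` are hypothetical.

```python
import math


def zip_pmf(x, lam, omega):
    """pmf of the assumed mixture: point mass at 0 (weight omega) and Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** x / math.factorial(x)
    return (omega if x == 0 else 0.0) + (1.0 - omega) * poisson


def zero_index(p0, mean):
    """Assumed form of the zero index: 1 + ln(p0) / mean."""
    return 1.0 + math.log(p0) / mean


lam, omega = 2.0, 0.3

# Poisson case: p0 = exp(-lam) and mean = lam, so the index is 0.
zi_poisson = zero_index(math.exp(-lam), lam)

# Zero-inflated case: larger zero probability, smaller mean (1 - omega) * lam.
zi_zip = zero_index(zip_pmf(0, lam, omega), (1.0 - omega) * lam)

print(zi_poisson, zi_zip)  # the second value is positive, indicating zero inflation
```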
We conclude this section with a brief look at a distribution for a slightly different type of infinite and integer-valued range: the full range of integers $\mathbb{Z}$.
The distributions surveyed up until now all have unlimited ranges. In some applications, however, it is known in advance that an upper bound exists that can never be exceeded. Then the distributions discussed in the sequel become relevant. These have a finite range of the form $\{0,\ldots,n\}$.
We have seen that in a Poisson sense, the binomial distribution is underdispersed. However, since we are concerned with a different type of random phenomenon anyway, one with a finite range, it is more appropriate to evaluate the dispersion behavior in terms of the binomial index of dispersion according to (2.3), with the binomial distribution having a binomial index of dispersion equal to 1.
A distribution with a binomial index of dispersion greater than 1 shows “overdispersion with respect to a binomial model”, a phenomenon that we refer to as extra-binomial variation.
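The distinction between the two dispersion measures can be illustrated numerically. The sketch below assumes that the binomial index of dispersion in (2.3) is $\sigma^2 / \bigl(\mu\,(1-\mu/n)\bigr)$ for a distribution on $\{0,\ldots,n\}$; the helper names are hypothetical, and the two-component binomial mixture serves only as a simple example of extra-binomial variation.

```python
import math


def binomial_pmf(n, pi):
    """Binomial probabilities P(X = 0), ..., P(X = n) for success probability pi."""
    return [math.comb(n, x) * pi ** x * (1.0 - pi) ** (n - x) for x in range(n + 1)]


def binomial_dispersion_index(probs):
    """Assumed form of (2.3): variance / (mean * (1 - mean / n)), with n = len(probs) - 1."""
    n = len(probs) - 1
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return var / (mean * (1.0 - mean / n))


# A binomial distribution attains the reference value 1.
pure = binomial_pmf(10, 0.3)
index_pure = binomial_dispersion_index(pure)

# An equal mixture of two binomial distributions with different success
# probabilities has a larger variance for the same upper bound n = 10.
mixed = [0.5 * a + 0.5 * b for a, b in zip(binomial_pmf(10, 0.2), binomial_pmf(10, 0.6))]
index_mixed = binomial_dispersion_index(mixed)

print(index_pure, index_mixed)
```

For the mixture, the mean is 4 and the variance is 6, so the index evaluates to 6 / 2.4 = 2.5.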
Another distribution with extra-binomial variation is the Markov binomial distribution, which is briefly discussed in the context of Equations 9.1 and 9.2.
The probability of observing a zero equals $(1-\pi)^n$ in the binomial case, where $\pi$ denotes the success probability. Increasing this probability according to the approach in Example A.1.9 leads to the zero-inflated binomial (ZIB) distribution. However, the distribution according to Example A.2.2 is also easily seen to be zero-inflated with respect to a binomial distribution, since its zero probability exceeds the corresponding binomial value.
Although this book is primarily concerned with univariate count data, a few models for multivariate count data are sketched here. Much more information can be found in the book by Johnson et al. (1997). For the particular case of bivariate distributions, the book by Kocherlakota & Kocherlakota (1992) is also recommended.
The approach of Example A.3.1 for constructing a multivariate distribution could be applied to other univariate additive count models as well.
Extending the binomial distribution (Example A.2.1), with its finite range, is more demanding. The most famous multivariate binomial distribution is the multinomial distribution.
The use of the multinomial distribution as a multivariate extension of the binomial distribution is limited by the fact that each binomial component has the same population size $n$, and that the sum of the components has to be equal to this value $n$. The importance of the multinomial distribution lies in the fact that the underlying binary random vectors can be understood as a binarization of a categorical random variable: each vector has a one in the component belonging to the realized category and zeros elsewhere. The sum of these vectors then represents the realized absolute frequencies of $n$ independent replications of the categorical random variable, and dividing this sum by $n$ gives the corresponding relative frequencies (proportions).
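The binarization argument can be sketched in a few lines. The example below assumes that every category receives its own component in the binary vector; the category labels, the observation list and the function names are hypothetical and serve only as an illustration.

```python
def binarize(value, categories):
    """Binary vector with a one at the position of the realized category."""
    return [1 if value == c else 0 for c in categories]


def absolute_frequencies(observations, categories):
    """Component-wise sum of the binarized observations."""
    counts = [0] * len(categories)
    for obs in observations:
        for i, bit in enumerate(binarize(obs, categories)):
            counts[i] += bit
    return counts


categories = ["a", "b", "c"]
observations = ["a", "c", "c", "b", "c", "a"]  # n = 6 replications

counts = absolute_frequencies(observations, categories)
proportions = [c / len(observations) for c in counts]

print(counts)       # the components sum to n
print(proportions)  # the components sum to 1
```

If the observations are independent replications of the same categorical random variable, the vector `counts` is one realization of a multinomial random vector with population size `len(observations)`.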
Another type of generalized multinomial distribution is the Markov multinomial distribution, which is briefly discussed in Example 9.1.2.3.
More details on these and other multivariate discrete distributions are provided by Johnson et al. (1997) and Kocherlakota & Kocherlakota (1992).