The zero-inflated Poisson model

When counting things, one possible outcome is zero: nothing was counted. A zero can occur for many reasons. If we are counting red cars, we get a zero either because no red car passed through the avenue or because we missed it (maybe we did not see that tiny red car behind that large truck). So if we use a Poisson distribution, we may notice, for example when performing a posterior predictive check, that the model generates fewer zeros than the data contains. How do we fix that? We could try to identify the exact cause of the model predicting fewer zeros than observed and include that factor in the model. But, as is often the case, it is enough, and easier for our purposes, to assume that we have a mixture of two processes:

One modeled by a Poisson distribution with probability ψ
One giving extra zeros with probability 1 - ψ

This is known as the zero-inflated Poisson (ZIP) model. In some texts, you will find that ψ represents the probability of the extra zeros and 1 - ψ the probability of the Poisson process. This is not a big deal; just pay attention to which is which in any concrete example.

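Basically, a ZIP distribution is:

$$p(y_j = 0) = 1 - \psi + \psi e^{-\theta} \tag{4.24}$$

$$p(y_j = k_i) = \psi \frac{\theta^{k_i} e^{-\theta}}{k_i!} \tag{4.25}$$

Where 1 - ψ is the probability of extra zeros, θ is the rate of the Poisson process, and k_i is a positive count.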
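As a quick sanity check, the two cases of the ZIP distribution can be written as a small function. This is only a minimal sketch using the standard library: the name `zip_pmf` and its arguments are illustrative, and it assumes the convention used in this section, where ψ is the probability of the Poisson process and 1 - ψ the probability of an extra zero.

```python
import math


def zip_pmf(k, psi, theta):
    """Probability of observing the count k under a ZIP distribution.

    psi   -- probability that an observation comes from the Poisson process
    theta -- rate of the Poisson process
    """
    poisson = theta ** k * math.exp(-theta) / math.factorial(k)
    if k == 0:
        # A zero is either an extra zero or a zero drawn from the Poisson
        return (1 - psi) + psi * poisson
    # A positive count can only come from the Poisson process
    return psi * poisson


psi, theta = 0.1, 2.5
p_zero = zip_pmf(0, psi, theta)
# The probabilities should sum to (almost exactly) one; the tail beyond
# k = 60 is negligible for theta = 2.5
total = sum(zip_pmf(k, psi, theta) for k in range(60))
print(p_zero, total)
```

With these values most of the probability mass sits at zero, far more than the plain Poisson value of exp(-θ), which is exactly the behavior the extra-zeros process is meant to capture.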

To exemplify the use of the ZIP distribution, let's create a few synthetic data points:

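import numpy as np

n = 100
θ_real = 2.5
ψ = 0.1  # probability that a data point comes from the Poisson process

# Simulate some data: each point is a Poisson draw with probability ψ
# and an extra zero with probability 1 - ψ
counts = np.array([(np.random.random() > (1-ψ)) *
                   np.random.poisson(θ_real) for i in range(n)])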
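To see why a plain Poisson model would struggle with data like this, we can compare the fraction of zeros in a simulated sample with the zero probability each model implies. The following is a sketch under stated assumptions, not part of the original example: it uses a vectorized variant of the simulation with NumPy's `default_rng`, an arbitrarily chosen seed, and ASCII names (`psi`, `theta_real`, `counts_sim`) so that it does not clash with the variables above.

```python
import numpy as np

rng = np.random.default_rng(123)  # arbitrary seed, only for reproducibility

n = 100
theta_real = 2.5
psi = 0.1  # probability that a data point comes from the Poisson process

# Decide, for each data point, which of the two processes generated it
from_poisson = rng.random(n) < psi
# Poisson draws where from_poisson is True, extra zeros elsewhere
counts_sim = np.where(from_poisson, rng.poisson(theta_real, size=n), 0)

zero_fraction = np.mean(counts_sim == 0)
poisson_zero_prob = np.exp(-theta_real)                # plain Poisson
zip_zero_prob = (1 - psi) + psi * np.exp(-theta_real)  # ZIP

print(f"zeros in the sample: {zero_fraction:.2f}")
print(f"P(0) under Poisson:  {poisson_zero_prob:.3f}")
print(f"P(0) under ZIP:      {zip_zero_prob:.3f}")
```

The sample contains far more zeros than a Poisson distribution with rate 2.5 would produce, while the ZIP zero probability is close to the observed fraction.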

We could easily implement equations 4.24 and 4.25 in a PyMC3 model. However, we can do something even easier: we can use the built-in ZIP distribution from PyMC3:

import pymc3 as pm

with pm.Model() as ZIP:
    ψ = pm.Beta('ψ', 1, 1)     # uniform prior on the Poisson proportion
    θ = pm.Gamma('θ', 2, 0.1)  # prior on the Poisson rate
    y = pm.ZeroInflatedPoisson('y', ψ, θ,
                               observed=counts)
    trace = pm.sample(1000)

Figure 4.11
