The zero-inflated Poisson model

When counting things, one possible outcome is to count nothing, that is, to get a zero. A zero can occur for many reasons: we get a zero because we were counting red cars and no red car passed down the avenue, or because we missed it (maybe we did not see that tiny red car behind that large truck). So if we use a Poisson distribution, we may notice, for example when performing a posterior predictive check, that the model generates fewer zeros than the data contains. How do we fix that? We could try to address the exact cause of the excess zeros and include that factor in the model. But, as is often the case, it is easier, and enough for our purposes, to assume that we have a mixture of two processes:

  • One modeled by a Poisson distribution with probability ψ
  • One giving extra zeros with probability 1 - ψ

This is known as the zero-inflated Poisson (ZIP) model. In some texts, you will find that ψ represents the extra zeros and 1 - ψ the probability of the Poisson. This is not a big deal; just pay attention to which is which in any concrete example.

Basically, a ZIP distribution is:
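$$p(y_j = 0) = 1 - \psi + \psi e^{-\theta} \tag{4.24}$$

$$p(y_j = k_i) = \psi \frac{\theta^{k_i} e^{-\theta}}{k_i!}, \quad k_i \geq 1 \tag{4.25}$$

where 1 - ψ is the probability of extra zeros.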
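The two equations translate directly into code. The following is a minimal sketch, assuming SciPy is available; `zip_pmf` is a hypothetical helper name, not part of PyMC3. It also checks the claim made at the start of the section: a plain Poisson with the same mean as the ZIP (which is ψθ) assigns less probability to zero. This holds for any 0 < ψ < 1 and θ > 0, because e^{-x} is convex.

```python
import numpy as np
from scipy import stats


def zip_pmf(k, psi, theta):
    """Probability mass of a zero-inflated Poisson (hypothetical helper).

    psi is the probability of the Poisson component; 1 - psi gives the extra zeros.
    """
    k = np.asarray(k)
    # psi * theta**k * exp(-theta) / k!  (the Poisson part, valid for every k)
    poisson_part = psi * stats.poisson.pmf(k, theta)
    # Only k = 0 receives the extra mass 1 - psi
    extra_zeros = np.where(k == 0, 1 - psi, 0.0)
    return extra_zeros + poisson_part


psi, theta = 0.5, 2.5
ks = np.arange(0, 50)
print(zip_pmf(ks, psi, theta).sum())  # ~1.0: the masses form a distribution

# Compare P(y = 0) with a plain Poisson that has the same mean, psi * theta
p0_zip = zip_pmf(0, psi, theta)
p0_poisson = stats.poisson.pmf(0, psi * theta)
print(p0_zip, p0_poisson)
```

For k = 0 the function adds 1 - ψ to ψe^{-θ}, which reproduces the first equation; for k ≥ 1 only the Poisson part survives, which reproduces the second.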

To exemplify the use of the ZIP distribution, let's create a few synthetic data points:

import numpy as np

n = 100
θ_real = 2.5
ψ = 0.1

# Simulate some data: with probability ψ keep a Poisson draw,
# otherwise record an extra zero
counts = np.array([(np.random.random() > (1-ψ)) *
                   np.random.poisson(θ_real) for i in range(n)])
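A quick sanity check on this kind of simulated data is to compare the observed fraction of zeros with the value given by the first ZIP equation, 1 - ψ + ψe^{-θ}. The sketch below is a vectorized variant of the same simulation; the seeded `default_rng` generator and the larger sample size are assumptions made so the proportions settle, not part of the original example. Drawing `random() < ψ` selects the Poisson component with the same probability as `random() > (1-ψ)`.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator (assumption, for reproducibility)
n = 10_000                       # larger than n = 100 so proportions are stable
θ_real = 2.5
ψ = 0.1

# Vectorized simulation: keep a Poisson draw with probability ψ, otherwise a zero
is_poisson = rng.random(n) < ψ
counts = is_poisson * rng.poisson(θ_real, size=n)

observed_zeros = np.mean(counts == 0)
expected_zeros = 1 - ψ + ψ * np.exp(-θ_real)  # P(y = 0) under the ZIP
print(f"observed {observed_zeros:.3f}, expected {expected_zeros:.3f}")
```

With ψ = 0.1, roughly nine out of ten observations are extra zeros, so the data look very different from a Poisson with rate θ_real.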

We could easily implement equations 4.24 and 4.25 in a PyMC3 model. However, we can do something even easier: we can use the built-in ZIP distribution from PyMC3:

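import pymc3 as pm

with pm.Model() as ZIP:
    # Probability of the Poisson component (1 - ψ gives the extra zeros)
    ψ = pm.Beta('ψ', 1, 1)
    # Rate of the Poisson component
    θ = pm.Gamma('θ', 2, 0.1)
    # Likelihood: PyMC3's ZeroInflatedPoisson(psi, theta)
    y = pm.ZeroInflatedPoisson('y', ψ, θ,
                               observed=counts)
    trace = pm.sample(1000)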
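As a cross-check outside PyMC3, equations 4.24 and 4.25 can be written as a log-likelihood and maximized with SciPy. Note that this swaps in a different technique: maximum likelihood gives a single point estimate, not the posterior that `pm.sample` returns. With a uniform Beta(1, 1) prior on ψ and enough data it should usually land close to the posterior means, but it carries no uncertainty. `zip_neg_log_lik` is a hypothetical helper, and the logit/log reparametrization is a choice made here so the optimizer can work without bounds.

```python
import numpy as np
from scipy import optimize, stats


def zip_neg_log_lik(params, y):
    """Negative ZIP log-likelihood (hypothetical helper).

    params holds unconstrained values: logit(psi) and log(theta).
    """
    psi = 1 / (1 + np.exp(-params[0]))  # map to (0, 1)
    theta = np.exp(params[1])           # map to (0, inf)
    log_p_zero = np.log(1 - psi + psi * np.exp(-theta))     # P(y = 0)
    log_p_k = np.log(psi) + stats.poisson.logpmf(y, theta)  # P(y = k), k >= 1
    # Use the zero formula for zeros and the Poisson formula otherwise
    return -np.sum(np.where(y == 0, log_p_zero, log_p_k))


# Freshly simulated, seeded data with the same ψ = 0.1 and θ = 2.5 as above
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.1) * rng.poisson(2.5, size=1000)

result = optimize.minimize(zip_neg_log_lik, x0=[0.0, 0.0], args=(y,),
                           method="Nelder-Mead")
psi_hat = 1 / (1 + np.exp(-result.x[0]))
theta_hat = np.exp(result.x[1])
print(f"psi ≈ {psi_hat:.2f}, theta ≈ {theta_hat:.2f}")
```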

Figure 4.11