Hierarchical models

Suppose we want to analyze the quality of water in a city, so we take samples by dividing the city into neighborhoods. We may think we have two options to analyze this data:

  • Study each neighborhood as a separate entity
  • Pool all the data together and estimate the water quality of the city as a single big group

Both options could be reasonable, depending on what we want to know. We can justify the first option by saying we obtain a more detailed view of the problem, which otherwise could become invisible or less evident if we average the data. The second option can be justified by saying that if we pool the data, we obtain a bigger sample size and hence a more accurate estimation. Both are good reasons, but we can do something else, something in-between. We can build a model to estimate the water quality of each neighborhood and, at the same time, estimate the quality of the whole city. This type of model is known as a hierarchical model or multilevel model, because we model the data using a hierarchical structure or one with multiple levels.

So, how do we build a hierarchical model? Well, in a nutshell, instead of fixing the parameters of our priors to constant numbers, we estimate them directly from the data by placing shared priors over them. These higher-level priors are often called hyper-priors, and their parameters hyperparameters; hyper means over in Greek. Of course, it is also possible to put priors over the hyper-priors and create as many levels as we want; the problem is that the model rapidly becomes difficult to understand and, unless the problem really demands more structure, adding more levels than necessary does not help to make better inferences. On the contrary, we end up entangled in a web of hyper-priors and hyperparameters without the ability to assign them any meaningful interpretation, partially spoiling the advantages of model-based statistics. After all, the main idea of building models is to make sense of data, and thus models should reflect (and take advantage of) the structure in the data.

To illustrate the main concepts of hierarchical models, we are going to use a toy model of the water quality example we discussed at the beginning of this section and we are going to use synthetic data. Imagine we have collected water samples from three different regions of the same city and we have measured the lead content of water; samples with lead concentration above recommendations from the World Health Organization (WHO) are marked with zero and samples with values below the recommendations are marked with one. This is just a pedagogic example; in a more realistic example, we would have a continuous measurement of lead concentration and probably many more groups. Nevertheless, for our current purposes, this example is good enough to uncover the details of hierarchical models.

We can generate the synthetic data with the following code:

import numpy as np

N_samples = [30, 30, 30]
G_samples = [18, 18, 18]

group_idx = np.repeat(np.arange(len(N_samples)), N_samples)
data = []
for i in range(len(N_samples)):
    data.extend(np.repeat([1, 0], [G_samples[i], N_samples[i] - G_samples[i]]))

We are simulating an experiment where we have measured three groups, each one consisting of a certain number of samples; we store the total number of samples per group in the N_samples list. Using the G_samples list, we keep a record of the number of good quality samples per group. The rest of the code is there just to generate a list of the data, filled with zeros and ones.
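To make the structure of the generated data explicit, here is a self-contained sketch that regenerates it and checks the counts; the assertions are ours, added purely for illustration:

```python
import numpy as np

N_samples = [30, 30, 30]  # total samples per group
G_samples = [18, 18, 18]  # good-quality samples per group

# group_idx maps every observation to its group: [0]*30 + [1]*30 + [2]*30
group_idx = np.repeat(np.arange(len(N_samples)), N_samples)
data = []
for i in range(len(N_samples)):
    data.extend(np.repeat([1, 0], [G_samples[i], N_samples[i] - G_samples[i]]))

# Sanity checks: 90 observations in total, 54 of them ones (18 per group)
assert len(data) == sum(N_samples)
assert sum(data) == sum(G_samples)
```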

The model is essentially the same one we used for the coin problem, except for two important features:

  • We have defined two hyper-priors that will influence the beta prior.
  • Instead of putting hyper-priors on the parameters α and β, we are indirectly defining them in terms of μ, the mean of the beta distribution, and κ, the precision (or concentration) of the beta distribution. The precision is analogous to the inverse of the standard deviation; the larger the value of κ, the more concentrated the beta distribution will be:

μ ∼ Beta(1, 1)
κ ∼ HalfNormal(10)
θᵢ ∼ Beta(α = μκ, β = (1 − μ)κ)
yᵢ ∼ Bernoulli(θᵢ)
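The mapping between the two parameterizations is simple enough to write down as a pair of helper functions; this is just a sketch, and the function names are ours, not part of PyMC3:

```python
def mean_precision_to_alpha_beta(mu, kappa):
    """Convert the mean/precision parameterization of a beta
    distribution into the usual alpha/beta one."""
    alpha = mu * kappa
    beta = (1.0 - mu) * kappa
    return alpha, beta

def alpha_beta_to_mean_precision(alpha, beta):
    """Inverse mapping: recover the mean and the precision."""
    kappa = alpha + beta
    mu = alpha / kappa
    return mu, kappa

# Round trip: a beta distribution with mean 0.6 and precision 10
alpha, beta = mean_precision_to_alpha_beta(0.6, 10)    # alpha=6.0, beta=4.0
mu, kappa = alpha_beta_to_mean_precision(alpha, beta)  # back to (0.6, 10)
```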

Notice we are using the subindex i to indicate that the model has groups with different values for some of the parameters. That is, not all parameters are shared between the groups. Using Kruschke diagrams, it is evident that this new model has one additional level compared to all of the models we have seen so far:

Figure 2.19

Notice how in Figure 2.19 we have used the symbol = instead of ∼ to define α and β; once we know μ and κ, the values of α and β become fully determined. Accordingly, we call this type of variable deterministic, in opposition to stochastic variables such as θ, μ, or κ.

Let's talk a little bit about parameterization. Using the mean and precision is mathematically equivalent to using the α and β parameterization, implying we should get the same results. So, why are we taking this detour instead of the more direct route? There are two reasons:

  • First, the mean and precision parameterization, while mathematically equivalent, is numerically better suited for the sampler, and thus we can be more confident of the results PyMC3 returns. We will learn about the reason for and intuition behind this statement in Chapter 8, Inference Engines.
  • The second reason is pedagogical. This is a concrete example showing that there is potentially more than one way to express a model. Mathematically equivalent implementations can nevertheless have practical differences; the efficiency of the sampler is one aspect to consider, while another could be the interpretability of the model. For some specific problems or particular audiences, it could be a better choice to report the mean and precision of the beta distribution rather than the α and β parameters.
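To make the equivalence claim concrete, here is a small numerical check (the values μ = 0.6 and κ = 10 are made up for illustration) that a Beta(μκ, (1 − μ)κ) distribution indeed has mean μ and that α + β recovers κ:

```python
from scipy import stats

mu, kappa = 0.6, 10.0
alpha, beta = mu * kappa, (1.0 - mu) * kappa  # 6.0 and 4.0

d = stats.beta(alpha, beta)
# The mean of Beta(alpha, beta) is alpha / (alpha + beta) = mu,
# and the "concentration" alpha + beta is exactly kappa
print(d.mean())      # 0.6
print(alpha + beta)  # 10.0
```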

Let's implement and solve the model in PyMC3 so that we can keep discussing hierarchical models:

# assumes pymc3 imported as pm and arviz as az
with pm.Model() as model_h:
    μ = pm.Beta('μ', 1., 1.)
    κ = pm.HalfNormal('κ', 10)
    θ = pm.Beta('θ', alpha=μ*κ, beta=(1.0-μ)*κ, shape=len(N_samples))
    y = pm.Bernoulli('y', p=θ[group_idx], observed=data)
    trace_h = pm.sample(2000)

az.plot_trace(trace_h)

Figure 2.20