Hierarchical linear regression

In the previous chapter, we learned the rudiments of hierarchical models. We can apply the same concept to linear regression as well. This allows models to make inferences at the group level as well as estimations above the group level. As we have already seen, this is done by including hyperpriors.

We are going to create eight related data groups, including one group with a single data point:

# assuming the usual imports from earlier chapters
import numpy as np
import pymc3 as pm
import arviz as az
import matplotlib.pyplot as plt

N = 20
M = 8
# seven groups with N points each, plus an eighth group with a single point
idx = np.repeat(range(M-1), N)
idx = np.append(idx, 7)
np.random.seed(314)

alpha_real = np.random.normal(2.5, 0.5, size=M)
beta_real = np.random.beta(6, 1, size=M)
eps_real = np.random.normal(0, 0.5, size=len(idx))

x_m = np.random.normal(10, 1, len(idx))
y_m = alpha_real[idx] + beta_real[idx] * x_m + eps_real

_, ax = plt.subplots(2, 4, figsize=(10, 5), sharex=True, sharey=True)
ax = np.ravel(ax)
j, k = 0, N
for i in range(M):
    ax[i].scatter(x_m[j:k], y_m[j:k])
    ax[i].set_xlabel(f'x_{i}')
    ax[i].set_ylabel(f'y_{i}', rotation=0, labelpad=15)
    ax[i].set_xlim(6, 15)
    ax[i].set_ylim(7, 17)
    j += N
    k += N
plt.tight_layout()
Figure 3.13

Now, we are going to center the data before feeding it to the model. Centering reduces the correlation between the slope and the intercept in the posterior, which makes sampling easier:

x_centered = x_m - x_m.mean()

First, we are going to fit a non-hierarchical model, just as we have already seen. The only difference is that we are now going to include code to re-scale α back to the original, un-centered scale:

with pm.Model() as unpooled_model:
    α_tmp = pm.Normal('α_tmp', mu=0, sd=10, shape=M)
    β = pm.Normal('β', mu=0, sd=10, shape=M)
    ϵ = pm.HalfCauchy('ϵ', 5)
    ν = pm.Exponential('ν', 1/30)

    y_pred = pm.StudentT('y_pred', mu=α_tmp[idx] + β[idx] * x_centered,
                         sd=ϵ, nu=ν, observed=y_m)

    α = pm.Deterministic('α', α_tmp - β * x_m.mean())

    trace_up = pm.sample(2000)
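To see where the expression inside pm.Deterministic comes from, expand the model for the centered data:

y = α_tmp + β(x − x̄) = (α_tmp − β x̄) + β x

So, the intercept on the original scale is α = α_tmp − β x̄, which is exactly what the pm.Deterministic line computes.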

As we can see in Figure 3.14, the estimations of the α and β parameters for the group with a single data point are very wide compared to the α and β parameters for the rest of the groups:

az.plot_forest(trace_up, var_names=['α', 'β'], combined=True)
Figure 3.14

You may have already guessed what is going on: it does not make sense to try to fit a line through a single point. We need at least two points; otherwise, the parameters α and β are underdetermined. That is true unless we provide some additional information, which we can do by using priors. Putting a strong prior on β (or on α) can lead to a well-defined set of lines, even for one data point. Another way to convey information is to define hierarchical models, since hierarchical models allow information to be shared between groups, shrinking the plausible values of the estimated parameters. This becomes very useful in cases where we have groups with sparse data. In this example, we have taken that sparsity of the data to the extreme: a group with a single data point!
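As an illustration of the strong-prior route (a sketch, not part of the running example; the slope value of 1 and the scale of 0.1 are arbitrary assumptions), a tight prior on β pins down the lines even for the single-point group:

with pm.Model() as strong_prior_model:
    α_tmp = pm.Normal('α_tmp', mu=0, sd=10, shape=M)
    # assumption: we believe the slope is close to 1, encoded as a tight prior
    β = pm.Normal('β', mu=1, sd=0.1, shape=M)
    ϵ = pm.HalfCauchy('ϵ', 5)
    y_pred = pm.Normal('y_pred', mu=α_tmp[idx] + β[idx] * x_centered,
                       sd=ϵ, observed=y_m)
    trace_sp = pm.sample(1000)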

Now, we are going to implement a hierarchical model that is the same as a regular linear regression model but with hyperpriors, as you can see in the following Kruschke diagram:

Figure 3.15

In the following PyMC3 model, the main differences compared to previous models are:

  • We add hyperpriors.
  • We also add a few lines to transform the parameters back to the original un-centered scale. Remember that this is not mandatory; we can keep the parameters in the transformed scale as long as we keep that in mind when interpreting the results:
with pm.Model() as hierarchical_model:
    # hyper-priors
    α_μ_tmp = pm.Normal('α_μ_tmp', mu=0, sd=10)
    α_σ_tmp = pm.HalfNormal('α_σ_tmp', 10)
    β_μ = pm.Normal('β_μ', mu=0, sd=10)
    β_σ = pm.HalfNormal('β_σ', sd=10)

    # priors
    α_tmp = pm.Normal('α_tmp', mu=α_μ_tmp, sd=α_σ_tmp, shape=M)
    β = pm.Normal('β', mu=β_μ, sd=β_σ, shape=M)
    ϵ = pm.HalfCauchy('ϵ', 5)
    ν = pm.Exponential('ν', 1/30)

    y_pred = pm.StudentT('y_pred',
                         mu=α_tmp[idx] + β[idx] * x_centered,
                         sd=ϵ, nu=ν, observed=y_m)

    # transform the parameters back to the original, un-centered scale
    α = pm.Deterministic('α', α_tmp - β * x_m.mean())
    α_μ = pm.Deterministic('α_μ', α_μ_tmp - β_μ * x_m.mean())
    α_σ = pm.Deterministic('α_σ', α_σ_tmp - β_μ * x_m.mean())

    trace_hm = pm.sample(1000)

To compare the results of unpooled_model with hierarchical_model, we are going to make one more forest plot:

az.plot_forest(trace_hm, var_names=['α', 'β'], combined=True)
Figure 3.16

A good way to compare models using az.plot_forest() is to show the parameters of both models (unpooled_model and hierarchical_model) simultaneously in the same plot. To do this, you just need to pass a list of traces, as in the following example.
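A minimal example (the model_names labels are our own choice):

az.plot_forest([trace_up, trace_hm], model_names=['unpooled', 'hierarchical'],
               var_names=['α', 'β'], combined=True)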

To better understand what the model is capturing about the data, let's plot the fitted lines for each of the eight groups:

_, ax = plt.subplots(2, 4, figsize=(10, 5), sharex=True, sharey=True,
                     constrained_layout=True)
ax = np.ravel(ax)
j, k = 0, N
x_range = np.linspace(x_m.min(), x_m.max(), 10)
for i in range(M):
    ax[i].scatter(x_m[j:k], y_m[j:k])
    ax[i].set_xlabel(f'x_{i}')
    ax[i].set_ylabel(f'y_{i}', labelpad=17, rotation=0)
    # posterior-mean line for group i
    alpha_m = trace_hm['α'][:, i].mean()
    beta_m = trace_hm['β'][:, i].mean()
    ax[i].plot(x_range, alpha_m + beta_m * x_range, c='k',
               label=f'y = {alpha_m:.2f} + {beta_m:.2f} * x')
    plt.xlim(x_m.min()-1, x_m.max()+1)
    plt.ylim(y_m.min()-1, y_m.max()+1)
    j += N
    k += N
Figure 3.17

Using a hierarchical model, we were able to fit a line to a single data point, as you can see in Figure 3.17. At first, this may sound weird or even fishy, but it is just a consequence of the structure of the hierarchical model. Each line is informed by the lines of the other groups, so we are not really fitting a line to a single point. Instead, we are fitting a line to a single point that has been informed by the points in the other groups.
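If you want to quantify this shrinkage rather than just eyeball it, a quick check (a sketch, assuming both traces are still in memory) is to compare the posterior spread of the slope for the single-point group under each model:

# posterior standard deviation of β for the single-point group (index 7)
print(trace_up['β'][:, 7].std())   # unpooled: very wide
print(trace_hm['β'][:, 7].std())   # hierarchical: shrunk toward β_μ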
