Gaussian process regression

Let's assume we can model a value $y$ as a function $f$ of $x$ plus some noise:

$$y \sim \mathcal{N}(\mu = f(x),\ \sigma = \epsilon) \tag{7.7}$$

Here, $\epsilon$ is the standard deviation of the normally distributed noise.

This is similar to the assumption that we made in Chapter 3, Modeling with Linear Regression, for linear regression models. The main difference is that now we will put a prior distribution over $f$. Gaussian processes can work as such a prior, thus we can write:

$$f(x) \sim \mathcal{GP}\left(\mu_{x},\ K(x, x')\right) \tag{7.8}$$
Here, $\mathcal{GP}$ represents a Gaussian process distribution, with $\mu_{x}$ being the mean function and $K(x, x')$ the kernel, or covariance, function. Here, we have used the word function to indicate that, mathematically, the mean and covariance are infinite objects, even though, in practice, we always work with finite objects.
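To see what working with finite objects means in practice, note that a GP evaluated at any finite set of points reduces to a multivariate normal distribution. The following is a minimal sketch (the grid, the length-scale of 1, and the number of samples are arbitrary choices of mine, not anything prescribed by the theory) that draws a few functions from a zero-mean GP prior with an exponentiated quadratic kernel:

import numpy as np
import matplotlib.pyplot as plt

x_grid = np.linspace(0, 10, 100)[:, None]
# exponentiated quadratic kernel with ℓ = 1, plus a small jitter
# term on the diagonal for numerical stability
cov = np.exp(-0.5 * (x_grid - x_grid.T)**2) + 1e-8 * np.eye(100)
# each sample is one function evaluated on the grid
samples = np.random.multivariate_normal(np.zeros(100), cov, size=5)
plt.plot(x_grid, samples.T)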

If the prior distribution is a GP and the likelihood is a normal distribution, then the posterior is also a GP and we can compute it analytically:

$$p(f(X_*) \mid X_*, X, y) \sim \mathcal{N}(\mu_*, \Sigma_*)$$

$$\mu_* = K_*^{T} (K + \sigma^{2} I)^{-1} y$$

$$\Sigma_* = K_{**} - K_*^{T} (K + \sigma^{2} I)^{-1} K_*$$

Here:

$K = K(X, X)$, $K_* = K(X, X_*)$, and $K_{**} = K(X_*, X_*)$ are the covariance matrices obtained by evaluating the kernel at the corresponding inputs. $X$ contains the observed data points and $X_*$ represents the test points; that is, the new points where we want to know the value of the inferred function.
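To make these equations concrete, here is a minimal NumPy sketch of the analytic posterior (the helper names exp_quad and gp_posterior, and the default hyperparameter values, are my own illustrative choices):

import numpy as np

def exp_quad(A, B, ls=1.0):
    # exponentiated quadratic kernel between column vectors A (n,1) and B (m,1)
    return np.exp(-0.5 * (A - B.T)**2 / ls**2)

def gp_posterior(X, y, X_star, ls=1.0, sigma=0.1):
    K = exp_quad(X, X, ls) + sigma**2 * np.eye(len(X))  # K + σ²I
    K_s = exp_quad(X, X_star, ls)                       # K*
    K_ss = exp_quad(X_star, X_star, ls)                 # K**
    mu = K_s.T @ np.linalg.solve(K, y)                  # posterior mean
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)        # posterior covariance
    return mu, cov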

As usual, PyMC3 allows us to perform inference by taking care of almost all the mathematical details for us, and Gaussian processes are no exception. So, let's proceed to create some data and then a PyMC3 model:

np.random.seed(42)
x = np.random.uniform(0, 10, size=15)
y = np.random.normal(np.sin(x), 0.1)
plt.plot(x, y, 'o')
true_x = np.linspace(0, 10, 200)
plt.plot(true_x, np.sin(true_x), 'k--')
plt.xlabel('x')
plt.ylabel('f(x)', rotation=0)

Figure 7.4

In Figure 7.4, we see the true unknown function as a dashed black line, while the dots represent samples (with noise) from the unknown function.

Notice that in order to code equations 7.7 and 7.8 into a PyMC3 model, we only need to find out two parameters: $\epsilon$, the noise (standard deviation) of the normal likelihood, and $\ell$, the length-scale of the exponentiated quadratic kernel.

GPs are implemented in PyMC3 as a series of Python classes that deviate a little bit from what we have seen in previous models; nevertheless, the code is still very PyMC3onic. I have added a few comments in the following code to guide you through the key steps of defining a GP with PyMC3:

# A one dimensional column vector of inputs.
X = x[:, None]

with pm.Model() as model_reg:
    # hyperprior for lengthscale kernel parameter
    ℓ = pm.Gamma('ℓ', 2, 0.5)
    # instantiate a covariance function
    cov = pm.gp.cov.ExpQuad(1, ls=ℓ)
    # instantiate a GP prior
    gp = pm.gp.Marginal(cov_func=cov)
    # prior for the noise
    ϵ = pm.HalfNormal('ϵ', 25)
    # likelihood
    y_pred = gp.marginal_likelihood('y_pred', X=X, y=y, noise=ϵ)
    # sample from the posterior of ℓ and ϵ
    trace_reg = pm.sample(2000)

Notice that instead of a normal likelihood, as expected from expression 7.7, we have used the gp.marginal_likelihood method. As you may remember from Chapter 1, Thinking Probabilistically (equation 1.1), and from Chapter 5, Model Comparison (equation 5.13), the marginal likelihood is the integral of the likelihood and the prior:

$$p(y \mid X) = \int p(y \mid f, X)\, p(f)\, df$$

As usual, $X$ is the independent variable and $y$ is the dependent variable, while here the role of the unknown parameters is played by the function $f$ itself. Notice that we are marginalizing over the values of the function $f$. For a GP prior and a normal likelihood, the marginalization can be performed analytically.
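For completeness, this integral has a well-known closed form; assuming a zero mean function, the log marginal likelihood of a GP with a normal likelihood (see, for example, Rasmussen and Williams' Gaussian Processes for Machine Learning) is:

$$\log p(y \mid X) = -\frac{1}{2} y^{T} (K + \sigma^{2} I)^{-1} y - \frac{1}{2} \log \left| K + \sigma^{2} I \right| - \frac{n}{2} \log 2\pi$$

It is this multivariate normal density over $y$ that gp.marginal_likelihood encodes.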

According to Bill Engels, a core developer of PyMC3 and the main contributor to the GP module: "Often, for length-scale parameters, priors avoiding zero work better." A useful default prior for $\ell$ is pm.Gamma(2, 0.5). You can read more advice about useful default priors from the Stan team at https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations.
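To see why this prior avoids zero, we can quickly plot its density (a small sketch using SciPy rather than PyMC3; note that pm.Gamma is parametrized with a rate $\beta$, so the equivalent SciPy scale is $1/\beta$):

from scipy import stats

x_ls = np.linspace(0, 10, 200)
# density of Gamma(α=2, β=0.5); it is exactly zero at ℓ = 0
plt.plot(x_ls, stats.gamma(a=2, scale=1/0.5).pdf(x_ls))
plt.xlabel('ℓ')

Let's now check the trace of the sampled parameters, $\ell$ and $\epsilon$: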

az.plot_trace(trace_reg)

Figure 7.5

Now that we have found the values of $\ell$ and $\epsilon$, we may want to get samples from the GP posterior; that is, samples of the functions fitting the data. We can do this by computing the conditional distribution evaluated over new input locations using the gp.conditional method:

X_new = np.linspace(np.floor(x.min()), np.ceil(x.max()), 100)[:, None]

with model_reg:
    f_pred = gp.conditional('f_pred', X_new)

As a result, we get a new PyMC3 random variable, f_pred, that we can use to get samples from the posterior predictive distribution (evaluated at the X_new values):

with model_reg:
    pred_samples = pm.sample_posterior_predictive(trace_reg, vars=[f_pred], samples=82)

And now we can plot the fitted functions over the original data, to visually inspect how well they fit the data and the associated uncertainty in our predictions:

_, ax = plt.subplots(figsize=(12,5))
ax.plot(X_new, pred_samples['f_pred'].T, 'C1-', alpha=0.3)
ax.plot(X, y, 'ko')
ax.set_xlabel('X')

Figure 7.6

Alternatively, we can use the pm.gp.util.plot_gp_dist function to get some nice plots. Each band represents a percentile, ranging from 51 (lighter color) to 99 (darker color):

_, ax = plt.subplots(figsize=(12,5))

pm.gp.util.plot_gp_dist(ax, pred_samples['f_pred'], X_new, palette='viridis', plot_samples=False);

ax.plot(X, y, 'ko')
ax.set_xlabel('x')
ax.set_ylabel('f(x)', rotation=0, labelpad=15)

Figure 7.7

Yet another alternative is to compute the mean vector and standard deviation of the conditional distribution evaluated at a given point in the parameter space. In the following example, we use the mean (over the samples in the trace) for $\ell$ and $\epsilon$. We can compute the mean and variance using the gp.predict method. We can do this because PyMC3 has computed the posterior analytically:

_, ax = plt.subplots(figsize=(12, 5))

point = {'ℓ': trace_reg['ℓ'].mean(), 'ϵ': trace_reg['ϵ'].mean()}
mu, var = gp.predict(X_new, point=point, diag=True)
sd = var**0.5

ax.plot(X_new, mu, 'C1')
ax.fill_between(X_new.flatten(),
                mu - sd, mu + sd,
                color="C1",
                alpha=0.3)

ax.fill_between(X_new.flatten(),
                mu - 2*sd, mu + 2*sd,
                color="C1",
                alpha=0.3)

ax.plot(X, y, 'ko')
ax.set_xlabel('X')

Figure 7.8

As we saw in Chapter 4, Generalizing Linear Models, we can use a linear model with a non-Gaussian likelihood and a proper inverse link function to extend the range of linear models. We can do the same for GPs. We can, for example, use a Poisson likelihood with an exponential inverse link function. For a model like this, the posterior is no longer analytically tractable, but, nevertheless, we can use numerical methods to approximate it. In the following sections, we will discuss this type of model.
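As a preview, a minimal sketch of such a model is shown below. This is not the exact model from the following sections; the Gamma hyperprior is simply reused from model_reg, and y_counts stands for hypothetical count data:

with pm.Model() as model_poisson:
    ℓ = pm.Gamma('ℓ', 2, 0.5)
    cov = pm.gp.cov.ExpQuad(1, ls=ℓ)
    # a latent (non-marginalized) GP, since the posterior is no longer analytic
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior('f', X=X)
    # the exponential inverse link function maps f to a valid Poisson rate
    y_obs = pm.Poisson('y_obs', mu=pm.math.exp(f), observed=y_counts)

Because the GP can no longer be integrated out analytically, pm.gp.Latent places the function values f directly in the model, and inference proceeds numerically over them.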
