Covariance functions and kernels

In practice, covariance matrices are specified using functions known as kernels. You may find more than one definition of kernel in the statistical literature, each with slightly different mathematical properties. For the purpose of our discussion, we are going to say that a kernel is basically a symmetric function that takes two inputs and returns a non-negative value that, for the kernels we will use here, is largest when the two inputs are equal and decreases as they move apart. If these conditions are met, we can interpret the output of a kernel function as a measure of similarity between the two inputs.

Among the many useful kernels available, a popular one is the exponentiated quadratic kernel:

$$\kappa(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$$

Here, $\|x - x'\|^2$ is the squared Euclidean distance:

$$\|x - x'\|^2 = (x_1 - x'_1)^2 + (x_2 - x'_2)^2 + \cdots + (x_n - x'_n)^2$$

It may not be obvious at first sight, but the exponentiated quadratic kernel has a formula similar to that of the Gaussian distribution (see expression 1.3). For this reason, you may find people referring to this kernel as the Gaussian kernel. The term ℓ is known as the length-scale (or bandwidth or variance) and controls the width of the kernel.
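To make the resemblance explicit, here is the Gaussian density written up to its normalizing constant, side by side with the kernel (a comparison of ours; μ and σ are the usual Gaussian parameters):

$$\mathcal{N}(x \mid \mu, \sigma^2) \propto \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad \kappa(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$$

The kernel swaps the deviation from the mean, x − μ, for the distance between two inputs, and the standard deviation σ for the length-scale ℓ.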

To better understand the role of kernels, let's define a Python function to compute the exponentiated quadratic kernel:

import numpy as np

def exp_quad_kernel(x, knots, ℓ=1):
    """Exponentiated quadratic kernel"""
    # One row per knot: the similarity of every element of x to that knot
    return np.array([np.exp(-(x - k)**2 / (2 * ℓ**2)) for k in knots])
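Before moving on, a quick sanity check (the input values here are ours, not from the book): the kernel returns 1 when the two inputs coincide and values that decay toward 0 as the inputs move apart:

# Assumes exp_quad_kernel and numpy from the listing above.
# Similarity between the point 0.0 and the knots 0, 1, 2, and 3 (ℓ=1);
# the result has one row per knot:
print(exp_quad_kernel(np.array([0.0]), np.array([0.0, 1.0, 2.0, 3.0])))
# [[1.        ]
#  [0.60653066]
#  [0.13533528]
#  [0.011109  ]]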

The following code and Figure 7.2 are designed to show how the covariance matrix looks for different inputs. The input I chose is rather simple and consists of the values [-1, 0, 1, 2]. Once you understand this example, you should try it with other inputs (see exercise 1):

import matplotlib.pyplot as plt

data = np.array([-1, 0, 1, 2])
cov = exp_quad_kernel(data, data, 1)

_, ax = plt.subplots(1, 2, figsize=(12, 5))
ax = np.ravel(ax)

# Left panel: the data points on a line, annotated with their index
ax[0].plot(data, np.zeros_like(data), 'ko')
ax[0].set_yticks([])
for idx, i in enumerate(data):
    ax[0].text(i, 0 + 0.005, idx)
ax[0].set_xticks(data)
ax[0].set_xticklabels(np.round(data, 2))

# Right panel: the covariance matrix as a heatmap with annotated values
ax[1].grid(False)
im = ax[1].imshow(cov)
colors = ['w', 'k']
for i in range(len(cov)):
    for j in range(len(cov)):
        ax[1].text(j, i, round(cov[i, j], 2),
                   color=colors[int(im.norm(cov[i, j]) > 0.5)],
                   ha='center', va='center', fontdict={'size': 16})
ax[1].set_xticks(range(len(data)))
ax[1].set_yticks(range(len(data)))
ax[1].xaxis.tick_top()

Figure 7.2: The input data points (left) and the covariance matrix computed from them with the exponentiated quadratic kernel (right)

The panel on the left of Figure 7.2 shows the input: the values on the x axis represent the values of each data point, and the text annotations show the order of the data points (starting from zero). The right-hand panel shows a heatmap representing the covariance matrix obtained using the exponentiated quadratic kernel; a lighter color means a larger covariance. As you can see, the heatmap is symmetric, with the diagonal taking the largest values. The value of each element in the covariance matrix decreases as the distance between the corresponding points increases. The diagonal is the result of comparing each data point with itself, so there we get the closest distance, 0, and, for this kernel, the highest covariance value, 1. Other diagonal values are possible for other kernels.
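We can check these claims directly in code (a quick verification of ours, using the cov matrix computed above):

# The matrix is symmetric, and for this kernel the diagonal is all ones
print(np.allclose(cov, cov.T))         # True
print(np.allclose(np.diag(cov), 1.0))  # True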

The kernel is translating the distance between data points along the x axis into covariances between values of the expected function (on the y axis). Thus, the closer two points are on the x axis, the more similar we expect their values to be on the y axis.

In summary, we have seen so far that we can use multivariate normal distributions with a given covariance matrix to model functions, and that we can use kernel functions to build those covariance matrices. In the following example, we use the exp_quad_kernel function to define the covariance matrix of a multivariate normal, and then use samples from that distribution to represent functions:

from scipy import stats

np.random.seed(24)
test_points = np.linspace(0, 10, 200)
fig, ax = plt.subplots(2, 2, figsize=(12, 6), sharex=True,
                       sharey=True, constrained_layout=True)
ax = np.ravel(ax)

# Draw two realizations of the multivariate normal for each length-scale
for idx, ℓ in enumerate((0.2, 1, 2, 10)):
    cov = exp_quad_kernel(test_points, test_points, ℓ)
    ax[idx].plot(test_points,
                 stats.multivariate_normal.rvs(cov=cov, size=2).T)
    ax[idx].set_title(f'ℓ = {ℓ}')
fig.text(0.51, -0.03, 'x', fontsize=16)
fig.text(-0.03, 0.5, 'f(x)', fontsize=16)

Figure 7.3: Functions sampled from multivariate normal distributions with exponentiated quadratic covariance matrices, for four values of the length-scale ℓ

As you can see in Figure 7.3, a Gaussian kernel implies a wide variety of functions, with the parameter ℓ controlling the smoothness of the functions. The larger the value of ℓ, the smoother the function.
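To see why ℓ has this effect, note that the kernel fixes the correlation between function values at any two points; for points one unit apart, that correlation is exp(-1/(2ℓ²)), which grows quickly with ℓ (the following numerical illustration is ours, not from the book):

# Correlation between f(x) and f(x') for points one unit apart
for ℓ in (0.2, 1, 2, 10):
    print(ℓ, np.exp(-1 / (2 * ℓ**2)))
# ℓ=0.2 -> ~3.7e-06 (nearly uncorrelated: wiggly functions)
# ℓ=1   -> ~0.61
# ℓ=2   -> ~0.88
# ℓ=10  -> ~0.995  (nearly perfectly correlated: very smooth functions)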
