Convolutions

Even if we work only with finite and discrete convolutions, it's useful to start by providing the standard definition based on integrable functions. For simplicity, let's suppose that f(τ) and k(τ) are two real functions of a single variable defined on ℝ. The convolution of f(τ) with the kernel k(τ) (conventionally denoted as f ∗ k) is defined as follows:

(f ∗ k)(t) = ∫ f(τ)k(t − τ) dτ, where the integral is taken over the whole real line (from −∞ to +∞)

The expression may not be very easy to understand without a mathematical background, but it becomes exceptionally simple with a few considerations. First of all, the integral sums over all values of τ; therefore, the convolution is a function of the remaining variable, t. The second fundamental element is a sort of dynamic property: the kernel is reversed (k(−τ)) and expressed as a function of the new variable z = t − τ. Without deep mathematical knowledge, it's possible to see that this operation shifts the reversed kernel along the τ (independent variable) axis. In the following graphs, there's an example based on a parabola:

The first diagram shows the original kernel (which is also symmetric). The other two plots show, respectively, a forward and a backward shift. It should be clearer now that a convolution multiplies the function f(τ) by the reversed and shifted kernel and computes the area under the resulting curve. As the variable t is not integrated, that area is a function of t and defines a new function, which is the convolution itself. In other words, the value of the convolution of f(τ) and k(τ) computed for t = 5 is the area under the curve obtained by the multiplication f(τ)k(5 − τ). By definition, a convolution is commutative (f ∗ k = k ∗ f) and distributive (f ∗ (k + g) = (f ∗ k) + (f ∗ g)). Moreover, it's also possible to prove that it's associative (f ∗ (k ∗ g) = (f ∗ k) ∗ g).
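Although the definition is stated for integrable functions, the same reverse-shift-multiply-sum logic applies verbatim to discrete sequences, where the integral becomes a finite sum over the overlapping samples. The following minimal sketch (my own illustration with arbitrary toy values, using NumPy; nothing here comes from the text) computes each value of a discrete convolution directly from the definition and checks the result against np.convolve:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # discrete signal f[tau]
k = np.array([0.5, 1.0, 0.25])       # discrete kernel k[tau]

# Discrete analogue of the definition: (f * k)[t] = sum over tau of f[tau] * k[t - tau]
def conv_at(f, k, t):
    total = 0.0
    for tau in range(len(f)):
        if 0 <= t - tau < len(k):     # only where the reversed, shifted kernel overlaps f
            total += f[tau] * k[t - tau]
    return total

explicit = [conv_at(f, k, t) for t in range(len(f) + len(k) - 1)]
print(explicit)                       # values computed from the definition
print(np.convolve(f, k).tolist())     # NumPy's full convolution yields the same values
```

Each output value is exactly the "area" (here, a finite sum) of f multiplied by the kernel reversed and shifted to position t, mirroring the continuous case described above.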

However, in deep learning, we never work with continuous convolutions; therefore, I omit all the properties and mathematical details, focusing the attention on the discrete case. The reader who is interested in the theory can find further details in Circuits, Signals, and Systems, Siebert W. M., MIT Press. A common practice is to stack multiple convolutions with different kernels (often called filters), transforming an input containing n channels into an output with m channels, where m corresponds to the number of kernels. This approach unleashes the full power of convolutions, thanks to the synergy of the different outputs. Conventionally, the output of a convolutional layer with n filters is called a feature map (w(t) × h(t) × n, where w(t) and h(t) denote the width and height of the output of the t-th layer), because its structure is no longer related to a specific image but rather resembles the overlap of different feature detectors. In this chapter, we often talk about images (considering a hypothetical first layer), but all the considerations implicitly extend to any feature map.
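To make the notion of a feature map concrete, here is a minimal sketch (my own illustration with arbitrary shapes and NumPy only, not a production implementation) of a convolutional layer that slides m kernels over an input with n channels and stacks the m resulting outputs along the channel axis:

```python
import numpy as np

def conv_layer(image, kernels):
    """Valid 2D convolution of an (H, W, n) input with m kernels of shape
    (kh, kw, n); returns a feature map of shape (H - kh + 1, W - kw + 1, m).
    Note: the kernel is applied without reflection, as is customary in
    deep learning layers."""
    H, W, n = image.shape
    m, kh, kw, _ = kernels.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    feature_map = np.zeros((out_h, out_w, m))
    for j in range(m):                          # one output channel per kernel
        for y in range(out_h):
            for x in range(out_w):
                patch = image[y:y + kh, x:x + kw, :]
                feature_map[y, x, j] = np.sum(patch * kernels[j])
    return feature_map

# A 6x6 input with n = 3 channels and m = 4 random 3x3 kernels
image = np.random.rand(6, 6, 3)
kernels = np.random.rand(4, 3, 3, 3)            # (m, kh, kw, n)
print(conv_layer(image, kernels).shape)         # (4, 4, 4): a w x h x m feature map
```

In this toy version, every output channel is produced by a different kernel, so the depth of the resulting feature map equals the number of filters, exactly as described above.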
