Hebb's rule

Hebb's rule was proposed as a conjecture in 1949 by the Canadian psychologist Donald Hebb to describe the synaptic plasticity of natural neurons. A few years after its publication, the rule was confirmed by neurophysiological studies, and many research studies have shown its validity in many applications of Artificial Intelligence. Before introducing the rule, it's useful to describe the generic Hebbian neuron, as shown in the following diagram:

Generic Hebbian neuron with a vectorial input

The neuron is a simple computational unit that receives an input vector, x, from the pre-synaptic units (other neurons or perceptive systems) and outputs a single scalar value, y. The internal structure of the neuron is represented by a weight vector, w, that models the strength of each synapse. For a single multi-dimensional input, the output is obtained as follows:
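
y = w \cdot x = \sum_{i=1}^{m} w_i x_i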

In this model, we are assuming that each input signal is encoded in the corresponding component of the vector, x; therefore, xi is processed by the synaptic weight wi, and so on. In the original version of Hebb's theory, the input vectors represent neural firing rates, which are always non-negative. This means that the synaptic weights can only be strengthened (the neuroscientific term for this phenomenon is long-term potentiation (LTP)). However, for our purposes, we assume that x is a real-valued vector, as is w. This condition allows modeling more artificial scenarios without a loss of generality. 

The same operation performed on a single vector holds when it's necessary to process many input samples organized in a matrix. If we have N m-dimensional input vectors, the formula becomes as follows:
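
y = Xw \quad \text{with} \quad X \in \mathbb{R}^{N \times m}, \; w \in \mathbb{R}^{m}, \; y \in \mathbb{R}^{N}

(Here, it is assumed that the samples are stored as the rows of X.)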

The basic form of Hebb's rule, in discrete form, can be expressed (for a single input) as follows:
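
\Delta w = \eta \, y \, x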

The weight correction is hence a vector with the same orientation as x and a magnitude equal to |x| multiplied by a positive parameter, η (called the learning rate), and by the corresponding output, y (which can have either a positive or a negative sign). The sense of Δw is determined by the sign of y; therefore, under the assumption that x and y are real values, two different scenarios arise from this rule:

  • If xi > 0 (< 0) and y > 0 (< 0), wi is strengthened
  • If xi > 0 (< 0) and y < 0 (> 0), wi is weakened

It's easy to understand this behavior by considering two-dimensional vectors:
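
y = w \cdot x = |w||x| \cos\alpha \quad \Rightarrow \quad \Delta w = \eta \, (|w||x| \cos\alpha) \, x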

Therefore, if the initial angle, α, between w and x is less than 90°, w will converge to the same orientation and sense as x; vice versa, if α is greater than 90°, w will converge to the opposite sense. In the following diagram, there's a schematic representation of this process:

Vectorial analysis of Hebb's rule

It's possible to simulate this behavior using a very simple Python snippet. Let's start with a scenario where α is less than 90°, running 50 iterations:

import numpy as np

# Initial weight and input vectors (the angle between them is less than 90°)
w = np.array([1.0, 0.2])
x = np.array([0.1, 0.5])
alpha = 0.0

for i in range(50):
    # Hebbian update (the learning rate η is implicitly equal to 1)
    y = np.dot(w, x)
    w += x * y
    # Angle (in radians) between w and x
    alpha = np.arccos(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x)))

print(w)
[ 8028.48942243 40137.64711215]

print(alpha * 180.0 / np.pi)
0.00131766983584

As expected, the final angle, α, is close to zero and w has the same orientation and sense as x. We can now repeat the experiment with an initial α greater than 90° (we change only the value of w, because the procedure is the same):

w = np.array([1.0, -1.0])

...

print(w)
[-16053.97884486 -80275.89422431]

print(alpha * 180.0 / np.pi)
179.999176456

In this case, the final angle, α, is about 180° and, of course, w has the opposite sense with respect to x.

The neuroscientist S. Löwel expressed this concept with the famous phrase:

"Neurons that fire together wire together"

We can re-express this concept (adapting it to a machine learning scenario) by saying that the main assumption of this approach is that when pre- and post-synaptic units are coherent (their signals have the same sign), the connection between the neurons becomes stronger and stronger. On the other hand, if they are discordant, the corresponding synaptic weight is decreased. For the sake of precision, if x is a spiking rate, it should be represented as a real function, x(t), and the same holds for y(t). According to the original Hebbian theory, the discrete equation must be replaced by a differential equation:
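
\frac{dw}{dt} = \eta \, x(t) \, y(t)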

If x(t) and y(t) are active at the same time (they fire together), the synaptic weight is strengthened proportionally to the product of both rates. If, instead, there's a relatively long delay between the pre-synaptic activity, x(t), and the post-synaptic one, y(t), the corresponding weight is weakened. This is a more biologically plausible explanation of the relation fire together → wire together.

However, even if the theory has a strong neurophysiological basis, some modifications are necessary. In fact, it's easy to see that the resulting system is always unstable. If an input is applied repeatedly (whether real values or firing rates), the norm of the vector, w, grows indefinitely, and this isn't a plausible assumption for a biological system. In fact, if we consider a discrete iteration step, we have the following equation:
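
w_{k+1} = w_k + \eta \, y_k \, x \quad \Rightarrow \quad y_{k+1} = w_{k+1} \cdot x = y_k \left(1 + \eta \, |x|^2\right)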

The previous output, yk, is always multiplied by a factor larger than 1 (except in the case of a null input); therefore, it grows without bound. As y = w · x, this condition implies that the magnitude of w increases (or remains constant if the magnitude of x is null) at each iteration (a more rigorous proof can be obtained by considering the original differential equation).

Such a situation is not only biologically implausible, but it also has to be managed properly in machine learning problems in order to avoid numerical overflow after a few iterations. In the next section, we're going to discuss some common methods for overcoming this issue. For now, we can continue our analysis without introducing a correction factor.

Let's now consider a dataset, X:
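
X = \{x_1, x_2, \ldots, x_N\} \quad \text{with} \quad x_i \in \mathbb{R}^{m}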

We can apply the rule iteratively to all elements, but it's easier (and more useful) to average the weight modifications over the input samples (the index now refers to the whole specific vector, not to the single components):
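
\Delta w = \frac{\eta}{N} \sum_{i=1}^{N} y_i \, x_i = \frac{\eta}{N} \sum_{i=1}^{N} (x_i \cdot w) \, x_i = \eta \, C w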

In the previous formula, C is the input correlation matrix:
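
C = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T = \frac{1}{N} X^T X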

For our purposes, however, it's useful to consider a slightly different Hebbian rule based on a threshold, θ, for the input vector (there's also a biological reason that justifies this choice, but it's beyond the scope of this book; the interested reader can find it in Theoretical Neuroscience, Dayan P., Abbott L. F., The MIT Press).
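
The update then takes the following form:

\Delta w = \eta \, (x - \theta) \, y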

It's easy to understand that, in the original theory where x(t) and y(t) are firing rates, this modification allows a phenomenon that is the opposite of LTP, called long-term depression (LTD). In fact, when x(t) < θ and y(t) is positive, the product (x(t) - θ)y(t) is negative and the synaptic weight is weakened.

If we set θ = 〈x〉 ≈ E[X], we can derive an expression very similar to the previous one, but based on the input covariance matrix (made unbiased through Bessel's correction):
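
\Delta w = \eta \, \Sigma w \quad \text{where} \quad \Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T

Here, \bar{x} denotes the sample mean of the input vectors.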

For obvious reasons, this variant of the original Hebb's rule is called the covariance rule.

It's also possible to use the Maximum Likelihood Estimation (MLE), or biased, covariance matrix (dividing by N), but it's important to check which version is adopted by the mathematical package being employed. When using NumPy, it's possible to select the version with the np.cov() function by setting the bias=True/False parameter (the default value is False). However, when N >> 1, the difference between the two versions decreases and can often be neglected. In this book, we'll use the unbiased version. The reader who wants to see further details about Bessel's correction can read Applied Statistics, Warner R., SAGE Publications.
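
For example, with a hypothetical random dataset, X, stored with one sample per row, the two versions can be computed as follows:

import numpy as np

# Hypothetical dataset: 100 samples, 2 features (one sample per row)
X = np.random.normal(size=(100, 2))

# Unbiased covariance matrix (divides by N - 1, the default behavior)
C_unbiased = np.cov(X, rowvar=False)

# Biased (MLE) covariance matrix (divides by N)
C_biased = np.cov(X, rowvar=False, bias=True)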