Example of Rubner-Tavan's network

For our Python example, we are going to use the same dataset already created for Sanger's network (which is expected to be available in the variable Xs).
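If Xs is not already in memory, the following is a minimal sketch of how a comparable dataset could be generated; the covariance matrix, sample size, and random seed are purely hypothetical values (chosen only so that the eigenvalues have the same order of magnitude as those appearing in the output shown later) and are not the ones used in the original example:

import numpy as np

np.random.seed(1000)

# Hypothetical zero-centered bidimensional Gaussian dataset
Xs = np.random.multivariate_normal(mean=[0.0, 0.0],
                                   cov=[[35.0, 12.0], [12.0, 38.0]],
                                   size=1000)
Xs -= np.mean(Xs, axis=0)

With the dataset available, we can set up all the constants and variables: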

import numpy as np

n_components = 2
learning_rate = 0.0001
max_iterations = 1000
stabilization_cycles = 5
threshold = 0.00001

# Feedforward weights (W) and strictly lower-triangular lateral weights (V)
W = np.random.normal(0.0, 0.5, size=(Xs.shape[1], n_components))
V = np.tril(np.random.normal(0.0, 0.01, size=(n_components, n_components)))
np.fill_diagonal(V, 0.0)

prev_W = np.zeros((Xs.shape[1], n_components))
t = 0

At this point, it's possible to implement the training loop:

while np.linalg.norm(W - prev_W, ord='fro') > threshold and t < max_iterations:
    prev_W = W.copy()
    t += 1

    for i in range(Xs.shape[0]):
        y_p = np.zeros((n_components, 1))
        xi = np.expand_dims(Xs[i], 1)
        y = None

        # Iterate the output until it stabilizes (the lateral feedback is recurrent)
        for _ in range(stabilization_cycles):
            y = np.dot(W.T, xi) + np.dot(V, y_p)
            y_p = y.copy()

        dW = np.zeros((Xs.shape[1], n_components))
        dV = np.zeros((n_components, n_components))

        # Hebbian update for the feedforward weights, anti-Hebbian for the lateral ones
        # (the inner index is j so that it doesn't overwrite the epoch counter t)
        for j in range(n_components):
            y2 = np.power(y[j], 2)
            dW[:, j] = np.squeeze((y[j] * xi) + (y2 * np.expand_dims(W[:, j], 1)))
            dV[j, :] = -np.squeeze((y[j] * y) + (y2 * np.expand_dims(V[j, :], 1)))

        W += (learning_rate * dW)
        V += (learning_rate * dV)

        # Keep V strictly lower-triangular (no self-connections, no upper feedback)
        V = np.tril(V)
        np.fill_diagonal(V, 0.0)

        # Normalize the columns of W
        W /= np.linalg.norm(W, axis=0).reshape((1, n_components))

The final W and the output covariance matrix are as follows:

print(W)
[[-0.65992841  0.75897537]
 [-0.75132849 -0.65111933]]

Y_comp = np.zeros((Xs.shape[0], n_components))

for i in range(Xs.shape[0]):
    y_p = np.zeros((n_components, 1))
    xi = np.expand_dims(Xs[i], 1)

    # Stabilize each output using the same recursion employed during training
    for _ in range(stabilization_cycles):
        Y_comp[i] = np.squeeze(np.dot(W.T, xi) + np.dot(V, y_p))
        y_p = np.expand_dims(Y_comp[i], 1)

print(np.cov(Y_comp.T))
[[ 48.9901765   -0.34109965]
 [ -0.34109965  24.51072811]]

As expected, the algorithm has successfully converged to the eigenvectors (in descending eigenvalue order), and the outputs are almost completely decorrelated: the output covariance matrix is nearly diagonal (the sign of the non-diagonal elements can be either positive or negative). Rubner-Tavan's networks are generally faster than Sanger's network, thanks to the feedback signal created by the anti-Hebbian rule; however, it's important to choose the right value for the learning rate. A possible strategy is to implement a temporal decay (as done in Sanger's network), starting with a value not greater than 0.0001. Moreover, it's important to reduce η when n increases (for example, η = 0.0001 / n), because the normalization strength of Oja's rule on the lateral connections v_jk is often not enough to avoid overflows and underflows when n >> 1. I don't suggest any extra normalization on V (which must be carefully analyzed, considering that V is singular), because it can slow down the process and reduce the final accuracy.
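As a reference, the following is a minimal sketch of such a temporal decay; the hyperbolic schedule and the constant names (eta_0, k, learning_rate_schedule) are illustrative assumptions and are not part of the original example:

# Hypothetical decay schedule: eta(t) = eta_0 / (1 + k * t), with eta_0 <= 0.0001
eta_0 = 0.0001
k = 0.01

def learning_rate_schedule(epoch):
    return eta_0 / (1.0 + k * epoch)

Inside the training loop, it's enough to set learning_rate = learning_rate_schedule(t) at the beginning of each epoch; when the number of components grows, eta_0 can additionally be divided by n_components, as discussed above.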
