In this example, we are going to add an L1 regularization term to the cost function that was defined in the first exercise:
import tensorflow as tf
...
# Loss
sparsity_constraint = tf.reduce_sum(0.001 * tf.norm(code_layer, ord=1, axis=1))
loss = tf.nn.l2_loss(convt_3 - r_input_images) + sparsity_constraint
...
The training process is exactly the same, so we can directly show the mean of the code values after 200 epochs:
import numpy as np

codes = session.run([code_layer],
                    feed_dict={
                        input_images: np.expand_dims(X_train, axis=3)
                    })[0]
print(np.mean(codes))
0.45797634
As you can see, the mean is now lower, which indicates that more code values are close to 0. I invite the reader to implement the other strategy as well, considering that it's easier to create a constant vector filled with small values (for example, 0.01) and to exploit the vectorization features offered by TensorFlow. I also suggest simplifying the Kullback–Leibler divergence by splitting it into an entropy term H(pr) (which is constant) and a cross-entropy term H(z, pr).
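As a starting point, here is a minimal NumPy sketch of the suggested simplification. The target sparsity `p_r` and the sample activations `z` are illustrative values (not taken from the exercise); the sketch only verifies that, since the entropy of the constant target is fixed, minimizing the Kullback–Leibler divergence is equivalent to minimizing the cross-entropy:

```python
import numpy as np

# Hypothetical target sparsity (the small constant value suggested above)
p_r = 0.01

# Example average code activations, assumed to lie in (0, 1)
# (e.g., outputs of a sigmoid layer); these values are illustrative
z = np.array([0.02, 0.5, 0.1, 0.01])

# Element-wise Bernoulli KL divergence between the target p_r and z
kl = p_r * np.log(p_r / z) + (1.0 - p_r) * np.log((1.0 - p_r) / (1.0 - z))

# Decomposition: KL(p_r || z) = H(p_r, z) - H(p_r)
# H(p_r) is constant, so only the cross-entropy term affects the gradient
h_pr = -(p_r * np.log(p_r) + (1.0 - p_r) * np.log(1.0 - p_r))
h_cross = -(p_r * np.log(z) + (1.0 - p_r) * np.log(1.0 - z))

# The two formulations agree term by term
assert np.allclose(kl, h_cross - h_pr)

# The full constraint would be the sum over the code units
sparsity_constraint = np.sum(kl)
```

In a TensorFlow implementation, the same expression can be built with vectorized operations on the mean activation of `code_layer`, keeping only the cross-entropy part in the loss.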