Training the VAE

To train the VAE with the KL divergence loss, we first need to change how we generate the latent vectors. Rather than having the encoder produce a latent vector directly, we will make it produce two vectors: a vector μ of mean values and a vector σ of standard deviation values. From these, we create a third vector whose i-th element is sampled from a Gaussian distribution that uses the i-th elements of μ and σ as its mean and standard deviation. This sampled vector is then sent to the decoder.
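As a rough sketch of this idea (written in TensorFlow 1.x style to match the loss expression later in this section; latent_size and the placeholder stand-ins are hypothetical), the sampling step could be expressed as follows:

import tensorflow as tf  # TensorFlow 1.x style, to match the tf.log call used later

latent_size = 32  # hypothetical latent dimensionality
# Stand-ins for the two vectors the encoder now produces, one value per latent dimension
latent_mean = tf.placeholder(tf.float32, [None, latent_size])
latent_stddev = tf.placeholder(tf.float32, [None, latent_size])

# The i-th element of the sample is drawn from a Gaussian whose mean and standard
# deviation are the i-th elements of latent_mean and latent_stddev
latent_sample = tf.distributions.Normal(loc=latent_mean, scale=latent_stddev).sample()
# latent_sample is the vector that gets passed on to the decoder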

Our model now looks something like this:

The mean and standard deviation blocks in the preceding diagram are just ordinary fully connected layers that learn to produce the values we want, guided by the KL loss function. The reason for changing how we generate our latent vectors is that it lets us calculate the KL divergence loss easily. The KL loss now becomes the following, where latent_mean is μ and latent_stddev is σ:

0.5 * tf.reduce_sum(tf.square(latent_mean) + tf.square(latent_stddev) - tf.log(tf.square(latent_stddev)) - 1, 1) 
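To see how this fits into the rest of the model, here is a rough sketch (encoder_out, decoder_out, and input_flat are hypothetical tensors for the flattened encoder output, decoder output, and flattened input, and the squared-error reconstruction term is just one common choice) of the two fully connected blocks and the combined training loss:

latent_size = 32  # hypothetical latent dimensionality
# The two fully connected blocks from the diagram, on top of the flattened encoder output
latent_mean = tf.layers.dense(encoder_out, latent_size, name="latent_mean")
# Softplus keeps the predicted standard deviation positive
latent_stddev = tf.layers.dense(encoder_out, latent_size,
                                activation=tf.nn.softplus, name="latent_stddev")

# Per-example KL divergence between N(latent_mean, latent_stddev^2) and N(0, 1)
kl_loss = 0.5 * tf.reduce_sum(tf.square(latent_mean) + tf.square(latent_stddev)
                              - tf.log(tf.square(latent_stddev)) - 1, 1)
# Per-example reconstruction error between the decoder output and the flattened input
reconstruction_loss = tf.reduce_sum(tf.square(decoder_out - input_flat), 1)
# The VAE is trained on both terms together, averaged over the batch
total_loss = tf.reduce_mean(reconstruction_loss + kl_loss)

Here tf.reduce_sum(..., 1) sums over the latent (or pixel) dimension, so each term is a per-example value before the batch average is taken.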

Unfortunately, this Sample block, which you can think of as a random generator node, is not differentiable, which means we can't use backpropagation to train our VAE. What we need is the Reparameterization trick, which takes the sampling out of the backpropagation flow.
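As a minimal sketch of that trick, reusing the latent_mean and latent_stddev names from the loss above, we draw noise from a fixed standard normal distribution and then scale and shift it, so the random draw sits outside the path that gradients flow through:

# epsilon carries all of the randomness and never needs a gradient
epsilon = tf.random_normal(tf.shape(latent_stddev))
# latent_mean and latent_stddev stay on the differentiable path to the decoder
latent_sample = latent_mean + latent_stddev * epsilon

The decoder now receives latent_sample, and gradients can reach latent_mean and latent_stddev through the multiply and add, while epsilon is treated as a constant input.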
