© Umberto Michelucci 2022
U. Michelucci, Applied Deep Learning with TensorFlow 2, https://doi.org/10.1007/978-1-4842-8020-1_11

11. Generative Adversarial Networks (GANs)

Umberto Michelucci, Dübendorf, Switzerland

Generative Adversarial Networks (GANs) are, in their most basic form, two neural networks that teach each other how to solve a specific task. The idea was invented by Goodfellow and colleagues in 2014.1 The two networks help each other with the final goal of being able to generate new data that looks like the data used for training. For example, you may want to train a network to generate human faces that are as realistic as possible. In this case, one network generates human faces as well as it can, and the second network criticizes the results and tells the first network how to improve the faces. The two networks learn from each other, so to speak. This chapter looks in detail at how this works and explains how to implement an easy example in Keras.

The goal of this chapter is to give you a basic understanding of how GANs work. Adversarial learning (of which GANs are a specific case) is a vast area of research and an advanced topic in deep learning. This chapter investigates in detail how a basic GAN system works and discusses, more briefly, how conditional GANs function. Complete examples can be found, as usual, at https://adl.toelt.ai.

Introduction to GANs

The best way for you to understand how GANs work is to base this discussion on the diagram in Figure 11-1. After you understand what is going on under the hood, we will look at how to implement GANs in Keras.

Training Algorithm for GANs

To build a GAN system, we need two neural networks: a generator and a discriminator. The generator has the goal of producing a fake observation2 Xfake, while the discriminator has the goal of classifying an input X as real or fake.

Imagine the following classical example: the generator (let’s call him George) can be an art forger who is trying to produce paintings of some known painter, let’s say Vincent van Gogh. And the discriminator (let’s call her Anna) is an art critic who scrutinizes the paintings that George has produced to determine if they are genuine or not. They are new to this, so they decide to learn this process together3. George produces a painting. Anna examines it and gives some suggestions to George. Every now and then, Anna also trains with some real van Gogh paintings to get better at spotting errors in George’s work. This process is repeated many times, until George is so good he can fool Anna. At this point, George can paint like van Gogh, can produce many fakes, and can get rich by selling his fake paintings4. This process is depicted in Figure 11-1. Let’s see how this story translates into the language of neural networks.
Figure 11-1

All the components and steps of a GAN setup

The generator gets as input a noise vector $\xi \in \mathbb{R}^k$ whose entries are drawn from a normal distribution. The size of this vector is not fixed and can be chosen based on the problem at hand. In the example we discuss in this chapter, we use k = 100. The generator (George) takes the random vector and generates a fake observation Xfake (as you can see in Figure 11-1). The output Xfake has the same dimensions as the observations contained in the training dataset Xreal (in this example, van Gogh paintings). If, for example, Xreal consists of 1000x1000-pixel color images, then Xfake will also be a 1000x1000-pixel color image.

Now it’s the discriminator’s (Anna’s) turn. It gets as input an Xreal (or an Xfake) and produces a one-dimensional output $\hat{Y}$, the probability that the input is real (rather than fake). Basically, the discriminator is performing binary classification.

The steps of the training loop are described here (the two loss functions in step 4 are written out explicitly right after the list).
  1. A vector $\xi \in \mathbb{R}^k$ of k numbers is generated from a normal distribution.
  2. Using this ξ, the generator gives as output an Xfake.
  3. The discriminator is used twice: once with a real input (Xreal) and once with the Xfake generated in the previous step.
  4. Two loss functions are calculated: LG = CE($\hat{Y}_{fake}$, 1) and LD = CE($\hat{Y}_{real}$, 1) + CE($\hat{Y}_{fake}$, 0), where CE denotes the binary cross-entropy.
  5. Via an optimizer (Adam, Momentum, etc.), the two loss functions are minimized sequentially (sometimes there is one step for the generator and multiple weight updates for the discriminator). Note that LG is minimized only with respect to the trainable parameters of the generator, while LD is minimized only with respect to the trainable parameters of the discriminator.
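Written out for a single example, using the binary cross-entropy $CE(p, y) = -[y \log p + (1 - y)\log(1 - p)]$ and interpreting $\hat{Y}$ as the predicted probability of being real, the two losses of step 4 become
$$ L_G = CE(\hat{Y}_{fake}, 1) = -\log \hat{Y}_{fake} $$
$$ L_D = CE(\hat{Y}_{real}, 1) + CE(\hat{Y}_{fake}, 0) = -\log \hat{Y}_{real} - \log\big(1 - \hat{Y}_{fake}\big) $$
Minimizing LG rewards the generator for producing examples that the discriminator classifies as real, while minimizing LD rewards the discriminator for assigning a high probability to real examples and a low probability to fake ones.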

     

A Practical Example with Keras and MNIST

This section shows a practical example of what we discussed in the previous section, implemented with Keras and applied to the MNIST dataset.5 As usual, you can find the complete code at https://adl.toelt.ai, so we concentrate here only on the relevant parts. In particular, we look at the five steps described in the previous section and see how to implement them. To start, we need to create two neural networks: the generator and the discriminator. This can be done in the usual way; nothing new here. For example:
import tensorflow as tf
from tensorflow.keras import layers

def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    return model
The important part of this network is the input shape: input_shape=(100,). Remember that the generator gets as input the random vector ξ that is, in our example, a 100-dimensional vector of random numbers generated from a normal distribution. Figure 11-2 shows a better visualization of the network.
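To convince yourself of the shapes involved, here is a minimal check (a sketch relying on the imports and the function defined above): one ξ vector of k = 100 random numbers goes in, one 28x28 single-channel image comes out.
generator = make_generator_model()
noise = tf.random.normal([1, 100])          # batch of one xi vector
fake_image = generator(noise, training=False)
print(fake_image.shape)                     # (1, 28, 28, 1)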
Figure 11-2

The generator neural network architecture

Figure 11-2 shows how the random vector is transformed into increasingly larger images, until at the end, the expected 28x28-pixel image with one channel is obtained (this will be the Xfake we discussed in the previous section). The discriminator can be created analogously, with standard Keras:
def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                                     input_shape=[28, 28, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Flatten())
    model.add(layers.Dense(1))
    return model

That is a rather small network. The input is an image that’s 28x28 pixels in resolution with just one channel (gray levels). The output is a single number produced by one neuron, layers.Dense(1); it is a logit that, passed through a sigmoid, gives the probability that the image is real (which is why the loss is defined later with from_logits=True).
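To make this concrete, here is a minimal check (a sketch relying on the function just defined and the imports above): the single output neuron returns a logit, and tf.sigmoid maps it to the probability that the image is real.
discriminator = make_discriminator_model()
sample = tf.random.normal([1, 28, 28, 1])     # stand-in for a real or fake image
logit = discriminator(sample, training=False)
print(tf.sigmoid(logit).numpy())              # typically close to 0.5 for an untrained network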

Figure 11-3 shows the network architecture.
Figure 11-3

The discriminator neural network architecture

As discussed, we need to train the two networks in an alternating fashion, so the standard compile()/fit() approach will not be enough; you will need to develop a custom training loop.6 Before doing that, we need to define the loss functions. This is not difficult, and we can start with the discriminator loss LD:
def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss
Here, cross_entropy is the binary cross-entropy loss, defined beforehand as
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
The from_logits=True argument tells Keras that the discriminator outputs raw logits rather than probabilities. You will remember that to train the discriminator we need its predictions on both the real examples ($\hat{Y}_{real}$, the real_output variable) and the fake ones ($\hat{Y}_{fake}$, the fake_output variable). The generator loss function LG is defined analogously:
def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)
For LG, as you will remember from the previous section, we only need the discriminator's output on the fake examples, $\hat{Y}_{fake}$. At this point, we are almost done. We need to define the optimizers (always using standard Keras functions):
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
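Before moving on, here is a quick sanity check of the two loss functions (a small sketch): for logits of zero the discriminator assigns a probability of 0.5 to every input, so each cross-entropy term equals ln 2 ≈ 0.693.
logits = tf.zeros([4, 1])                          # four logits of 0.0, i.e., probability 0.5
print(generator_loss(logits).numpy())              # ~0.6931 (= ln 2)
print(discriminator_loss(logits, logits).numpy())  # ~1.3863 (sum of the two cross-entropy terms)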
And now here is the custom training loop (BATCH_SIZE is the batch size and noise_dim is the dimension of the noise vector ξ, 100 in our example; both must be defined beforehand):
def train_step(images):
    # Generation of the xi vector (random noise)
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      # Calculation of X_{fake}
      generated_images = generator(noise, training=True)
      # Calculation of hat Y_{real}
      real_output = discriminator(images, training=True)
      # Calculation of hat Y_{fake}
      fake_output = discriminator(generated_images, training=True)
      # Calculation of L_G
      gen_loss = generator_loss(fake_output)
      # Calculation of L_D
      disc_loss = discriminator_loss(real_output, fake_output)
    # Calculation of the gradients of L_G for backpropagation
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    # Calculation of the gradients of L_D for backpropagation
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    # Application of the gradients to update the weights
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
Let’s summarize the steps:
  • We generate Xfake: generated_images = generator(noise, training=True)

  • Then $\hat{Y}_{real}$: real_output = discriminator(images, training=True)

  • Then $\hat{Y}_{fake}$: fake_output = discriminator(generated_images, training=True)

  • Then we define LG: gen_loss = generator_loss(fake_output)

  • Then we define LD: disc_loss = discriminator_loss(real_output, fake_output)

At this point we can evaluate the gradients:
gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
and then apply them to update the trainable parameters of the two networks:
generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
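To complete the training, train_step must be called for every batch of real images, over a number of epochs. A minimal outer loop could look like the following sketch (BATCH_SIZE, noise_dim, EPOCHS, and the train_images array are assumptions, not values taken from the excerpts above; train_images should contain the MNIST images reshaped to 28x28x1 and scaled to the [-1, 1] range expected by the tanh output).
BATCH_SIZE = 256
noise_dim = 100          # the dimension k of the random vector xi
EPOCHS = 50

# Build a shuffled, batched pipeline over the training images
dataset = (tf.data.Dataset.from_tensor_slices(train_images)
           .shuffle(buffer_size=60000)
           .batch(BATCH_SIZE))

for epoch in range(EPOCHS):
    for image_batch in dataset:
        train_step(image_batch)   # one generator and one discriminator update per batch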
Repeating these steps enough times lets the two networks learn. By comparing Figure 11-1 with this code, you should be able to immediately see how this GAN is implemented. Figure 11-4 shows examples of digits generated by the generator network. The digits do not exist in the dataset and have been “created” by the neural network.
Figure 11-4

Four examples of digits generated by the generator network. The digits do not exist in the dataset and have been “created” by the neural network

The only thing you need to do in order to generate an image is to feed the generator a vector of 100 random numbers. For example:
noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)

Be aware that due to the dimensions used in the code, if you want to extract the 28x28 image you need to use the code generated_image[0, :, :, 0]. You can find the entire code at https://adl.toelt.ai. Try different networks, different numbers of epochs, and so on to get a feeling for how such an approach can generate realistic images from a training dataset.
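If you want to display the result, a minimal way to plot the generated digit (a sketch that assumes matplotlib is available) is the following:
import matplotlib.pyplot as plt

# generated_image has shape (1, 28, 28, 1); keep only the 28x28 gray-level image
plt.imshow(generated_image[0, :, :, 0], cmap='gray')
plt.axis('off')
plt.show()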

Note that this approach learns from all the classes at the same time; it is not possible to ask the network to generate a specific digit, since the generator simply generates a random one. To be able to do this, we need to implement what are called conditional GANs, which also get the class labels as input and can therefore generate examples of specific classes. If you want to try the code on your laptop, keep in mind that training GANs is rather slow: on Google Colab with a GPU, one epoch may take 30 seconds or more, and on a modern laptop without a GPU, one epoch may take 1.5-2 minutes.

A Note on Training

There is an important aspect of why the training is implemented in this alternating fashion that we need to discuss. You might wonder why we cannot, for example, train the discriminator alone until it gets really good at distinguishing fake from real images. The reason is simple: imagine that the discriminator is already extremely good. It would spot every Xfake as a fake, and the generator would never be able to improve, because the discriminator never makes a mistake. Training in such a situation would never be successful. In practice, one of the biggest challenges when training GANs is making sure that the generator and the discriminator remain at approximately the same skill level during training; this has been shown to be the sweet spot for the training to be efficient and successful.

Conditional GANs

Now let's turn our attention to conditional GANs (CGANs). The working idea is the same as described so far in this chapter, with the difference that we can specify from which class we want the generator to create an image. In the MNIST example, we could tell the generator that we want a fake image of the digit one, for example. Figure 11-5 shows an updated diagram explaining the training (it is Figure 11-1 updated).
Figure 11-5

The training of a CGAN system. The red elements highlight the role of the label, which makes it possible for the generator to create fake examples of specific classes

The main things that we need to change to achieve this are the architectures of the two networks. Figures 11-6 and 11-7 show example architectures of the two networks: a generator and a discriminator, respectively.
Figure 11-6

The generator network architecture for a CGAN

Figure 11-7

The discriminator network architecture for a CGAN

From Figures 11-6 and 11-7, you can immediately see that they now have an additional input: a one-dimensional tensor, which will be the class label. These kinds of networks can be easily implemented using the Keras Functional API. Just to give you an idea of how to build such networks, here are the first layers of the generator network, up until the merging of the two branches:
from tensorflow.keras.layers import (Input, Embedding, Dense, Reshape,
                                     LeakyReLU, Concatenate)

# Label branch: embed the class label (n_classes = 10 for MNIST)
# and project it to a 7x7 feature map
input_label = Input(shape=(1,))
emb = Embedding(n_classes, 50)(input_label)
n_nodes = 7 * 7
emb = Dense(n_nodes)(emb)
emb = Reshape((7, 7, 1))(emb)
# Noise branch: project the latent vector (latent_dim = 100 in our example)
# to a 7x7x128 feature map
in_lat = Input(shape=(latent_dim,))
n_nodes = 128 * 7 * 7
gen = Dense(n_nodes)(in_lat)
gen = LeakyReLU(alpha=0.2)(gen)
gen = Reshape((7, 7, 128))(gen)
# Merge the two branches along the channel dimension
merge = Concatenate()([gen, emb])
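Just to indicate where the two branches go from here, the following is a purely hypothetical sketch of how the generator could be completed (the layers in the complete example at https://adl.toelt.ai may differ): the merged 7x7 tensor is upsampled back to a 28x28 single-channel image, and the two inputs are wrapped into a single Model.
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.models import Model

# Hypothetical upsampling of the merged 7x7 tensor back to 28x28
gen = Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same')(merge)   # 7x7 -> 14x14
gen = LeakyReLU(alpha=0.2)(gen)
out_layer = Conv2DTranspose(1, (4, 4), strides=(2, 2), padding='same',
                            activation='tanh')(gen)                         # 14x14 -> 28x28
generator = Model([in_lat, input_label], out_layer)                         # two inputs, one image output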
You can see how flexible the Keras Functional API is. Now, when training the generator network, for example, you need to give as input not only a random vector ξ, as before, but also a class label, which you will also use to select the Xreal examples when training the discriminator. To generate the random noise and the labels, you can use this code:
from numpy.random import randn, randint

latent_dim = 100
x_input = randn(latent_dim * n_samples)           # random noise from a standard normal distribution
z_input = x_input.reshape(n_samples, latent_dim)  # one latent vector per sample
labels = randint(0, n_classes, n_samples)         # one random class label per sample
And you will give the generator network the input as [z_input, labels]. The n_samples variable is simply the batch size you want to use. Analyzing the complete code would make this chapter really long, really difficult to follow, and really boring. Implementing a CGAN is already a rather advanced exercise, and the best way to understand it is to go through a complete example. As usual, you will find one at https://adl.toelt.ai, where you can check all the code. In the meantime, your best sources of knowledge about CGANs are the following two papers, which you should study to understand what is going on:
  • Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).

  • Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).

Conclusion

The goal of this chapter was not to go into detail about advanced GAN architectures, but to give you an initial understanding of how adversarial learning works. There are many advanced architectures and topics about GANs that we cannot cover in this book, since they would go well beyond the skill level of the average reader. But I hope this chapter has given you a clear idea of how GANs work and how easy it is to implement a basic one in Keras.
