Generative Adversarial Networks (GANs) are, in their most basic form, two neural networks that teach each other how to solve a specific task. The idea was invented by Goodfellow and colleagues in 2014.1 The two networks help each other with the final goal of being able to generate new data that looks like the data used for training. For example, you may want to train a network to generate human faces that are as realistic as possible. In this case, one network generates human faces as well as it can, and the second network criticizes the results and tells the first network how to improve the faces. The two networks learn from each other, so to speak. This chapter looks in detail at how this works and explains how to implement an easy example in Keras.
The goal of this chapter is to give you a basic understanding of how GANs work. Adversarial learning (of which GANs are a specific case) is a vast area of research and an advanced topic in deep learning. This chapter investigates in detail how a basic GAN system works; we also discuss, more briefly, how conditional GANs function. Complete examples can be found, as usual, at https://adl.toelt.ai.
Introduction to GANs
The best way for you to understand how GANs work is to base this discussion on the diagram in Figure 11-1. After you understand what is going on under the hood, we will look at how to implement GANs in Keras.
Training Algorithm for GANs
To build a GAN system, we need two neural networks: a generator and a discriminator. The generator has the goal of producing a fake observation2 Xfake, while the discriminator has the goal of classifying an input X as fake or real.
The generator gets as input a noise vector ξ ∈ ℝk taken from a normal distribution. The size of this vector is not fixed and can be chosen based on the problem at hand. In the example we discuss in this chapter, we use k = 100. The generator (George) takes the random vector and generates a fake observation Xfake (as you can see in Figure 11-1). The output Xfake will have the same dimensions as the observations contained in the training dataset Xreal (in this example, van Gogh paintings). If, for example, Xreal consists of 1000x1000-pixel color images, then Xfake will also be a 1000x1000 color image.
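To make the generator's role concrete, here is a minimal sketch of such a network in Keras. It uses the 28x28 grayscale shape of the MNIST example discussed later in this chapter rather than 1000x1000 paintings; the layer sizes are illustrative, not the book's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

k = 100  # size of the noise vector xi

# A minimal generator sketch: it maps a k-dimensional noise vector to a
# 28x28x1 image by upsampling with transposed convolutions (7 -> 14 -> 28).
generator = tf.keras.Sequential([
    layers.Input(shape=(k,)),
    layers.Dense(7 * 7 * 64),
    layers.LeakyReLU(),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(32, kernel_size=5, strides=2, padding='same'),
    layers.LeakyReLU(),
    layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding='same',
                           activation='tanh'),
])

noise = tf.random.normal([1, k])        # one noise vector from N(0, 1)
x_fake = generator(noise, training=False)
print(x_fake.shape)                     # (1, 28, 28, 1)
```

The tanh activation in the last layer keeps the pixel values in [-1, 1], so the real training images should be rescaled to the same range.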
Now it’s the discriminator’s (Anna’s) turn. It gets as input an Xreal (or an Xfake) and produces a one-dimensional output: the probability that the input is real. Basically, the discriminator is performing binary classification. The training then proceeds in the following steps:
1. A vector ξ ∈ ℝk of k numbers is generated from a normal distribution.
2. Using this ξ, the generator gives an Xfake as output.
3. The discriminator is used two times: once with the real input (Xreal) and once with the Xfake generated in the previous step.
4. Two loss functions are calculated: LG = CE(Yfake, 1) and LD = CE(Yreal, 1) + CE(Yfake, 0), where Yreal and Yfake are the discriminator's outputs for Xreal and Xfake respectively.
5. Via an optimizer (Adam, Momentum, etc.), the two loss functions are minimized sequentially (sometimes there is one step for the generator and multiple weight-update steps for the discriminator). Note that minimizing LG is done only with respect to the trainable parameters of the generator, while minimizing LD is done only with respect to the trainable parameters of the discriminator.
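The five steps above can be sketched as a single training step in TensorFlow. The tiny dense models below are stand-ins just to make the snippet self-contained; in practice you would use convolutional networks like the ones discussed in this chapter.

```python
import tensorflow as tf
from tensorflow.keras import layers

k = 100  # size of the noise vector xi

# Tiny stand-in models so the training step can run on its own.
generator = tf.keras.Sequential([
    layers.Input(shape=(k,)),
    layers.Dense(28 * 28),
    layers.Reshape((28, 28, 1)),
])
discriminator = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(1),
])

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(x_real):
    noise = tf.random.normal([x_real.shape[0], k])       # step 1
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        x_fake = generator(noise, training=True)         # step 2
        y_real = discriminator(x_real, training=True)    # step 3
        y_fake = discriminator(x_fake, training=True)
        # Step 4: L_G = CE(Y_fake, 1), L_D = CE(Y_real, 1) + CE(Y_fake, 0)
        gen_loss = cross_entropy(tf.ones_like(y_fake), y_fake)
        disc_loss = (cross_entropy(tf.ones_like(y_real), y_real) +
                     cross_entropy(tf.zeros_like(y_fake), y_fake))
    # Step 5: each loss updates only its own network's parameters.
    gen_opt.apply_gradients(zip(
        g_tape.gradient(gen_loss, generator.trainable_variables),
        generator.trainable_variables))
    disc_opt.apply_gradients(zip(
        d_tape.gradient(disc_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    return gen_loss, disc_loss

g_loss, d_loss = train_step(tf.random.normal([8, 28, 28, 1]))
```

Note how each `apply_gradients` call receives only one network's trainable variables, which is exactly the restriction described in step 5.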
A Practical Example with Keras and MNIST
That is a rather small network. The input will be an image that’s 28x28 pixels in resolution with just one channel (gray levels). The output is the probability that the image is real, produced by a single neuron created with layers.Dense(1).
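One possible small discriminator matching this description looks as follows; the convolutional layer sizes are illustrative and may differ from the full code at https://adl.toelt.ai.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small discriminator: 28x28x1 grayscale input, one output neuron
# (layers.Dense(1)) giving the logit of the image being real.
discriminator = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(64, kernel_size=5, strides=2, padding='same'),
    layers.LeakyReLU(),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(1),
])

images = tf.random.normal([16, 28, 28, 1])   # a batch of 16 test inputs
decision = discriminator(images, training=False)
print(decision.shape)                        # (16, 1)
```

Since the last layer has no sigmoid activation, the output is a raw logit; the loss function must then be built with from_logits=True.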
We generate Xfake: generated_images = generator(noise, training=True)
Then we evaluate the discriminator on the real images: real_output = discriminator(images, training=True)
And on the generated ones: fake_output = discriminator(generated_images, training=True)
Then we define LG: gen_loss = generator_loss(fake_output)
And LD: disc_loss = discriminator_loss(real_output, fake_output)
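The two helper functions generator_loss and discriminator_loss can be defined with Keras' binary cross-entropy, following the loss formulas LG = CE(Yfake, 1) and LD = CE(Yreal, 1) + CE(Yfake, 0) given earlier. This is a sketch of the standard definition, assuming the discriminator outputs raw logits:

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(fake_output):
    # The generator wants the discriminator to label fakes as real (1).
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    # The discriminator wants real inputs -> 1 and fake inputs -> 0.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

# For zero logits (sigmoid = 0.5), cross-entropy equals ln 2 ~ 0.693.
fake = tf.zeros([2, 1])
real = tf.zeros([2, 1])
gl = float(generator_loss(fake))            # ~0.693
dl = float(discriminator_loss(real, fake))  # ~1.386
print(gl, dl)
```

The little check at the end is a useful sanity test: an untrained discriminator that outputs logits near zero should produce losses near ln 2 and 2 ln 2.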
Be aware that due to the dimensions used in the code, if you want to extract the 28x28 image you need to use the code generated_image[0, :, :, 0]. You can find the entire code at https://adl.toelt.ai. Try different networks, different numbers of epochs, and so on to get a feeling for how such an approach can generate realistic images from a training dataset.
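The indexing works because of the dimensions the network uses: the generator output has shape (batch, height, width, channels). A small illustration, with a random array standing in for the generator output:

```python
import numpy as np

# Stand-in for the generator output: a batch of one 28x28 grayscale image,
# with shape (batch, height, width, channels) = (1, 28, 28, 1).
generated_image = np.random.normal(size=(1, 28, 28, 1))

# Indexing [0, :, :, 0] drops the batch and channel axes, leaving a plain
# 28x28 array that can be passed directly to plt.imshow.
img = generated_image[0, :, :, 0]
print(img.shape)  # (28, 28)
```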
Note that this approach learns from all the classes at the same time; it is not possible, for example, to ask the network to generate a specific digit. The generator will simply generate a random digit. To make this possible, we need to implement what are called “conditional” GANs. These GANs also get the class labels as input and can therefore generate examples from a specific class. If you want to try the code on your laptop, keep in mind that training GANs is rather slow. On Google Colab with a GPU, one epoch may take up to 30 seconds or more; on a modern laptop without a GPU, one epoch may take up to 1.5-2 minutes.
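To give a flavor of how the conditioning works, here is a sketch of a conditional generator: alongside the noise vector it receives a class label, which is embedded and concatenated with the noise, so at generation time you can ask for a specific digit. The layer sizes and the embedding dimension are illustrative assumptions, not a prescribed architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

k, num_classes = 100, 10

# Two inputs: the usual noise vector and an integer class label.
noise_in = layers.Input(shape=(k,))
label_in = layers.Input(shape=(1,), dtype='int32')

# Embed the label and concatenate it with the noise vector.
label_emb = layers.Flatten()(layers.Embedding(num_classes, 50)(label_in))
h = layers.Concatenate()([noise_in, label_emb])

# From here on, the network is a normal generator (7 -> 14 -> 28).
h = layers.Dense(7 * 7 * 64)(h)
h = layers.LeakyReLU()(h)
h = layers.Reshape((7, 7, 64))(h)
h = layers.Conv2DTranspose(32, kernel_size=5, strides=2, padding='same')(h)
h = layers.LeakyReLU()(h)
out = layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding='same',
                             activation='tanh')(h)

cond_generator = tf.keras.Model([noise_in, label_in], out)

# At generation time we can now request a specific class, e.g. digit 3.
noise = tf.random.normal([1, k])
label = tf.constant([[3]])
x_fake = cond_generator([noise, label], training=False)
print(x_fake.shape)  # (1, 28, 28, 1)
```

The discriminator of a conditional GAN receives the label in the same way, so both networks learn what each class should look like.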
A Note on Training
There is an important reason why the training is implemented in this alternating fashion that we need to discuss. You might wonder why we can’t train the discriminator alone, for example, until it gets really good at distinguishing fake from real images. The reason is very simple: imagine that the discriminator is really good. It will always spot the Xfake as fakes, and therefore the generator will never be able to get better, because the discriminator never makes a mistake. Training in such a situation would never be successful. In practice, one of the biggest challenges when training GANs is making sure that the generator and the discriminator networks remain at approximately the same skill level during training. This has been shown to be the sweet spot for the training to be efficient and successful.
Conditional GANs
Radford, Alec, Luke Metz, and Soumith Chintala. “Unsupervised representation learning with deep convolutional generative adversarial networks.” arXiv preprint arXiv:1511.06434 (2015).
Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).
Conclusion
The goal of this chapter was not to go into details about advanced GAN architectures, but to give you an initial understanding of how adversarial learning works. There are many advanced architectures and topics about GANs that we cannot cover in this book, since that would go well beyond its scope. But with this chapter, I hope I have given you an initial understanding of how GANs work and how easy it is to implement them in Keras.