The generator network is an auto-encoder-style network: it takes an image as input and outputs another image. It has two parts: an encoder and a decoder. The encoder consists of convolution layers that downsample the input, transforming an input of shape 128x128x3 into an internal representation. The decoder consists of two upsampling blocks and a final convolution layer, which transform the internal representation back into an output of shape 128x128x3.
The generator network contains the following blocks:
- The convolution block
- The residual block
- The upsampling block
- The final convolution layer
Let's go through each component one by one:
- The convolution block: The convolution block contains a 2D convolution layer, followed by an instance normalization layer and a ReLU activation. Refer to Chapter 1, Introduction to Generative Adversarial Networks, to learn more about instance normalization.
The generator network contains three convolution blocks, the configuration of which is as follows:
| Layer name | Hyperparameters | Input shape | Output shape |
| --- | --- | --- | --- |
| 2D convolution layer | filters=32, kernel_size=7, strides=1, padding='same' | (128, 128, 3) | (128, 128, 32) |
| Instance normalization layer | axis=1 | (128, 128, 32) | (128, 128, 32) |
| Activation layer | activation='relu' | (128, 128, 32) | (128, 128, 32) |
| 2D convolution layer | filters=64, kernel_size=3, strides=2, padding='same' | (128, 128, 32) | (64, 64, 64) |
| Instance normalization layer | axis=1 | (64, 64, 64) | (64, 64, 64) |
| Activation layer | activation='relu' | (64, 64, 64) | (64, 64, 64) |
| 2D convolution layer | filters=128, kernel_size=3, strides=2, padding='same' | (64, 64, 64) | (32, 32, 128) |
| Instance normalization layer | axis=1 | (32, 32, 128) | (32, 32, 128) |
| Activation layer | activation='relu' | (32, 32, 128) | (32, 32, 128) |
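The stride-2 convolutions with padding='same' halve the spatial dimensions (128 → 64 → 32), while instance normalization normalizes each channel of each sample independently. The following NumPy sketch illustrates both ideas; it is an illustration of the math, not the Keras layers used in the book:

```python
import numpy as np

def instance_norm(x, epsilon=1e-5):
    """Normalize each (sample, channel) slice of an NHWC tensor
    to zero mean and unit variance, as instance normalization does."""
    mean = x.mean(axis=(1, 2), keepdims=True)  # per-sample, per-channel mean
    var = x.var(axis=(1, 2), keepdims=True)    # per-sample, per-channel variance
    return (x - mean) / np.sqrt(var + epsilon)

def same_conv_output_size(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return -(-size // stride)  # ceiling division

# The encoder's three convolution blocks: 128 -> 128 -> 64 -> 32
sizes = [128]
for stride in (1, 2, 2):
    sizes.append(same_conv_output_size(sizes[-1], stride))
print(sizes)  # [128, 128, 64, 32]

# Instance normalization leaves each (sample, channel) slice zero-mean
x = np.random.randn(2, 8, 8, 4) * 3.0 + 1.5
y = instance_norm(x)
print(np.allclose(y.mean(axis=(1, 2)), 0.0, atol=1e-6))  # True
```

Note that, unlike batch normalization, the statistics here are computed per sample, so the result does not depend on the other images in the batch.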
- The residual block: The residual block contains two 2D convolution layers, each followed by a batch normalization layer with momentum set to 0.9. The generator network contains six residual blocks, the configuration of which is as follows:
| Layer name | Hyperparameters | Input shape | Output shape |
| --- | --- | --- | --- |
| 2D convolution layer | filters=128, kernel_size=3, strides=1, padding='same' | (32, 32, 128) | (32, 32, 128) |
| Batch normalization layer | axis=3, momentum=0.9, epsilon=1e-5 | (32, 32, 128) | (32, 32, 128) |
| 2D convolution layer | filters=128, kernel_size=3, strides=1, padding='same' | (32, 32, 128) | (32, 32, 128) |
| Batch normalization layer | axis=3, momentum=0.9, epsilon=1e-5 | (32, 32, 128) | (32, 32, 128) |
| Addition layer | None | (32, 32, 128) | (32, 32, 128) |
The addition layer calculates the sum of the input tensor to the block and the output of the last batch normalization layer.
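This skip connection can be sketched in a few lines: the block's output is the element-wise sum of its input and a shape-preserving transform, so the (32, 32, 128) shape is unchanged. Here `transform` is a hypothetical stand-in for the two convolution + batch-normalization stages:

```python
import numpy as np

def residual_block(x, transform):
    """Apply a shape-preserving transform and add the input back
    (the addition layer at the end of the residual block)."""
    return x + transform(x)

# Hypothetical stand-in for the two conv + batch-norm stages:
# any function that preserves the (H, W, C) shape works here.
transform = lambda t: 0.1 * t

x = np.ones((1, 32, 32, 128))
out = residual_block(x, transform)
print(out.shape)  # (1, 32, 32, 128)
```

Because the addition requires matching shapes, the convolutions inside a residual block must use strides=1 and keep the channel count at 128.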
- The upsampling block: The upsampling block contains a transpose 2D convolution layer, followed by an instance normalization layer and a ReLU activation. There are two upsampling blocks in the generator network. The configuration of the first upsampling block is as follows:
| Layer name | Hyperparameters | Input shape | Output shape |
| --- | --- | --- | --- |
| Transpose 2D convolution layer | filters=64, kernel_size=3, strides=2, padding='same', use_bias=False | (32, 32, 128) | (64, 64, 64) |
| Instance normalization layer | axis=1 | (64, 64, 64) | (64, 64, 64) |
| Activation layer | activation='relu' | (64, 64, 64) | (64, 64, 64) |
The configuration of the second upsampling block is as follows:
| Layer name | Hyperparameters | Input shape | Output shape |
| --- | --- | --- | --- |
| Transpose 2D convolution layer | filters=32, kernel_size=3, strides=2, padding='same', use_bias=False | (64, 64, 64) | (128, 128, 32) |
| Instance normalization layer | axis=1 | (128, 128, 32) | (128, 128, 32) |
| Activation layer | activation='relu' | (128, 128, 32) | (128, 128, 32) |
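With padding='same' and strides=2, a transpose convolution doubles each spatial dimension, so the two blocks upsample 32 → 64 → 128 and undo the encoder's downsampling. A quick sketch of that arithmetic:

```python
def transpose_conv_output_size(size, stride):
    """Spatial output size of a transpose convolution with
    padding='same': the input size is multiplied by the stride."""
    return size * stride

# The decoder's two upsampling blocks: 32 -> 64 -> 128
sizes = [32]
for stride in (2, 2):
    sizes.append(transpose_conv_output_size(sizes[-1], stride))
print(sizes)  # [32, 64, 128]
```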
- The final convolution layer: The last layer is a 2D convolution layer that uses tanh as the activation function. It generates an image of a shape of (128, 128, 3). The configuration of the last layer is as follows:
| Layer name | Hyperparameters | Input shape | Output shape |
| --- | --- | --- | --- |
| 2D convolution layer | filters=3, kernel_size=7, strides=1, padding='same', activation='tanh' | (128, 128, 32) | (128, 128, 3) |
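Putting the pieces together, the shapes flow through the generator as follows. This is a plain-Python shape trace, not the network itself; it assumes the layer configurations listed in the tables above:

```python
import math

def same_conv(shape, filters, stride):
    """Output shape of a Conv2D with padding='same'."""
    h, w, _ = shape
    return (math.ceil(h / stride), math.ceil(w / stride), filters)

def same_transpose_conv(shape, filters, stride):
    """Output shape of a Conv2DTranspose with padding='same'."""
    h, w, _ = shape
    return (h * stride, w * stride, filters)

shape = (128, 128, 3)
# Encoder: three convolution blocks
for filters, stride in ((32, 1), (64, 2), (128, 2)):
    shape = same_conv(shape, filters, stride)
# The six residual blocks are shape-preserving
assert shape == (32, 32, 128)
# Decoder: two upsampling blocks
for filters in (64, 32):
    shape = same_transpose_conv(shape, filters, 2)
# Final 7x7 convolution maps back to 3 channels
shape = same_conv(shape, 3, 1)
print(shape)  # (128, 128, 3)
```

The trace confirms that the output image has the same 128x128x3 shape as the input, which is what an image-to-image generator requires.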