The generator in this paper is based on a popular encoder-decoder-style network called U-Net. I encourage you to read the original U-Net paper to understand the architecture. Here's a diagram from the Image-to-Image Translation with Conditional Adversarial Networks paper:
U-Net features an encoder-decoder structure with skip connections that link corresponding layers on both sides of the architecture
Here are some pseudocode steps to implement this network:
- Create a method that will define this model:
define a model:
- Define the input tensor to be the shape of the image:
model = input(image.shape)
- Build the encoder architecture. The details of each layer will be addressed in a later recipe; for now, the two-dimensional convolutional layers are labeled big, medium, and small according to the size of their resulting tensors:
# Encoder Architecture
conv2d(big)
conv2d(medium)
conv2d(small)
conv2d(small)
conv2d(small)
conv2d(small)
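As a concrete sketch of one of these encoder steps, here's what a downsampling block might look like in Keras. This is an illustrative assumption on my part: the 4x4 strided convolutions, `BatchNormalization`, and `LeakyReLU` follow common pix2pix-style implementations, and the filter counts (64, 128) are examples, not values prescribed by the pseudocode:

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, filters):
    """One downsampling step: a strided 4x4 convolution halves the resolution."""
    x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    return x

# Example: a 256x256 RGB input shrinks from "big" to "medium" tensors.
inputs = layers.Input(shape=(256, 256, 3))
x = encoder_block(inputs, 64)    # resolution halves to 128x128
x = encoder_block(x, 128)        # and again to 64x64
```

Each call halves the spatial resolution while increasing the channel count, which is what takes the tensor from "big" to "small" in the pseudocode above.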
- For the decoder, we take the reduced representation and bring it back up to full resolution using two-dimensional convolutional layers. Notice the skip-connection layers built here:
# Decoder Architecture
conv2d(small)
skip_connection()
conv2d(medium)
skip_connection()
conv2d(medium)
skip_connection()
conv2d(big)
skip_connection()
conv2d(big)
skip_connection()
conv2d(big)
skip_connection()
conv2d(big)
In practice with Keras, a skip connection manifests itself as a Concatenate layer within the network.
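To make that concrete, here is a minimal sketch of a single decoder step with a skip connection, assuming Keras. The shapes and filter counts are hypothetical, chosen only so the upsampled decoder tensor matches its encoder counterpart:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in tensors: one saved from the encoder side, one currently in the decoder.
encoder_feat = layers.Input(shape=(64, 64, 128))
decoder_feat = layers.Input(shape=(32, 32, 256))

# Upsample the decoder tensor so its resolution matches the encoder tensor...
up = layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding="same")(decoder_feat)

# ...then the skip connection is simply a Concatenate along the channel axis.
merged = layers.Concatenate()([up, encoder_feat])
# merged has 128 + 128 = 256 channels at 64x64 resolution
```

The concatenation gives the decoder direct access to the encoder's features at the same resolution, which is what lets U-Net recover fine spatial detail.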
- Create an output layer and return the model:
output(image.shape)
return model
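Putting the pseudocode steps together, the whole recipe can be sketched as a single Keras function. This is a pared-down illustration under my own assumptions (fewer layers than pix2pix's eight-deep U-Net, illustrative filter counts, a `tanh` output as is typical for image generators), not the paper's exact model:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_generator(image_shape=(256, 256, 3)):
    """A small U-Net-style generator: encoder, decoder, and skip connections."""
    inputs = layers.Input(shape=image_shape)

    # Encoder: each step halves the resolution (big -> medium -> small tensors),
    # saving every intermediate tensor for the skip connections.
    skips = []
    x = inputs
    for filters in (64, 128, 256, 512):
        x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        skips.append(x)

    # Decoder: each step doubles the resolution and concatenates the encoder
    # tensor of the same size -- the skip connection.
    for filters, skip in zip((256, 128, 64), reversed(skips[:-1])):
        x = layers.Conv2DTranspose(filters, kernel_size=4, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.ReLU()(x)

    # Output layer: back to the input resolution and channel count.
    outputs = layers.Conv2DTranspose(image_shape[-1], kernel_size=4, strides=2,
                                     padding="same", activation="tanh")(x)
    return Model(inputs, outputs, name="unet_generator")

model = build_generator()
```

Note how the decoder mirrors the encoder: every upsampling step pairs with the encoder tensor of matching resolution, so the final output comes back at the full input shape.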