Most of the networks we have built in previous chapters used the sequential API from Keras. The fusion layer is an innovative way to apply transfer learning in this context. Recall that we feed the same grayscale image as input to two different networks: an encoder and a pretrained VGG16. Because the outputs of the two networks have different shapes, we repeat the 1,000-dimensional embedding from VGG16 32 × 32 (that is, 1,024) times and concatenate, or fuse, it with the encoder output. The following snippet prepares the fusion layer:
# Fusion
fusion_layer_output = RepeatVector(32 * 32)(emd_input)
fusion_layer_output = Reshape((32, 32, 1000))(fusion_layer_output)
fusion_layer_output = concatenate([enc_output, fusion_layer_output], axis=3)
fusion_layer_output = Conv2D(DIM, (1, 1), activation='relu',
                             padding='same')(fusion_layer_output)
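To see the fusion layer in isolation, it helps to wire it into a standalone model. The following is a minimal sketch, assuming hypothetical `Input` placeholders for `enc_output` and `emd_input`, and a `DIM` of 256 (match this to your encoder's output depth):

```python
from tensorflow.keras.layers import (Input, RepeatVector, Reshape,
                                     concatenate, Conv2D)
from tensorflow.keras.models import Model

DIM = 256  # assumed fusion filter count; set to your encoder's channel depth

# Placeholders standing in for the encoder output and the VGG16 embedding
enc_output = Input(shape=(32, 32, DIM), name='encoder_features')
emd_input = Input(shape=(1000,), name='vgg16_embedding')

# Tile the 1,000-d embedding over all 32 x 32 spatial positions
x = RepeatVector(32 * 32)(emd_input)
x = Reshape((32, 32, 1000))(x)
# Fuse along the depth axis, then mix channels with a 1x1 convolution
x = concatenate([enc_output, x], axis=3)
fusion_layer_output = Conv2D(DIM, (1, 1), activation='relu',
                             padding='same')(x)

model = Model([enc_output, emd_input], fusion_layer_output)
print(model.output_shape)  # (None, 32, 32, 256)
```

The 1 × 1 convolution after concatenation reduces the 256 + 1,000 = 1,256 fused channels back to `DIM`, so downstream layers see the same depth as the plain encoder output.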
The repetition of output from VGG16 is attached along the depth axis of the encoder output. This ensures the feature embedding of the image, as extracted from VGG16, is evenly spread across the whole image:
(Figure: structure of the fusion layer. Source: Baldassarre et al.)
The preceding image shows the input to the feature extractor (the pretrained VGG16) and the structure of the fusion layer.
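The shape arithmetic behind the fusion can be sketched with plain NumPy (an illustration of the repeat/reshape/concatenate semantics only, not the Keras layers themselves; the batch size and the encoder depth of 256 are assumptions):

```python
import numpy as np

batch = 2
embedding = np.random.rand(batch, 1000)          # VGG16 embedding, one per image
enc_output = np.random.rand(batch, 32, 32, 256)  # encoder feature map (depth assumed)

# RepeatVector(32*32): copy the embedding once per spatial location
repeated = np.repeat(embedding[:, np.newaxis, :], 32 * 32, axis=1)  # (batch, 1024, 1000)

# Reshape into a 32 x 32 grid so every position carries the full embedding
reshaped = repeated.reshape(batch, 32, 32, 1000)

# Concatenate along the depth axis (axis=3), as in the fusion layer
fused = np.concatenate([enc_output, reshaped], axis=3)
print(fused.shape)  # (2, 32, 32, 1256)
```

Because every spatial position receives an identical copy of the embedding, the global image semantics extracted by VGG16 are available at each pixel of the encoder's feature map.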