Most of the networks we have built in previous chapters used the sequential API from Keras. The fusion layer is an innovative way to apply transfer learning in this context. Recall that we feed the same grayscale image as input to two different networks: an encoder and a pretrained VGG16. Because the outputs of the two networks have different shapes, we repeat the 1,000-dimensional embedding from VGG16 32 × 32 (that is, 1,024) times and concatenate, or fuse, it with the encoder output. The following snippet prepares the fusion layer:
# Fusion
fusion_layer_output = RepeatVector(32 * 32)(emd_input)
fusion_layer_output = Reshape((32, 32, 1000))(fusion_layer_output)
fusion_layer_output = concatenate([enc_output, fusion_layer_output], axis=3)
fusion_layer_output = Conv2D(DIM, (1, 1), activation='relu',
                             padding='same')(fusion_layer_output)
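To see the fusion layer in isolation, it helps to wire it into a standalone model. The following is a minimal sketch, assuming hypothetical `Input` placeholders for `enc_output` and `emd_input`, and a `DIM` of 256 (match this to your encoder's output depth):

```python
from tensorflow.keras.layers import (Input, RepeatVector, Reshape,
                                     concatenate, Conv2D)
from tensorflow.keras.models import Model

DIM = 256  # assumed fusion filter count; set to your encoder's channel depth

# Placeholders standing in for the encoder output and the VGG16 embedding
enc_output = Input(shape=(32, 32, DIM), name='encoder_features')
emd_input = Input(shape=(1000,), name='vgg16_embedding')

# Tile the 1,000-d embedding over all 32 x 32 spatial positions
x = RepeatVector(32 * 32)(emd_input)
x = Reshape((32, 32, 1000))(x)
# Fuse along the depth axis, then mix channels with a 1x1 convolution
x = concatenate([enc_output, x], axis=3)
fusion_layer_output = Conv2D(DIM, (1, 1), activation='relu',
                             padding='same')(x)

model = Model([enc_output, emd_input], fusion_layer_output)
print(model.output_shape)  # (None, 32, 32, 256)
```

The 1 × 1 convolution after concatenation reduces the 256 + 1,000 = 1,256 fused channels back to `DIM`, so downstream layers see the same depth as the plain encoder output.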
The repetition of output from VGG16 is attached along the depth axis of the encoder output. This ensures the feature embedding of the image, as extracted from VGG16, is evenly spread across the whole image:
(Figure: structure of the fusion layer. Source: Baldassarre et al.)
The preceding image shows the input to the feature extractor (the pretrained VGG16) and the structure of the fusion layer.
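The shape arithmetic behind the fusion can be sketched with plain NumPy (an illustration of the repeat/reshape/concatenate semantics only, not the Keras layers themselves; the batch size and the encoder depth of 256 are assumptions):

```python
import numpy as np

batch = 2
embedding = np.random.rand(batch, 1000)          # VGG16 embedding, one per image
enc_output = np.random.rand(batch, 32, 32, 256)  # encoder feature map (depth assumed)

# RepeatVector(32*32): copy the embedding once per spatial location
repeated = np.repeat(embedding[:, np.newaxis, :], 32 * 32, axis=1)  # (batch, 1024, 1000)

# Reshape into a 32 x 32 grid so every position carries the full embedding
reshaped = repeated.reshape(batch, 32, 32, 1000)

# Concatenate along the depth axis (axis=3), as in the fusion layer
fused = np.concatenate([enc_output, reshaped], axis=3)
print(fused.shape)  # (2, 32, 32, 1256)
```

Because every spatial position receives an identical copy of the embedding, the global image semantics extracted by VGG16 are available at each pixel of the encoder's feature map.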