© Poornachandra Sarang 2021
P. Sarang, Artificial Neural Networks with TensorFlow 2, https://doi.org/10.1007/978-1-4842-6150-7_14

14. Image Translation

Poornachandra Sarang
Mumbai, India

Have you ever thought of colorizing an old B&W photograph of your granny? You would probably approach a Photoshop artist to do the job, paying a hefty fee and waiting days or weeks for the result. If I tell you that you could do this with a deep neural network, would you not get excited about learning how? Well, this chapter teaches you the technique of converting B&W images to colorized images almost instantaneously. The technique is simple and uses a network architecture known as an AutoEncoder. So, let us first look at AutoEncoders.

AutoEncoders

An AutoEncoder consists of two parts – an Encoder and a Decoder. It may be schematically represented as shown in Figure 14-1.
Figure 14-1

AutoEncoder architecture

On the left, a B&W image is fed to the network. On the right, the network output is a colorized version of the same content. What goes on in between can be described like this: the Encoder processes the image through a series of Convolutional layers, downsizing it to learn a reduced-dimensional representation of the input. The Decoder then attempts to regenerate the image by passing this representation through another series of Convolutional layers, upsizing it and adding color in the process.
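To make this concrete, here is a minimal sketch of such an encoder/decoder written with Keras. It is only an illustration of the idea, not the model you will build later in this chapter, and the layer sizes are arbitrary choices.
from tensorflow.keras.layers import Input, Conv2D, UpSampling2D
from tensorflow.keras.models import Model

# Encoder: strided convolutions shrink the grayscale input
inp = Input(shape=(256, 256, 1))
x = Conv2D(32, (3, 3), strides=2, padding='same', activation='relu')(inp)  # 128x128
x = Conv2D(64, (3, 3), strides=2, padding='same', activation='relu')(x)    # 64x64
# Decoder: upsampling restores the original size
x = Conv2D(32, (3, 3), padding='same', activation='relu')(x)
x = UpSampling2D((2, 2))(x)                                                 # 128x128
x = UpSampling2D((2, 2))(x)                                                 # 256x256
out = Conv2D(2, (3, 3), padding='same', activation='tanh')(x)               # two color channels
toy_model = Model(inp, out)
print(toy_model.output_shape)   # (None, 256, 256, 2)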

Now, to understand how to colorize an image, you must first understand the color spaces.

Color Spaces

A color image encodes colors from a given color space together with the luminous intensity. A range of colors is created by combining primary colors such as red, green, and blue; this entire range is called a color space, for example, RGB. In mathematical terms, a color space is an abstract mathematical model that describes colors as tuples of numbers, with each color corresponding to a single point in the space.

I will describe the three most popular color spaces:
  • RGB

  • YCbCr

  • Lab

RGB is the most commonly used color space. It contains three channels – red (R), green (G), and blue (B). Each channel is represented by 8 bits and can therefore take 256 values (0 through 255). Together, the three channels can represent over 16 million colors.

The JPEG and MPEG formats use the YCbCr color space. It is more efficient for digital transmission and storage than RGB. The Y channel represents the luminance (the grayscale component), while Cb and Cr are the blue-difference and red-difference chroma components. The Y channel takes values from 16 through 235, and the Cb and Cr values range from 16 to 240. Note that not every combination of channel values represents a valid color. We do not use this color space in our colorization application.
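If you want to verify these ranges yourself, scikit-image (which we use later in this chapter) provides a converter. The following snippet is only an illustration and uses a sample image bundled with the library; it is not part of the project code.
from skimage import data
from skimage.color import rgb2ycbcr

ycbcr = rgb2ycbcr(data.astronaut() / 255.0)          # sample RGB image scaled to [0, 1]
print(ycbcr[:, :, 0].min(), ycbcr[:, :, 0].max())    # Y  lies within [16, 235]
print(ycbcr[:, :, 1].min(), ycbcr[:, :, 1].max())    # Cb lies within [16, 240]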

The Lab color space was designed by the International Commission on Illumination (CIE). A visual representation of this color space is shown in Figure 14-2.
Figure 14-2

The Lab color space

The Lab color space is larger than the gamuts of computer displays and printers. Consequently, a bitmap image represented in Lab requires more data per pixel to achieve the same precision as RGB or CMYK. For this reason, Lab is typically used as an intermediate color space rather than a final one.

The L channel represents the luminosity and takes values in the range 0 to 100. The “a” channel codes from green (-) to red (+), and the “b” channel codes from blue (-) to yellow (+). In an 8-bit implementation, both span roughly -128 to +127. The Lab color space approximates human vision: a given amount of numerical change in these components corresponds to roughly the same amount of visually perceived change.

We use the Lab color space in our project. Because the L channel already carries the grayscale (luminance) information, the network only has to learn the two remaining channels for colorization. This reduces the network size and results in faster convergence.
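The following short check, again using a sample image bundled with scikit-image, shows the round trip between RGB and Lab and the channel ranges just discussed; it is only for illustration.
import numpy as np
from skimage import data
from skimage.color import rgb2lab, lab2rgb

rgb = data.astronaut() / 255.0                       # RGB values in [0, 1]
lab = rgb2lab(rgb)                                   # convert to Lab
print(lab[:, :, 0].min(), lab[:, :, 0].max())        # L  roughly in [0, 100]
print(lab[:, :, 1:].min(), lab[:, :, 1:].max())      # a, b roughly in [-128, 127]
print(np.abs(rgb - lab2rgb(lab)).max())              # round-trip error is close to zero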

I will now discuss the different network topologies for our AutoEncoders.

Network Configurations

The AutoEncoder network may be configured in three different ways:
  • Vanilla

  • Merged

  • Merged model using pre-trained network

I will now discuss all three models.

Vanilla Model

The vanilla model has the configuration shown in Figure 14-1, where the Encoder has a series of Convolutional layers with strides for downsizing the image and extracting features. The Decoder likewise has Convolutional layers, used for upsizing and colorization. In such autoencoders, the encoder is not deep enough to extract the global features of an image, and global features help in determining how to colorize certain regions. If we make the encoder deeper, the dimensions of the representation become too small for the decoder to faithfully reproduce the original image. Thus, we need two paths in the Encoder – one to obtain the global features and the other to obtain a rich representation of the image. This is what the next two models do. You will construct a vanilla network for the first project in this chapter.

Merged Model

This model was proposed by Iizuka et al. in their paper “Let there be Color!” (http://iizuka.cs.tsukuba.ac.jp/projects/colorization/data/colorization_sig2016.pdf). The model architecture is shown in Figure 14-3.
Figure 14-3

Merged model architecture

An eight-layer Encoder extracts the mid-level representation. The output of the sixth layer is forked and fed through another seven-layer network to extract the global features. A fusion network then concatenates the two outputs and feeds the result to the Decoder.
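A minimal sketch of the fusion step follows, with assumed shapes (a 256-element global feature vector and a 32x32x256 mid-level feature map). It shows only how the global vector can be tiled over the spatial grid and concatenated channel-wise, not the full architecture from the paper.
from tensorflow.keras.layers import Input, RepeatVector, Reshape, concatenate
from tensorflow.keras.models import Model

mid_features = Input(shape=(32, 32, 256))         # mid-level representation (assumed shape)
global_features = Input(shape=(256,))             # global feature vector (assumed shape)

g = RepeatVector(32 * 32)(global_features)        # (1024, 256)
g = Reshape((32, 32, 256))(g)                     # tile over the 32x32 grid
fused = concatenate([mid_features, g], axis=-1)   # (32, 32, 512)

fusion = Model([mid_features, global_features], fused)
print(fusion.output_shape)                        # (None, 32, 32, 512)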

Merged Model Using Pre-trained Network

This was proposed by Baldassarre et al. in their paper “Deep Koalarization: Image Colorization using CNNs and Inception-Resnet-v2” (https://arxiv.org/pdf/1712.03400.pdf). The schematic diagram of the model architecture is shown in Figure 14-4.
Figure 14-4

Model using a pre-trained network

The feature extraction is done by a pre-trained Inception-ResNet-v2 network.

In the second project in this chapter, I will show you how to use a pre-trained model for feature extraction, though you will not be constructing as complicated a model as shown in this schematic.

With this introduction to AutoEncoders and their configurations, let us start with some practical implementations of them.

AutoEncoder

In this project, you will be using the vanilla autoencoder.

Open a new Colab notebook and rename it to AutoEncoder – Custom. Add the following imports:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
import skimage
from skimage.io import imread, imshow
from skimage.transform import resize
from skimage.util import crop, pad
from skimage.morphology import label
from skimage.color import rgb2gray, gray2rgb, rgb2lab, lab2rgb
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense, UpSampling2D, RepeatVector, Reshape
from tensorflow.keras.layers import Dropout, Lambda
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras import backend as K

Loading Data

You will be using the dataset provided on the Kaggle site for this project. The site (www.kaggle.com/thedownhill/art-images-drawings-painting-sculpture-engraving) provides a dataset of about 9000 images covering five types of art. If you have a Kaggle account, you may download the dataset using your credentials with the following code:
#!pip install -q kaggle
#!mkdir ~/.kaggle
#!touch ~/.kaggle/kaggle.json
#api_token = {"username": "Your UserName", "key": "Your key"}
#import json
#with open('/root/.kaggle/kaggle.json', 'w') as file:
#    json.dump(api_token, file)
#!chmod 600 ~/.kaggle/kaggle.json
#!kaggle datasets download -d thedownhill/art-images-drawings-painting-sculpture-engraving
Optionally, the data is also available on the book’s download site and can be downloaded into your project using wget as in the following code fragment:
!wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1CKs7s_MZMuZFBXDchcL_AgmCxgPBTJXK' -O art-images-drawings-painting-sculpture-engraving.zip
After the data file is downloaded, unzip it to your drive using the unzip utility:
!unzip art-images-drawings-painting-sculpture-engraving.zip
When the file is unzipped, you will have lots of images stored on your drive arranged in a specific folder structure. The images have varied sizes. We will convert all our training images to a fixed size of 256x256. We define a few variables for creating our training dataset as follows:
IMG_WIDTH = 256
IMG_HEIGHT = 256
TRAIN_PATH = '/content/dataset/dataset_updated/training_set/painting/'
train_ids = next(os.walk(TRAIN_PATH))[2]

The next(os.walk(TRAIN_PATH)) call returns a (dirpath, dirnames, filenames) tuple for the folder; element [2] gives the list of filenames.
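If you are curious about what this call returns, the following quick check prints the number of files and a few sample filenames; it is purely for inspection.
# next(os.walk(path)) yields a (dirpath, dirnames, filenames) tuple for the top folder
root, dirs, files = next(os.walk(TRAIN_PATH))
print(len(files), files[:3])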

We will first check whether there are any bad (unreadable) images in the dataset and remove them, though this step is not strictly required for our purpose.
missing_count = 0
for n, id_ in tqdm(enumerate(train_ids),
                        total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
    except:
        missing_count += 1
print(" Total missing: "+ str(missing_count))

When you run this code, you will discover that there are 86 bad images in the set.

Now, we will create the training set taking care to remove the bad images.
X_train = np.zeros((len(train_ids)-missing_count,
          IMG_HEIGHT, IMG_WIDTH, 3), dtype=np.uint8)
missing_images = 0
for n, id_ in tqdm(enumerate(train_ids),
                         total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
        img = resize(img, (IMG_HEIGHT, IMG_WIDTH),
                mode='constant', preserve_range=True)
        X_train[n-missing_images] = img
    except:
        missing_images += 1
X_train = X_train.astype('float32') / 255.
You may now examine what an image looks like using the following statement:
plt.imshow(X_train[5])
The output is shown in Figure 14-5.
../images/495303_1_En_14_Chapter/495303_1_En_14_Fig5_HTML.jpg
Figure 14-5

A sample image

Creating Training/Testing Datasets

We will just reserve a few images for testing from the dataset that we have created.
x_train, x_test = train_test_split(X_train,
                                    test_size=20)

The train_test_split call reserves 20 images for testing, as specified by the test_size parameter.

Preparing Training Dataset

For training the model, we will convert the images from RGB to the Lab format. As mentioned earlier, the L channel is the grayscale component and represents the luminance of the image. The “a” channel is the color balance between green and red, and the “b” channel is the color balance between blue and yellow.

First, we will create an instance of ImageDataGenerator from the Keras library to augment the training images with random transformations.
datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)

Randomly distorting each image in this way helps the model generalize better. The shear_range tilts the image to the left or right, and the zoom, rotation, and horizontal flip parameters apply the corresponding random transformations.
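Optionally, you can visualize a few augmented versions of one training image to see what the generator produces; this snippet is only for inspection and is not part of the training pipeline.
sample = X_train[5:6]                   # a single image, kept 4-D for flow()
plt.figure(figsize=(10, 3))
for i, batch in enumerate(datagen.flow(sample, batch_size=1)):
    plt.subplot(1, 4, i + 1)
    plt.imshow(batch[0])
    plt.axis('off')
    if i == 3:                          # flow() loops forever, so stop after four images
        break
plt.show()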

Now, we will write a function for creating batches of data for training. The function definition is given as follows:
def create_training_batches(dataset=X_train, batch_size=20):
    # iterate over augmented batches of images
    for batch in datagen.flow(dataset, batch_size=batch_size):
        # convert the RGB batch to Lab format
        lab_batch = rgb2lab(batch)
        # the L channel is the model input
        X_batch = lab_batch[:,:,:,0]
        # add a channel dimension: (batch, 256, 256, 1)
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        # the a and b channels, scaled to roughly [-1, 1], are the targets
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield X_batch, Y_batch

The function converts each augmented RGB batch to the Lab format by calling the rgb2lab method. With the Lab color space, we need to predict only two components, compared to the three or four components required with other color spaces; as mentioned earlier, this reduces the network size and results in faster convergence. The L channel becomes the model input, and the a and b channels, scaled by 128 to lie roughly between -1 and +1, become the targets.
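As a quick sanity check, you can pull a single batch from the generator and inspect the shapes and value ranges the model will see; the batch size of 4 here is an arbitrary choice.
X_batch, Y_batch = next(create_training_batches(X_train, batch_size=4))
print(X_batch.shape)                   # (4, 256, 256, 1) -> L channel input
print(Y_batch.shape)                   # (4, 256, 256, 2) -> a and b channel targets
print(Y_batch.min(), Y_batch.max())    # roughly within [-1, 1]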

Defining Model

Now, we will define our Autoencoder model. The model configuration is based on the suggestions made in the paper “Let there be Color!” (http://iizuka.cs.tsukuba.ac.jp/projects/colorization/data/colorization_sig2016.pdf).
# the input for the encoder layer
inputs1 = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 1,))
# encoder
# Using Conv2d to reduce the size of feature maps and image size
# convert image to 128x128
encoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same', strides=2)(inputs1)
encoder_output = Conv2D(128, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# convert image to 64x64
encoder_output = Conv2D(128, (3,3),
                  activation='relu', padding="same",
                  strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# convert image to 32x32
encoder_output = Conv2D(256, (3,3),
                  activation='relu', padding="same",
                  strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# mid-level feature extractions
encoder_output = Conv2D(512, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# decoder
# Adding colors to the grayscale image and upsizing it
decoder_output = Conv2D(128, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 64x64
decoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 128x128
decoder_output = Conv2D(32, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation="tanh",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 256x256

Both the Encoder and the Decoder contain a few Conv2D layers. The Encoder downsamples the image through a series of layers to extract its features, and the Decoder, through its own set of layers, attempts to regenerate the original image, upsampling at various points and adding color to the grayscale input to create a final image of size 256x256. The last decoder layer uses tanh activation to squash the values between -1 and +1; remember that we normalized the a and b values to the range -1 to +1.

After the Encoder and Decoder layers are defined, construct the model and compile it using its compile method. We use mse as the loss function and Adam as the optimizer.
model = Model(inputs=inputs1, outputs=decoder_output)
model.compile(loss='mse', optimizer="adam",
                          metrics=['accuracy'])
print(model.summary())
The model summary is shown in Figure 14-6.
Figure 14-6

Autoencoder model summary

You can get the visualization by plotting the model:
tf.keras.utils.plot_model(model)
The output is shown in Figure 14-7.
Figure 14-7

Model plot
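Before training, you may optionally confirm that the network maps a grayscale input to a two-channel output of the same spatial size; this check is not part of the original workflow.
dummy = np.zeros((1, IMG_HEIGHT, IMG_WIDTH, 1), dtype='float32')
print(model.predict(dummy).shape)   # expected: (1, 256, 256, 2)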

Model Training

We train the model by passing our batch generator to its fit_generator method (in newer TensorFlow 2 releases, fit accepts generators directly and fit_generator is deprecated).
BATCH_SIZE = 20
model.fit_generator(
    create_training_batches(X_train, BATCH_SIZE),
    epochs=100,
    verbose=1,
    steps_per_epoch=X_train.shape[0] // BATCH_SIZE)

It took me slightly over a minute per epoch to train the model on a GPU. By using the pre-trained model, this training time came down to about a second per epoch as you would see when you run the second project in this chapter.
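Since training takes a while, you may optionally save the trained model so that it can be reloaded later without retraining; the filename here is an arbitrary choice.
model.save('colorization_autoencoder.h5')
# later, or in a new session:
# model = load_model('colorization_autoencoder.h5')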

Testing

Now, you can check the model performance on the test dataset that we created earlier. Note that we do not augment the test images as we did during training; we simply convert them to the Lab format and run the prediction. Here is the code for generating model predictions on the test images.
test_image = rgb2lab(x_test)[:,:,:,0]
test_image = test_image.reshape(test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# making the output image array
generated_images = np.zeros((len(output), 256, 256, 3))
# iterating over the output
for i in range(len(output)):
    # dummy array
    cur = np.zeros((256, 256, 3))
    # assigning the grayscale component
    cur[:,:,0] = test_image[i][:,:,0]
    # assigning the a and b components
    cur[:,:,1:] = output[i]
    # converting from Lab to RGB format, as plt only works in RGB mode
    generated_images[i] = lab2rgb(cur)
Display the generated images along with the originals using the following code fragment:
plt.figure(figsize=(20, 6))
for i in range(10):
    # grayscale
    plt.subplot(3, 10, i + 1)
    plt.imshow(rgb2gray(x_test)[i].reshape(256, 256))
    plt.gray()
    plt.axis('off')
    # recolorization
    plt.subplot(3, 10, i + 1 +10)
    plt.imshow(generated_images[i].reshape
                                   (256, 256,3))
    plt.axis('off')
    # original
    plt.subplot(3, 10, i + 1 + 20)
    plt.imshow(x_test[i].reshape(256, 256,3))
    plt.axis('off')
 
plt.tight_layout()
plt.show()
The output is shown in Figure 14-8.
Figure 14-8

Model inference

The first row is the set of grayscale images created from the original color images shown in the third row. The middle row shows the images generated by the model. As you can see, the generated images are reasonably close to the originals.

Now, I will show you how to use this model on an unseen image of a different size.

Inference on an Unseen Image

You can test the model’s performance on an unseen image of your choice. A sample image is available on the book’s site, which can be downloaded using wget.
!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg
Display the original image:
img = imread("mountain.jpg")
plt.imshow(img)
The image is shown in Figure 14-9.
Figure 14-9

Sample image with different dimensions

Now, run the inference using the following code. Note that we need to change the image size before inputting the image to the network.
img = resize(img, (IMG_HEIGHT, IMG_WIDTH),
            mode='constant', preserve_range=True)
img = img.astype('float32') / 255.
test_image = rgb2lab(img)[:,:,0]
test_image = test_image.reshape((1,)+test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# combine the L channel with the predicted a and b channels
result = np.zeros((IMG_HEIGHT, IMG_WIDTH, 3))
result[:,:,0] = test_image[0][:,:,0]
result[:,:,1:] = output[0]
# convert back to RGB for display
plt.imshow(lab2rgb(result))
plt.axis('off')
The generated image is shown in Figure 14-10.
Figure 14-10

A colorized image generated by the custom autoencoder model

Full Source

The full source is given in Listing 14-1 for your reference.
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
import skimage
from skimage.io import imread, imshow
from skimage.transform import resize
from skimage.util import crop, pad
from skimage.morphology import label
from skimage.color import rgb2gray, gray2rgb, rgb2lab, lab2rgb
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense, UpSampling2D, RepeatVector, Reshape
from tensorflow.keras.layers import Dropout, Lambda
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras import backend as K
#!pip install -q kaggle
#!mkdir ~/.kaggle
#!touch ~/.kaggle/kaggle.json
#api_token = {"username": "Your UserName", "key": "Your key"}
#import json
#with open('/root/.kaggle/kaggle.json', 'w') as file:
#    json.dump(api_token, file)
#!chmod 600 ~/.kaggle/kaggle.json
#!kaggle datasets download -d thedownhill/art-images-drawings-painting-sculpture-engraving
!wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1CKs7s_MZMuZFBXDchcL_AgmCxgPBTJXK' -O art-images-drawings-painting-sculpture-engraving.zip
!unzip art-images-drawings-painting-sculpture-engraving.zip
IMG_WIDTH = 256
IMG_HEIGHT = 256
TRAIN_PATH = '/content/dataset/dataset_updated/training_set/painting/'
train_ids = next(os.walk(TRAIN_PATH))[2]
missing_count = 0
for n, id_ in tqdm(enumerate(train_ids),
                        total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
    except:
        missing_count += 1
print(" Total missing: "+ str(missing_count))
X_train = np.zeros((len(train_ids)-missing_count,
          IMG_HEIGHT, IMG_WIDTH, 3), dtype=np.uint8)
missing_images = 0
for n, id_ in tqdm(enumerate(train_ids),
                         total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
        img = resize(img, (IMG_HEIGHT, IMG_WIDTH),
                mode='constant', preserve_range=True)
        X_train[n-missing_images] = img
    except:
        missing_images += 1
X_train = X_train.astype('float32') / 255.
plt.imshow(X_train[5])
x_train, x_test = train_test_split(X_train,
                                    test_size=20)
datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)
def create_training_batches(dataset=X_train, batch_size=20):
    # iterate over augmented batches of images
    for batch in datagen.flow(dataset, batch_size=batch_size):
        # convert the RGB batch to Lab format
        lab_batch = rgb2lab(batch)
        # the L channel is the model input
        X_batch = lab_batch[:,:,:,0]
        # add a channel dimension: (batch, 256, 256, 1)
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        # the a and b channels, scaled to roughly [-1, 1], are the targets
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield X_batch, Y_batch
# the input for the encoder layer
inputs1 = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 1,))
# encoder
# Using Conv2d to reduce the size of feature maps and image size
# convert image to 128x128
encoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same', strides=2)(inputs1)
encoder_output = Conv2D(128, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# convert image to 64x64
encoder_output = Conv2D(128, (3,3),
                  activation='relu', padding="same",
                  strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# convert image to 32x32
encoder_output = Conv2D(256, (3,3),
                  activation='relu', padding="same",
                  strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# mid-level feature extractions
encoder_output = Conv2D(512, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
# decoder
# Adding colors to the grayscale image and upsizing it
decoder_output = Conv2D(128, (3,3),
                  activation='relu',
                  padding='same')(encoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 64x64
decoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 128x128
decoder_output = Conv2D(32, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation="tanh",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 256x256
# compiling  model
model = Model(inputs=inputs1, outputs=decoder_output)
model.compile(loss='mse', optimizer="adam",
                          metrics=['accuracy'])
print(model.summary())
tf.keras.utils.plot_model(model)
BATCH_SIZE = 20
model.fit_generator(
    create_training_batches(X_train, BATCH_SIZE),
    epochs=100,
    verbose=1,
    steps_per_epoch=X_train.shape[0] // BATCH_SIZE)
test_image = rgb2lab(x_test)[:,:,:,0]
test_image = test_image.reshape(test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# making the output image array
generated_images = np.zeros((len(output), 256, 256, 3))
# iterating over the output
for i in range(len(output)):
    # dummy array
    cur = np.zeros((256, 256, 3))
    # assigning the grayscale component
    cur[:,:,0] = test_image[i][:,:,0]
    # assigning the a and b components
    cur[:,:,1:] = output[i]
    # converting from Lab to RGB format, as plt only works in RGB mode
    generated_images[i] = lab2rgb(cur)
plt.figure(figsize=(20, 6))
for i in range(10):
    # grayscale
    plt.subplot(3, 10, i + 1)
    plt.imshow(rgb2gray(x_test)[i].reshape(256, 256))
    plt.gray()
    plt.axis('off')
    # recolorization
    plt.subplot(3, 10, i + 1 +10)
    plt.imshow(generated_images[i].reshape
                                   (256, 256,3))
    plt.axis('off')
    
    # original
    plt.subplot(3, 10, i + 1 + 20)
    plt.imshow(x_test[i].reshape(256, 256,3))
    plt.axis('off')
plt.tight_layout()
plt.show()
!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg
img = imread("mountain.jpg")
plt.imshow(img)
img = resize(img, (IMG_HEIGHT, IMG_WIDTH),
            mode='constant', preserve_range=True)
img = img.astype('float32') / 255.
test_image = rgb2lab(img)[:,:,0]
test_image = test_image.reshape((1,)+test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# combine the L channel with the predicted a and b channels
result = np.zeros((IMG_HEIGHT, IMG_WIDTH, 3))
result[:,:,0] = test_image[0][:,:,0]
result[:,:,1:] = output[0]
# convert back to RGB for display
plt.imshow(lab2rgb(result))
plt.axis('off')
Listing 14-1

AutoEncoder_Custom

Now, I will show you how to use a pre-trained model for feature extraction, which saves a lot of training time and gives better feature extraction.

Pre-trained Model as Encoder

There are several pre-trained models available for image processing. You used one such model, VGG16, in Chapter 12. This model can extract image features, which is what we did in the previous program by creating our own encoder. So why not apply transfer learning and use a pre-trained VGG16 in place of the encoder? That is what I am going to demonstrate in this application. A pre-trained model generally provides better results than a hand-crafted encoder, along with faster training.

Project Description

You will be using the same image dataset as in the previous project. Thus, the data loading and preprocessing code remains essentially the same; what changes is the model definition and the inference, so I will describe only the relevant changes. The entire project source is available on the book’s download site and is also given at the end of this section for your quick reference. The project is named AutoEncoder-TransferLearning.

As the VGG16 was trained on images of size 224x224, you will need to change those two constant values to the following:
IMG_WIDTH = 224
IMG_HEIGHT = 224

Defining Model

You have already seen the VGG16 architecture in Chapter 12 (Figure 12-5). The convolutional base of VGG16 (the layers up to the final pooling layer) extracts the image features. So, we will use these layers and discard all subsequent fully connected layers. We create a new Sequential model using the following code snippet:
vggmodel = tf.keras.applications.vgg16.VGG16()
newmodel = Sequential()
num = 0
for i, layer in enumerate(vggmodel.layers):
    if i<19:
      newmodel.add(layer)
newmodel.summary()
for layer in newmodel.layers:
  layer.trainable=False
We set the trainable parameter for all these layers to false as we intend to use a pre-trained model for feature extraction. The model summary is shown in Figure 14-11.
Figure 14-11

Pre-trained encoder model summary

Extracting Features

You will extract the features of the training images using the newmodel that you have created. We simply pass each training image through the network and collect the output of its final pooling layer, a 7x7x512 feature map.
vggfeatures = []
for sample in x_train:
  sample = gray2rgb(sample)
  sample = sample.reshape((1,224,224,3))
  prediction = newmodel.predict(sample)
  prediction = prediction.reshape((7,7,512))
  vggfeatures.append(prediction)
vggfeatures = np.array(vggfeatures)

Defining Network

Now, you will define our encoder/decoder architecture as follows:
#Encoder
encoder_input = Input(shape=(7, 7, 512,))
#Decoder
decoder_output = Conv2D(256, (3,3),
                  activation='relu',
                  padding='same')(encoder_input)
decoder_output = Conv2D(128, (3,3),
                  activation='relu',
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation="tanh",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=encoder_input,
                  outputs=decoder_output)
model.summary()

For the encoder, we simply specify the input shape, and the decoder architecture follows the same idea as in the previous example, where we keep upsizing the image and adding color to it.

The model summary is shown in Figure 14-12.
Figure 14-12

Encoder decoder model summary
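As an optional sanity check, you can confirm that a dummy feature block of the shape produced by the VGG16 extractor decodes to a full-size map of a and b channels; this is not part of the original workflow.
dummy = np.zeros((1, 7, 7, 512), dtype='float32')
print(model.predict(dummy).shape)   # expected: (1, 224, 224, 2)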

Model Training

Compile and train the model using the following two statements. The image_a_b_gen helper, defined in the full listing at the end of this section, returns the a and b channels of an augmented batch of training images scaled to roughly the range -1 to +1.
model.compile(optimizer='Adam', loss="mse")
model.fit(vggfeatures, image_a_b_gen(x_train),
            verbose=1, epochs=100, batch_size=128)

We use the Adam optimizer and the mse loss for training. Training on a GPU, each epoch took about a second, a considerable improvement over our earlier network. Because we use a pre-trained encoder, only the decoder parameters need to be trained.

Inference

Now, run the following code to generate colorized images from the test dataset. It follows the same Lab reconstruction steps used earlier.
sample = x_test[1:6]
for image in sample:
  lab = rgb2lab(image)
  l = lab[:,:,0]
  L = gray2rgb(l)
  L = L.reshape((1,224,224,3))
  vggpred = newmodel.predict(L)
  ab = model.predict(vggpred)
  ab = ab*128
  cur = np.zeros((224, 224, 3))
  cur[:,:,0] = l
  cur[:,:,1:] = ab
  plt.subplot(1,2,1)
  plt.title("Generated Image")
  plt.imshow( lab2rgb(cur))
  plt.axis('off')
  plt.subplot(1,2,2)
  plt.title("Original Image")
  plt.imshow(image)
  plt.axis('off')
  plt.show()
The output of the preceding code is shown in Figure 14-13.
Figure 14-13

Model inference on test images

Inference on an Unseen Image

Like the earlier example, test the model’s performance on an unseen image of your choice. We will use the same image as in the earlier example.
!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg
Display the original image if you wish to see it again.
img = imread("mountain.jpg")
plt.imshow(img)
Now, run the inference using the following code. Note that we need to change the image size to 224x224 before inputting the image to the network.
test = img_to_array(load_img("mountain.jpg"))
test = resize(test, (224,224), anti_aliasing=True)
test*= 1.0/255
lab = rgb2lab(test)
l = lab[:,:,0]
L = gray2rgb(l)
L = L.reshape((1,224,224,3))
vggpred = newmodel.predict(L)
ab = model.predict(vggpred)
ab = ab*128
cur = np.zeros((224, 224, 3))
cur[:,:,0] = l
cur[:,:,1:] = ab
plt.imshow( lab2rgb(cur))
plt.axis('off')
The program output is shown in Figure 14-14.
Figure 14-14

A colorized image generated by the autoencoder transfer learning model

Full Source

The full source is given in Listing 14-2 for your reference.
import numpy as np
import pandas as pd
import cv2
import os
import sys
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
import skimage
from PIL import Image
from skimage.io import imread, imshow, imread_collection, concatenate_images
from skimage.transform import resize
from skimage.util import crop, pad
from skimage.morphology import label
from skimage.color import rgb2gray, gray2rgb, rgb2lab, lab2rgb
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
import tensorflow as tf
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense, UpSampling2D, RepeatVector, Reshape
from tensorflow.keras.layers import Dropout, Lambda
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import concatenate
from tensorflow.keras import backend as K
#!pip install -q kaggle
#!mkdir ~/.kaggle
#!touch ~/.kaggle/kaggle.json
#api_token = {"username":"","key":""}
#import json
#with open('/root/.kaggle/kaggle.json', 'w') as file:
#    json.dump(api_token, file)
#!chmod 600 ~/.kaggle/kaggle.json
#!kaggle datasets download -d thedownhill/art-images-drawings-painting-sculpture-engraving
!wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1CKs7s_MZMuZFBXDchcL_AgmCxgPBTJXK' -O art-images-drawings-painting-sculpture-engraving.zip
!unzip art-images-drawings-painting-sculpture-engraving.zip
IMG_WIDTH = 224
IMG_HEIGHT = 224
TRAIN_PATH = '/content/dataset/dataset_updated/training_set/painting/'
train_ids = next(os.walk(TRAIN_PATH))[2]
missing_count = 0
for n, id_ in tqdm(enumerate(train_ids),
                        total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
    except:
        missing_count += 1
print(" Total missing: "+ str(missing_count))
X_train = np.zeros((len(train_ids)-missing_count,
          IMG_HEIGHT, IMG_WIDTH, 3), dtype=np.uint8)
missing_images = 0
for n, id_ in tqdm(enumerate(train_ids),
                        total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
        img = resize(img, (IMG_HEIGHT, IMG_WIDTH),
                            mode='constant',
                            preserve_range=True)
        X_train[n-missing_images] = img
    except:
        missing_images += 1
X_train = X_train.astype('float32') / 255.
plt.imshow(X_train[5])
x_train, x_test = train_test_split(X_train, test_size=1500)
datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)
def image_a_b_gen(dataset=X_train):
    # take one augmented batch of images
    for batch in datagen.flow(dataset, batch_size=542):
        # convert the RGB batch to Lab format
        lab_batch = rgb2lab(batch)
        # return the a and b channels scaled to roughly [-1, 1]
        X_batch = lab_batch[:,:,:,1:] / 128
        return X_batch
vggmodel = tf.keras.applications.vgg16.VGG16()
newmodel = Sequential()
num = 0
for i, layer in enumerate(vggmodel.layers):
    if i<19:
      newmodel.add(layer)
newmodel.summary()
for layer in newmodel.layers:
  layer.trainable=False
vggfeatures = []
for sample in x_train:
  sample = gray2rgb(sample)
  sample = sample.reshape((1,224,224,3))
  prediction = newmodel.predict(sample)
  prediction = prediction.reshape((7,7,512))
  vggfeatures.append(prediction)
vggfeatures = np.array(vggfeatures)
#Encoder
encoder_input = Input(shape=(7, 7, 512,))
#Decoder
decoder_output = Conv2D(256, (3,3),
                  activation='relu',
                  padding='same')(encoder_input)
decoder_output = Conv2D(128, (3,3),
                  activation='relu',
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation="relu",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation="tanh",
                  padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=encoder_input,
                  outputs=decoder_output)
model.summary()
model.compile(optimizer='Adam', loss="mse")
model.fit(vggfeatures, image_a_b_gen(x_train),
            verbose=1, epochs=100, batch_size=128)
sample = x_test[1:6]
for image in sample:
  lab = rgb2lab(image)
  l = lab[:,:,0]
  L = gray2rgb(l)
  L = L.reshape((1,224,224,3))
  vggpred = newmodel.predict(L)
  ab = model.predict(vggpred)
  ab = ab*128
  cur = np.zeros((224, 224, 3))
  cur[:,:,0] = l
  cur[:,:,1:] = ab
  plt.subplot(1,2,1)
  plt.title("Generated Image")
  plt.imshow( lab2rgb(cur))
  plt.axis('off')
  plt.subplot(1,2,2)
  plt.title("Original Image")
  plt.imshow(image)
  plt.axis('off')
  plt.show()
!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg
img = imread("mountain.jpg")
plt.imshow(img)
test = img_to_array(load_img("mountain.jpg"))
test = resize(test, (224,224), anti_aliasing=True)
test*= 1.0/255
lab = rgb2lab(test)
l = lab[:,:,0]
L = gray2rgb(l)
L = L.reshape((1,224,224,3))
vggpred = newmodel.predict(L)
ab = model.predict(vggpred)
ab = ab*128
cur = np.zeros((224, 224, 3))
cur[:,:,0] = l
cur[:,:,1:] = ab
plt.imshow( lab2rgb(cur))
plt.axis('off')
Listing 14-2

AutoEncoder_TransferLearning

Summary

The use of deep neural networks makes it possible to add color to a B&W image. You learned to create AutoEncoders, which we used for colorizing B&W images. An AutoEncoder contains an Encoder that extracts the image features and a Decoder that recreates the image from the representation produced by the Encoder. You also learned how to use a pre-trained image classifier to extract image features and serve as part of the Encoder.
