P. Sarang, Artificial Neural Networks with TensorFlow 2, https://doi.org/10.1007/978-1-4842-6150-7_14

14. Image Translation

Poornachandra Sarang¹

(1)

Mumbai, India

Have you ever thought of colorizing an old B&W photograph of your granny? You would probably approach a Photoshop artist to do the job, paying a hefty fee and waiting a few days or weeks for the result. If I told you that you could do this with a deep neural network, would you not be excited to learn how? This chapter teaches you a technique for converting B&W images to color almost instantaneously. The technique is simple and uses a network architecture known as AutoEncoders. So, let us first look at AutoEncoders.

AutoEncoders

An AutoEncoder consists of two parts: an Encoder and a Decoder. It may be schematically represented as shown in Figure 14-1.

Figure 14-1. AutoEncoder architecture

On the left, we have a B&W image fed to our network. On the right, where we have the network output, we have a colorized image of the same input content. What goes in between can be described like this. The Encoder processes the image through a series of Convolutional layers and downsizes the image to learn the reduced dimensional representation of the input image. The decoder then attempts to regenerate the image by passing it through another series of Convolutional layers, upsizing and adding colors in the process.

Now, to understand how to colorize an image, you must first understand the color spaces.

Color Spaces

A color image consists of the colors in a given color space and the luminous intensity. A range of colors is created by mixing primary colors such as red, green, and blue. This entire range of colors is called a color space, for example, RGB. In mathematical terms, a color space is an abstract model that describes the range of colors as tuples of numbers. Each color is represented by a single point in that space.

I will describe the three most popular color spaces:

RGB
YCbCr
Lab

RGB is the most commonly used color space. It contains three channels: red (R), green (G), and blue (B). Each channel is represented by 8 bits and can take 256 distinct values, from 0 through 255. Combined together, they can represent over 16 million colors.
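The arithmetic behind the 8-bit RGB figures can be checked with a few lines of Python. This is only an illustrative sketch; the helper name pack_rgb is an assumption made for this example and is not used elsewhere in the chapter.

```python
# Each 8-bit channel has 2**8 = 256 levels, numbered 0 through 255.
BITS_PER_CHANNEL = 8
levels = 2 ** BITS_PER_CHANNEL      # number of distinct values per channel
max_value = levels - 1              # largest value a channel can hold
total_colors = levels ** 3          # every possible (R, G, B) combination


def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into one 24-bit integer (hypothetical helper)."""
    for value in (r, g, b):
        if value not in range(levels):
            raise ValueError("channel values must lie in 0..255")
    return r * levels * levels + g * levels + b


print(levels, max_value, total_colors)
print(hex(pack_rgb(255, 255, 255)))
```

The total of 16,777,216 combinations is the "over 16 million colors" mentioned in the text.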

The JPEG and MPEG formats use the YCbCr color space. It is more efficient for digital transmission and storage as compared to RGB. The Y channel represents the luminosity of a grayscale image. The Cb and Cr represent the blue and red difference chroma components. The Y channel takes values from 16 through 235. The Cb and Cr values range from 16 to 240. Be careful: not every combination of these channel values represents a valid color. In our colorization application, we do not use this color space.

The Lab color space was designed by the International Commission on Illumination (CIE). A visual representation of this color space is shown in Figure 14-2.

Figure 14-2. The Lab color space

The Lab color space is larger than the gamut of computer displays and printers. As a result, a bitmap image stored in Lab requires more data per pixel to obtain the same precision as an RGB or CMYK image. Thus, the Lab color space is typically used as an intermediary rather than an end color space.

The L channel represents the luminosity and takes values in the range 0 to 100. The “a” channel codes from green (-) to red (+), and the “b” channel codes from blue (-) to yellow (+). For an 8-bit implementation, both take values in the range -128 to +127. The Lab color space approximates human vision: a given numerical change in the component values corresponds to roughly the same amount of visually perceived change.

We use the Lab color space in our project. By separating the grayscale component which represents the luminosity, the network has to learn only two remaining channels for colorization. This helps in reducing the network size and results in faster convergence.
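To make the roles of the three Lab channels concrete, here is a minimal pure-Python sketch that converts a single sRGB pixel to Lab. It assumes the standard sRGB primaries and the D65 reference white; the function name srgb_to_lab is hypothetical. In the projects that follow, the same conversion is done for whole images by skimage's rgb2lab.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in 0..1) to CIE Lab, assuming a D65 white."""

    def linearize(c):
        # undo the sRGB gamma curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        # cube root with a linear segment near zero
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    r, g, b = linearize(r), linearize(g), linearize(b)
    # linear RGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    # normalize by the D65 reference white, then apply f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


for name, rgb in [("white", (1, 1, 1)), ("red", (1, 0, 0)),
                  ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    L, a, b = srgb_to_lab(*rgb)
    print(name, round(L, 1), round(a, 1), round(b, 1))
```

White comes out with L close to 100 and a and b close to 0, red has a positive a, green a negative a, and blue a negative b, matching the channel descriptions above.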

I will now discuss the different network topologies for our AutoEncoders.

Network Configurations

The AutoEncoder network may be configured in three different ways:

Vanilla
Merged
Merged model using pre-trained network

I will now discuss all three models.

Vanilla Model

The vanilla model has the configuration shown in Figure 14-1, where the Encoder has a series of Convolutional layers with strides for downsizing the image and extracting features. The Decoder too has Convolutional layers, which are used for upsizing and colorization. In such autoencoders, the encoders are not deep enough to extract the global features of an image. The global features help us in determining how to colorize certain regions of the image. If we make the encoder network deeper, the dimensions of the representation become too small for the decoder to faithfully reproduce the original image. Thus, we need two paths in an Encoder: one to obtain the global features and another to obtain a rich representation of the image. This is what is done in the next two models. You will construct a vanilla network in the first project in this chapter.

Merged Model

This model was proposed by Iizuka et al. in their paper “Let there be Color!” (http://iizuka.cs.tsukuba.ac.jp/projects/colorization/data/colorization_sig2016.pdf). The model architecture is shown in Figure 14-3.

Figure 14-3. Merged model architecture

An eight-layer Encoder was used to extract the mid-level representations. The output of the sixth layer is forked and fed through another seven-layer network to extract the global features. Another Fusion network then concatenates the two outputs and feeds them to the decoder.
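The concatenation step of the Fusion network can be pictured as attaching the same global feature vector to the mid-level features at every spatial position. The sketch below uses plain nested lists and toy shapes chosen only for illustration; the function name fuse is hypothetical, and any learned layers around the concatenation are left out.

```python
def fuse(mid_level, global_vec):
    """Append the global feature vector to the features at every spatial position.

    mid_level:  H x W x C nested lists (mid-level feature map)
    global_vec: list of length G (one descriptor for the whole image)
    returns:    H x W x (C + G) nested lists
    """
    return [[pixel + global_vec for pixel in row] for row in mid_level]


# toy example: a 2 x 2 feature map with 2 channels and 3 global features
mid = [[[0.1, 0.2], [0.3, 0.4]],
       [[0.5, 0.6], [0.7, 0.8]]]
glob = [9.0, 9.5, 9.9]
fused = fuse(mid, glob)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

Every position now carries 2 + 3 = 5 values, so the decoder sees both local detail and image-wide context at each location.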

Merged Model Using Pre-trained Network

This was proposed by Baldassarre et al. in their paper “Deep Koalarization: Image Colorization using CNNs and Inception-Resnet-v2” (https://arxiv.org/pdf/1712.03400.pdf). The schematic diagram of the model architecture is shown in Figure 14-4.

Figure 14-4. Model using a pre-trained network

The feature extraction is done by a pre-trained Inception-ResNet-v2 network.

In the second project in this chapter, I will show you how to use a pre-trained model for feature extraction, though you will not be constructing as complicated a model as shown in this schematic.

With this introduction to AutoEncoders and their configurations, let us start with some practical implementations of them.

AutoEncoder

In this project, you will be using the vanilla autoencoder.

Open a new Colab notebook and rename it to AutoEncoder – Custom. Add the following imports:

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
import skimage
from skimage.io import imread, imshow
from skimage.transform import resize
from skimage.util import crop, pad
from skimage.morphology import label
from skimage.color import rgb2gray, gray2rgb, rgb2lab, lab2rgb
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense, UpSampling2D, RepeatVector, Reshape
from tensorflow.keras.layers import Dropout, Lambda
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras import backend as K

Loading Data

You will be using a dataset hosted on Kaggle for this project. The dataset (www.kaggle.com/thedownhill/art-images-drawings-painting-sculpture-engraving) contains about 9000 images covering five types of art. If you have a Kaggle account, you may download the dataset using your credentials with the following code:

#!pip install -q kaggle
#!mkdir ~/.kaggle
#!touch ~/.kaggle/kaggle.json
#api_token = {"username":"Your UserName", "key":"Your key"}
#import json
#with open('/root/.kaggle/kaggle.json', 'w') as file:
#    json.dump(api_token, file)
#!chmod 600 ~/.kaggle/kaggle.json
#!kaggle datasets download -d thedownhill/art-images-drawings-painting-sculpture-engraving

Alternatively, the data is available on the book’s download site and can be downloaded into your project using wget as in the following code fragment:

!wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1CKs7s_MZMuZFBXDchcL_AgmCxgPBTJXK' -O art-images-drawings-painting-sculpture-engraving.zip

After the data file is downloaded, unzip it to your drive using the unzip utility:

!unzip art-images-drawings-painting-sculpture-engraving.zip

When the file is unzipped, you will have lots of images stored on your drive arranged in a specific folder structure. The images have varied sizes. We will convert all our training images to a fixed size of 256x256. We define a few variables for creating our training dataset as follows:

IMG_WIDTH = 256
IMG_HEIGHT = 256
TRAIN_PATH = '/content/dataset/dataset_updated/training_set/painting/'
train_ids = next(os.walk(TRAIN_PATH))[2]

The os.walk gets all the filenames present in the folder.
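The expression next(os.walk(path))[2] works because the first item that os.walk yields is a tuple (dirpath, dirnames, filenames) describing the top folder itself. The following sketch demonstrates this on a temporary folder whose layout is made up purely for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # hypothetical layout: two files and one subfolder
    for name in ("a.jpg", "b.jpg"):
        open(os.path.join(root, name), "w").close()
    os.mkdir(os.path.join(root, "sub"))

    # the first tuple yielded by os.walk describes the top folder
    dirpath, dirnames, filenames = next(os.walk(root))
    top_level_files = sorted(filenames)   # the same list as next(os.walk(root))[2]
    top_level_dirs = dirnames

print(top_level_files)
print(top_level_dirs)
```

Only the files directly inside the folder are listed; files in subfolders would appear in later tuples, which this idiom never reads.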

We will first check whether the dataset contains any bad (unreadable) images and exclude those from our training set, though this step is not strictly required for our purpose.

missing_count = 0
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_
    try:
        img = imread(path)
    except Exception:
        # the file could not be read as an image
        missing_count += 1
print(" Total missing: " + str(missing_count))

When you run this code, you will discover that there are 86 bad images in the set.

Now, we will create the training set taking care to remove the bad images.

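X_train = np.zeros((len(train_ids)-missing_count, IMG_HEIGHT, IMG_WIDTH, 3), dtype=np.uint8)
missing_images = 0
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_
    try:
        img = imread(path)
        img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
        # store at the next free slot, skipping the bad images seen so far
        X_train[n-missing_images] = img
    except Exception:
        missing_images += 1
# drop any unused rows left at the end if more images failed in this pass
X_train = X_train[:len(train_ids)-missing_images]
# scale pixel values from 0..255 to 0..1
X_train = X_train.astype('float32') / 255.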
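The two loops above read every file twice: once to count the failures and once to load the images. As an alternative, a single pass can collect the readable images and count the failures together. The sketch below shows the idea with a stand-in reader so that it runs without any image files; load_valid and fake_reader are hypothetical names, and in the project the reader role would be played by imread followed by resize.

```python
def load_valid(paths, reader):
    """Read every path once, keeping the successes and counting the failures.

    reader is any callable that returns an image or raises on a bad file.
    """
    images, failed = [], 0
    for path in paths:
        try:
            images.append(reader(path))
        except Exception:
            failed += 1
    return images, failed


def fake_reader(path):
    # toy stand-in for imread: fails on names containing "bad"
    if "bad" in path:
        raise IOError("cannot read " + path)
    return path.upper()


images, failed = load_valid(["a.jpg", "bad.jpg", "c.jpg"], fake_reader)
print(len(images), failed)
```

With real images, the collected list could then be turned into one array with np.stack, which removes the need to know the number of bad files in advance.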

You may now examine what an image looks like using the following statement:

plt.imshow(X_train[5])

The output is shown in Figure 14-5.

Figure 14-5. A sample image

Creating Training/Testing Datasets

We will just reserve a few images for testing from the dataset that we have created.

x_train, x_test = train_test_split(X_train, test_size=20)

The test_size parameter of train_test_split is set to 20, so the method reserves 20 images for testing.

Preparing Training Dataset

For training the model, we will convert the images from RGB to Lab format. As said earlier, the L channel is the grayscale. It represents the luminance of the image. The “a” is the color balance between green and red, and the “b” is the color balance between blue and yellow.

First, we will create an instance of ImageDataGenerator from the Keras library. It applies random transformations to the images and supplies them to the training loop in batches.

datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=20,
    horizontal_flip=True)

Training on randomly distorted copies of each image helps the model generalize better. The shear_range parameter tilts the image to the left or right; zoom_range, rotation_range, and horizontal_flip randomly zoom, rotate, and mirror the image.

Now, we will write a function for creating batches of data for training. The function definition is given as follows:

def create_training_batches(dataset=X_train, batch_size=20):
    # datagen.flow yields augmented batches indefinitely
    for batch in datagen.flow(dataset, batch_size=batch_size):
        # convert from rgb to grayscale (superseded by the L channel below)
        X_batch = rgb2gray(batch)
        # convert rgb to Lab format
        lab_batch = rgb2lab(batch)
        # extract L component
        X_batch = lab_batch[:,:,:,0]
        # reshape to add the channel dimension
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        # extract a and b features of the image, scaled to about -1..1
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield X_batch, Y_batch

The function first converts the given batch from RGB to grayscale by calling the rgb2gray method, although this result is immediately replaced by the L channel. The batch is then converted to Lab format by calling the rgb2lab method. With the Lab color space, we need to predict only two components, whereas in other color spaces we would need to predict three or four. As said earlier, this helps in reducing the network size and results in faster convergence. Finally, we extract the L channel as the network input and the a and b channels, divided by 128, as the target.
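The division by 128 maps the a and b values into roughly the range -1 to +1, and the prediction code later multiplies the network output by 128 to undo it. A minimal sketch of this round trip follows; the helper names normalize_ab and denormalize_ab are hypothetical and only mirror the two operations used in the chapter's code.

```python
AB_SCALE = 128.0   # the divisor applied to the a and b channels in the chapter


def normalize_ab(a, b):
    """Map a and b values into roughly -1..1 to form the network target."""
    return a / AB_SCALE, b / AB_SCALE


def denormalize_ab(a_norm, b_norm):
    """Undo normalize_ab on the network output before converting Lab to RGB."""
    return a_norm * AB_SCALE, b_norm * AB_SCALE


a_n, b_n = normalize_ab(80.0, -107.5)
a, b = denormalize_ab(a_n, b_n)
print(a_n, b_n)
print(a, b)
```

Keeping the target in this range is what allows the last layer of the network to use a tanh activation.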

Defining Model

Now, we will define our Autoencoder model. The model configuration is based on the suggestions made in the paper “Let there be Color!” (http://iizuka.cs.tsukuba.ac.jp/projects/colorization/data/colorization_sig2016.pdf).

# the input for the encoder layer
inputs1 = Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1,))
# encoder
# Using Conv2d to reduce the size of feature maps and image size
# convert image to 128x128
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2)(inputs1)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
# convert image to 64x64
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
# convert image to 32x32
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
# mid-level feature extractions
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
# decoder
# Adding colors to the grayscale image and upsizing it
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 64x64
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 128x128
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 256x256

Both the Encoder and Decoder contain several Conv2D layers. The Encoder downsamples the image through a series of layers to extract its features. The Decoder, through its own set of layers, attempts to regenerate the image, upsampling at various points and adding colors to the grayscale input to create a final image of size 256x256. The last Conv2D layer of the decoder uses tanh activation to squash the values between –1 and +1. Remember that we earlier normalized the a and b values to the range –1 through +1.
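The image sizes noted in the code comments can be traced with a little arithmetic. With padding='same', a Conv2D layer with stride s turns a side of length n into ceil(n / s), and UpSampling2D((2, 2)) doubles it. The sketch below follows the sequence of layers in the model above; the helper names conv_same and upsample are hypothetical.

```python
import math


def conv_same(size, stride=1):
    """Spatial size after a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)


def upsample(size, factor=2):
    """Spatial size after an UpSampling2D layer."""
    return size * factor


size = 256
trace = [size]
# strides of the eight encoder convolutions, in order
for stride in (2, 1, 2, 1, 2, 1, 1, 1):
    size = conv_same(size, stride)
    trace.append(size)
encoded = size

# decoder: convolutions keep the size, each upsampling doubles it
for op in ("conv", "up", "conv", "conv", "up", "conv", "conv", "up"):
    size = upsample(size) if op == "up" else conv_same(size)
    trace.append(size)

print(trace)
```

The trace bottoms out at 32x32 after the encoder and returns to 256x256 at the output, so the predicted a and b channels line up pixel for pixel with the L channel of the input.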

After the Encoder and Decoder layers are defined, construct the model and compile it using its compile method. We use mse for the loss function and Adam optimizer.

model = Model(inputs=inputs1, outputs=decoder_output)
model.compile(loss='mse', optimizer="adam", metrics=['accuracy'])
model.summary()

The model summary is shown in Figure 14-6.

Figure 14-6. Autoencoder model summary

You can get the visualization by plotting the model:

tf.keras.utils.plot_model(model)

The output is shown in Figure 14-7.

Figure 14-7. Model plot

Model Training

We train the model by calling its fit method.

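BATCH_SIZE = 20
# Model.fit accepts generators directly in TensorFlow 2.1 and later.
# Train on x_train only so that the held-out test images stay unseen.
model.fit(create_training_batches(x_train, BATCH_SIZE),
          epochs=100,
          verbose=1,
          steps_per_epoch=x_train.shape[0] // BATCH_SIZE)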

It took me slightly over a minute per epoch to train the model on a GPU. With a pre-trained model, the training time comes down to about a second per epoch, as you will see when you run the second project in this chapter.
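Because the batch generator never stops on its own, the steps_per_epoch argument in the training call above tells Keras how many batches make up one epoch, and it should be a whole number. A small sketch of the two usual ways to compute it follows; the function name is reused here for illustration, and the dataset size of 2043 is a made-up figure, not the size of the chapter's dataset.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_remainder=False):
    """Number of batches needed to pass over the dataset once."""
    if drop_remainder:
        # ignore the final, smaller batch
        return num_samples // batch_size
    # include the final, smaller batch
    return math.ceil(num_samples / batch_size)


# hypothetical dataset size, used only for illustration
print(steps_per_epoch(2043, 20))
print(steps_per_epoch(2043, 20, drop_remainder=True))
```

Either choice works for training; the difference is only whether the last partial batch counts toward the epoch.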

Testing

Now, you can check the model performance on the test dataset that we created earlier. Note that we do not distort the test images as we did for training. We simply convert the images to Lab format and do the prediction. Here is the code for model predictions on test images.

test_image = rgb2lab(x_test)[:,:,:,0]
test_image = test_image.reshape(test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# making the output image array
generated_images = np.zeros((len(output), 256, 256, 3))
for i in range(len(output)):
    # iterating for the output
    cur = np.zeros((256, 256, 3))  # dummy array
    cur[:,:,0] = test_image[i][:,:,0]  # assigning the gray scale component
    cur[:,:,1:] = output[i]  # assigning the a and b component
    # converting from lab to rgb format as plt only work for rgb mode
    generated_images[i] = lab2rgb(cur)

Display the generated images along with the originals using the following code fragment:

plt.figure(figsize=(20, 6))
for i in range(10):
    # grayscale
    plt.subplot(3, 10, i + 1)
    plt.imshow(rgb2gray(x_test)[i].reshape(256, 256))
    plt.gray()
    plt.axis('off')
    # recolorization
    plt.subplot(3, 10, i + 1 + 10)
    plt.imshow(generated_images[i].reshape(256, 256, 3))
    plt.axis('off')
    # original
    plt.subplot(3, 10, i + 1 + 20)
    plt.imshow(x_test[i].reshape(256, 256, 3))
    plt.axis('off')
plt.tight_layout()
plt.show()

The output is shown in Figure 14-8.

Figure 14-8. Model inference

The first row is the set of grayscale images created from the original color images given in the third row. The middle row shows the images generated by the model. As you can see, the model generates images that are reasonably close to the originals.

Now, I will show you how to use this model on an unseen image of a different size.

Inference on an Unseen Image

You can test the model’s performance on an unseen image of your choice. A sample image is available on the book’s site, which can be downloaded using wget.

!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg

Display the original image:

img = imread("mountain.jpg")

plt.imshow(img)

The image is shown in Figure 14-9.

Figure 14-9. Sample image with different dimensions

Now, run the inference using the following code. Note that we need to change the image size before inputting the image to the network.

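img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
img = img.astype('float32') / 255.
test_image = rgb2lab(img)[:,:,0]
test_image = test_image.reshape((1,)+test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# combine the input L channel with the predicted a and b channels
cur = np.zeros((IMG_HEIGHT, IMG_WIDTH, 3))
cur[:,:,0] = test_image[0][:,:,0]
cur[:,:,1:] = output[0]
# show the colorized result rather than the original image
plt.imshow(lab2rgb(cur))
plt.axis('off')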

The generated image is shown in Figure 14-10.

Figure 14-10. A colorized image generated by the custom autoencoder model

Full Source

The full source is given in Listing 14-1 for your reference.

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
import skimage
from skimage.io import imread, imshow
from skimage.transform import resize
from skimage.util import crop, pad
from skimage.morphology import label
from skimage.color import rgb2gray, gray2rgb, rgb2lab, lab2rgb
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Dense, UpSampling2D, RepeatVector, Reshape
from tensorflow.keras.layers import Dropout, Lambda
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras import backend as K

#!pip install -q kaggle
#!mkdir ~/.kaggle
#!touch ~/.kaggle/kaggle.json
#api_token = {"username":"Your UserName", "key":"Your key"}
#import json
#with open('/root/.kaggle/kaggle.json', 'w') as file:
#    json.dump(api_token, file)
#!chmod 600 ~/.kaggle/kaggle.json
#!kaggle datasets download -d thedownhill/art-images-drawings-painting-sculpture-engraving

!wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1CKs7s_MZMuZFBXDchcL_AgmCxgPBTJXK' -O art-images-drawings-painting-sculpture-engraving.zip
!unzip art-images-drawings-painting-sculpture-engraving.zip

IMG_WIDTH = 256
IMG_HEIGHT = 256
TRAIN_PATH = '/content/dataset/dataset_updated/training_set/painting/'
train_ids = next(os.walk(TRAIN_PATH))[2]

missing_count = 0
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_
    try:
        img = imread(path)
    except Exception:
        # the file could not be read as an image
        missing_count += 1
print(" Total missing: " + str(missing_count))

X_train = np.zeros((len(train_ids)-missing_count, IMG_HEIGHT, IMG_WIDTH, 3), dtype=np.uint8)
missing_images = 0
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_
    try:
        img = imread(path)
        img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
        # store at the next free slot, skipping the bad images seen so far
        X_train[n-missing_images] = img
    except Exception:
        missing_images += 1
# drop any unused rows left at the end if more images failed in this pass
X_train = X_train[:len(train_ids)-missing_images]
# scale pixel values from 0..255 to 0..1
X_train = X_train.astype('float32') / 255.

plt.imshow(X_train[5])

x_train, x_test = train_test_split(X_train, test_size=20)

datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=20,
    horizontal_flip=True)

def create_training_batches(dataset=X_train, batch_size=20):
    # datagen.flow yields augmented batches indefinitely
    for batch in datagen.flow(dataset, batch_size=batch_size):
        # convert from rgb to grayscale (superseded by the L channel below)
        X_batch = rgb2gray(batch)
        # convert rgb to Lab format
        lab_batch = rgb2lab(batch)
        # extract L component
        X_batch = lab_batch[:,:,:,0]
        # reshape to add the channel dimension
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        # extract a and b features of the image, scaled to about -1..1
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield X_batch, Y_batch

# the input for the encoder layer
inputs1 = Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1,))
# encoder
# Using Conv2d to reduce the size of feature maps and image size
# convert image to 128x128
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2)(inputs1)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
# convert image to 64x64
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
# convert image to 32x32
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
# mid-level feature extractions
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
# decoder
# Adding colors to the grayscale image and upsizing it
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 64x64
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 128x128
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
# image size 256x256

# compiling model
model = Model(inputs=inputs1, outputs=decoder_output)
model.compile(loss='mse', optimizer="adam", metrics=['accuracy'])
model.summary()
tf.keras.utils.plot_model(model)

BATCH_SIZE = 20
# Model.fit accepts generators directly in TensorFlow 2.1 and later.
# Train on x_train only so that the held-out test images stay unseen.
model.fit(create_training_batches(x_train, BATCH_SIZE),
          epochs=100,
          verbose=1,
          steps_per_epoch=x_train.shape[0] // BATCH_SIZE)

test_image = rgb2lab(x_test)[:,:,:,0]
test_image = test_image.reshape(test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# making the output image array
generated_images = np.zeros((len(output), 256, 256, 3))
for i in range(len(output)):
    # iterating for the output
    cur = np.zeros((256, 256, 3))  # dummy array
    cur[:,:,0] = test_image[i][:,:,0]  # assigning the gray scale component
    cur[:,:,1:] = output[i]  # assigning the a and b component
    # converting from lab to rgb format as plt only work for rgb mode
    generated_images[i] = lab2rgb(cur)

plt.figure(figsize=(20, 6))
for i in range(10):
    # grayscale
    plt.subplot(3, 10, i + 1)
    plt.imshow(rgb2gray(x_test)[i].reshape(256, 256))
    plt.gray()
    plt.axis('off')
    # recolorization
    plt.subplot(3, 10, i + 1 + 10)
    plt.imshow(generated_images[i].reshape(256, 256, 3))
    plt.axis('off')
    # original
    plt.subplot(3, 10, i + 1 + 20)
    plt.imshow(x_test[i].reshape(256, 256, 3))
    plt.axis('off')
plt.tight_layout()
plt.show()

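!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg

img = imread("mountain.jpg")
plt.imshow(img)

img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
img = img.astype('float32') / 255.
test_image = rgb2lab(img)[:,:,0]
test_image = test_image.reshape((1,)+test_image.shape+(1,))
output = model.predict(test_image)
output = output * 128
# combine the input L channel with the predicted a and b channels
cur = np.zeros((IMG_HEIGHT, IMG_WIDTH, 3))
cur[:,:,0] = test_image[0][:,:,0]
cur[:,:,1:] = output[0]
# show the colorized result rather than the original image
plt.imshow(lab2rgb(cur))
plt.axis('off')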

Listing 14-1. AutoEncoder_Custom

Now, I will show you how to use a pre-trained model for feature extraction, thereby saving a lot of training time and obtaining better features.

Pre-trained Model as Encoder

There are several pre-trained models available for image processing. You used one such model, VGG16, in Chapter 12. This model extracts image features, which is what our own encoder did in the previous program. So why not apply transfer learning and use the pre-trained VGG16 model in place of the encoder? That is what I am going to demonstrate in this application. A pre-trained model can be expected to give better results than our own encoder, and faster training too.

Project Description

You will be using the same image dataset as in the previous project. Thus, the data loading and preprocessing code remains the same. What changes is the model definition and the inference. So, I will describe only the relevant changes. The entire project source is available on the book’s download site and is also given at the end of this section for your quick reference. The project is named AutoEncoder-TransferLearning.

As the VGG16 was trained on images of size 224x224, you will need to change those two constant values to the following:

IMG_WIDTH = 224

IMG_HEIGHT = 224

Defining Model

You have already seen the VGG16 architecture in Chapter 12 (Figure 12-5). The 18 convolutional and pooling layers that follow the input layer extract the image features. So, we will keep the model's first 19 layers (the input layer plus these 18) and discard all subsequent layers. We create a new sequential model using the following code snippet:

vggmodel = tf.keras.applications.vgg16.VGG16()
newmodel = Sequential()
for i, layer in enumerate(vggmodel.layers):
    # keep layers 0 through 18 and drop the rest
    if i < 19:
        newmodel.add(layer)
newmodel.summary()
for layer in newmodel.layers:
    layer.trainable = False

We set the trainable attribute of all these layers to False, as we intend to use the pre-trained model only for feature extraction. The model summary is shown in Figure 14-11.

Figure 14-11. Pre-trained encoder model summary

Extracting Features

You will extract the features of the training images by using the newmodel that you have created. We pass each training image through the network and collect the output of its last layer, which has the shape 7x7x512.

vggfeatures = []
for sample in x_train:
    # feed the L channel, copied to three channels, as is done at inference
    sample = gray2rgb(rgb2lab(sample)[:,:,0])
    sample = sample.reshape((1,224,224,3))
    prediction = newmodel.predict(sample)
    prediction = prediction.reshape((7,7,512))
    vggfeatures.append(prediction)
vggfeatures = np.array(vggfeatures)
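The (7, 7, 512) shape used in the reshape above follows from the VGG16 structure: its convolutions use 'same' padding and keep the spatial size, each of its five 2x2 max-pooling layers halves it, and the last convolutional block has 512 filters. A quick sketch of that arithmetic follows; the function name vgg16_feature_shape is hypothetical.

```python
def vgg16_feature_shape(height, width):
    """Shape of the output of VGG16's last pooling layer for a given input size.

    Each of the five 2x2 max-pooling layers halves the height and width
    (rounding down); the final convolutional block has 512 filters.
    """
    for _ in range(5):
        height, width = height // 2, width // 2
    return height, width, 512


shape = vgg16_feature_shape(224, 224)
print(shape)
```

A 224x224 input shrinks by a factor of 2**5 = 32 to 7x7, which is why the decoder defined next takes an input of shape (7, 7, 512).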

Defining Network

Now, you will define the encoder/decoder architecture as follows:

#Encoder
encoder_input = Input(shape=(7, 7, 512,))
#Decoder
decoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_input)
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=encoder_input, outputs=decoder_output)
model.summary()

For the encoder, we just specify the input. The decoder follows the same pattern as in the previous example: it keeps upsizing the image and adding colors to it.

The model summary is shown in Figure 14-12.

Figure 14-12. Encoder decoder model summary

Model Training

Compile and train the model using the following two statements:

model.compile(optimizer='Adam', loss="mse")
model.fit(vggfeatures, image_a_b_gen(x_train), verbose=1, epochs=100, batch_size=128)

We use the Adam optimizer and the mse loss for training. On a GPU, each epoch took about a second, a considerable improvement over our earlier network. As we have used a pre-trained encoder, only the decoder parameters need to be trained.
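To get a feel for how small the trainable part of this network is, the decoder's parameters can be counted by hand. A Conv2D layer with a k x k kernel has (k * k * input_channels + 1) * output_channels parameters, the +1 being the bias of each filter; UpSampling2D layers have none. The sketch below applies this to the channel widths of the decoder defined above; the helper name conv2d_params is hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return (kernel * kernel * in_channels + 1) * out_channels


# channel widths through the decoder: 512 -> 256 -> 128 -> 64 -> 32 -> 16 -> 2
widths = [512, 256, 128, 64, 32, 16, 2]
per_layer = [conv2d_params(c_in, c_out)
             for c_in, c_out in zip(widths, widths[1:])]
total = sum(per_layer)

print(per_layer)
print(total)
```

The total comes to about 1.6 million trainable parameters, which should agree with the model summary. The frozen VGG16 layers hold roughly 14.7 million more, but they are never updated during training.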

Inference

Now, run the following code to generate the images from the test dataset. The code is straightforward.

sample = x_test[1:6]
for image in sample:
    lab = rgb2lab(image)
    l = lab[:,:,0]
    L = gray2rgb(l)
    L = L.reshape((1,224,224,3))
    vggpred = newmodel.predict(L)
    ab = model.predict(vggpred)
    ab = ab*128
    cur = np.zeros((224, 224, 3))
    cur[:,:,0] = l
    cur[:,:,1:] = ab
    plt.subplot(1,2,1)
    plt.title("Generated Image")
    plt.imshow(lab2rgb(cur))
    plt.axis('off')
    plt.subplot(1,2,2)
    plt.title("Original Image")
    plt.imshow(image)
    plt.axis('off')
    plt.show()

The output of the preceding code is shown in Figure 14-13.

Figure 14-13. Model inference on test images

Inference on an Unseen Image

Like the earlier example, test the model’s performance on an unseen image of your choice. We will use the same image as in the earlier example.

!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg

Display the original image if you wish to see it again.

img = imread("mountain.jpg")

plt.imshow(img)

Now, run the inference using the following code. Note that we need to change the image size to 224x224 before inputting the image to the network.

test = img_to_array(load_img("mountain.jpg"))
test = resize(test, (224,224), anti_aliasing=True)
test *= 1.0/255
lab = rgb2lab(test)
l = lab[:,:,0]
L = gray2rgb(l)
L = L.reshape((1,224,224,3))
vggpred = newmodel.predict(L)
ab = model.predict(vggpred)
ab = ab*128
cur = np.zeros((224, 224, 3))
cur[:,:,0] = l
cur[:,:,1:] = ab
plt.imshow(lab2rgb(cur))
plt.axis('off')

The program output is shown in Figure 14-14.

../images/495303_1_En_14_Chapter/495303_1_En_14_Fig14_HTML.jpg — Figure 14-14
A colorized image generated by the autoencoder transfer learning model

Full Source

The full source is given in Listing 14-2 for your reference.

import numpy as np
import pandas as pd
import cv2
import os
import sys
import matplotlib.pyplot as plt
from tqdm import tqdm
from itertools import chain
import skimage
from PIL import Image
from skimage.io import (imread, imshow,
                        imread_collection, concatenate_images)
from skimage.transform import resize
from skimage.util import crop, pad
from skimage.morphology import label
from skimage.color import (rgb2gray, gray2rgb,
                           rgb2lab, lab2rgb)
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
import tensorflow as tf
from tensorflow.keras.models import Model, load_model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import (Input, Dense,
                                     UpSampling2D, RepeatVector, Reshape)
from tensorflow.keras.layers import Dropout, Lambda
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import concatenate
from tensorflow.keras import backend as K

#!pip install -q kaggle
#!mkdir ~/.kaggle
#!touch ~/.kaggle/kaggle.json
#api_token = {"username":"","key":""}
#import json
#with open('/root/.kaggle/kaggle.json', 'w') as file:
#    json.dump(api_token, file)
#!chmod 600 ~/.kaggle/kaggle.json
#!kaggle datasets download -d thedownhill/art-images-drawings-painting-sculpture-engraving

!wget --no-check-certificate -r 'https://drive.google.com/uc?export=download&id=1CKs7s_MZMuZFBXDchcL_AgmCxgPBTJXK' -O art-images-drawings-painting-sculpture-engraving.zip
!unzip art-images-drawings-painting-sculpture-engraving.zip

IMG_WIDTH = 224
IMG_HEIGHT = 224
TRAIN_PATH = '/content/dataset/dataset_updated/training_set/painting/'
train_ids = next(os.walk(TRAIN_PATH))[2]

missing_count = 0
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
    except:
        missing_count += 1
print(" Total missing: "+ str(missing_count))

X_train = np.zeros((len(train_ids)-missing_count,
                    IMG_HEIGHT, IMG_WIDTH, 3), dtype=np.uint8)
missing_images = 0
for n, id_ in tqdm(enumerate(train_ids), total=len(train_ids)):
    path = TRAIN_PATH + id_+''
    try:
        img = imread(path)
        img = resize(img, (IMG_HEIGHT, IMG_WIDTH),
                     mode='constant',
                     preserve_range=True)
        X_train[n-missing_images] = img
    except:
        missing_images += 1

X_train = X_train.astype('float32') / 255.
plt.imshow(X_train[5])

x_train, x_test = train_test_split(X_train, test_size=1500)

datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=20,
    horizontal_flip=True)

def image_a_b_gen(dataset=X_train):
    # iteration for every image
    for batch in datagen.flow(dataset, batch_size=542):
        # convert from rgb to grayscale
        X_batch = rgb2gray(batch)
        # convert the rgb to Lab format
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,1:] /128
        return X_batch

vggmodel = tf.keras.applications.vgg16.VGG16()
newmodel = Sequential()
num = 0
for i, layer in enumerate(vggmodel.layers):
    if i<19:
        newmodel.add(layer)
newmodel.summary()
for layer in newmodel.layers:
    layer.trainable=False

vggfeatures = []
for sample in x_train:
    sample = gray2rgb(sample)
    sample = sample.reshape((1,224,224,3))
    prediction = newmodel.predict(sample)
    prediction = prediction.reshape((7,7,512))
    vggfeatures.append(prediction)
vggfeatures = np.array(vggfeatures)

#Encoder
encoder_input = Input(shape=(7, 7, 512,))

#Decoder
decoder_output = Conv2D(256, (3,3), activation='relu',
                        padding="same")(encoder_input)
decoder_output = Conv2D(128, (3,3), activation='relu',
                        padding="same")(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation="relu",
                        padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation="relu",
                        padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation="relu",
                        padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation="tanh",
                        padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)

model = Model(inputs=encoder_input,
              outputs=decoder_output)
model.summary()

model.compile(optimizer='Adam', loss="mse")
model.fit(vggfeatures, image_a_b_gen(x_train),
          verbose=1, epochs=100, batch_size=128)

sample = x_test[1:6]
for image in sample:
    lab = rgb2lab(image)
    l = lab[:,:,0]
    L = gray2rgb(l)
    L = L.reshape((1,224,224,3))
    vggpred = newmodel.predict(L)
    ab = model.predict(vggpred)
    ab = ab*128
    cur = np.zeros((224, 224, 3))
    cur[:,:,0] = l
    cur[:,:,1:] = ab
    plt.subplot(1,2,1)
    plt.title("Generated Image")
    plt.imshow(lab2rgb(cur))
    plt.axis('off')
    plt.subplot(1,2,2)
    plt.title("Original Image")
    plt.imshow(image)
    plt.axis('off')
    plt.show()

!wget https://raw.githubusercontent.com/Apress/artificial-neural-networks-with-tensorflow-2/main/ch14/mountain.jpg

img = imread("mountain.jpg")
plt.imshow(img)

test = img_to_array(load_img("mountain.jpg"))
test = resize(test, (224,224), anti_aliasing=True)
test *= 1.0/255
lab = rgb2lab(test)
l = lab[:,:,0]
L = gray2rgb(l)
L = L.reshape((1,224,224,3))
vggpred = newmodel.predict(L)
ab = model.predict(vggpred)
ab = ab*128
cur = np.zeros((224, 224, 3))
cur[:,:,0] = l
cur[:,:,1:] = ab
plt.imshow(lab2rgb(cur))
plt.axis('off')

Listing 14-2

AutoEncoder_TransferLearning

Summary

Deep neural networks make it possible to add colors to a B&W image. You learned to create AutoEncoders and used them to colorize B&W images. An AutoEncoder contains an Encoder, which extracts the image features, and a Decoder, which recreates the image from the representations that the Encoder extracted. You also learned how to use a pre-trained image classifier to extract the image features, so that it serves as the Encoder.
