CHAPTER 6
Cutting‐Edge Deep Learning Projects

Deep Learning is revolutionizing our world, delivering amazing results by processing images, text, speech, and video and extracting knowledge and insights from unstructured data. We saw examples of processing image and text data using Deep Learning in Chapter 5. In this chapter, we take that understanding to the next level and look at some interesting projects. These are innovative solutions that people have developed and shared with the community. They have become very popular due to their unique nature, and you may have read news articles promoting AI that featured them. We will see cool projects like repainting photos in the styles of famous painters and generating fake images that look indistinguishable from real ones. We will also see an example of detecting fraud in credit card transactions using unsupervised Deep Learning. Although the outcomes here are unique, the underlying Deep Learning techniques and concepts remain the same. As long as you have followed the concepts in the previous chapters, you should understand these well. Maybe reading about these projects will trigger an innovative spark in your mind and you will come up with the next big AI solution. Here's hoping for that!

Neural Style Transfer

One of the big AI headlines of 2018 was a painting created entirely by Artificial Intelligence that sold for about $400,000. Many researchers are actively evaluating algorithms that learn the patterns in existing art and use them to build new paintings. It's fascinating and a lot of fun. Let me show you an example. This example learns patterns from a famous painting and applies them to a photo we supply. To be specific, we will copy the style of a famous painting and redraw our own content, a photo, in that style. This is called neural style transfer. The topic has been very popular among computer vision researchers and many methods have evolved to solve this problem. There are a few websites, and also a mobile app called Prisma, that do this in real time on your photos. Let's see how this works.

We know that Deep Learning involves building deep neural networks that extract high-level features from low-level ones, such as pixel intensity arrays. As the model learns to identify patterns from image data, it learns many aspects of the picture, like the way pixels arrange themselves to form edges, curves, and surfaces. Now if we train the network on a digitized image of a painting, there is a good chance that the network will learn features like the brushstrokes the painter used to create it. This is the idea behind neural style transfer. In a nutshell, the process can be described as shown in Figure 6.1. This figure is from the wonderful paper describing this approach, entitled "A Neural Algorithm of Artistic Style," by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge.


Figure 6.1: General idea of how neural style transfer works

Figure 6.1 is from the original paper published by Gatys, Ecker, and Bethge. Here we see two images. One is the content image, a photo of buildings. The other is the style image, the famous painting by Vincent van Gogh called The Starry Night. The initial layers of the Convolutional Neural Network (CNN) have fewer filters and bigger pixel arrays. As we move deeper into the network and reduce the spatial size of the feature maps using pooling layers, the number of filters increases, and hence the depth of each layer's volume increases. The deeper layers learn higher-level feature sets from the images. At the same time, if we analyze the correlations between the filters within a layer, we get the style information of the image. Hence, as we go deeper into the network, the style information that is captured also increases.

The method we will use takes two inputs: a style image, which will be a famous painting, and a content image, which will be the photo we want to process. We will define a style distance and a content distance. These are both loss functions that we will try to minimize. The overall concept, with an example, is shown in Figure 6.2.


Figure 6.2: Example of neural style transfer

The general idea is to calculate the style distance and content distance between two images using certain feature layers of a deep network like a CNN. We will use the popular VGG19 model trained on ImageNet data. VGG19 is a standard Deep Learning architecture with 16 convolution layers and three fully connected layers. These are the weight layers; a few pooling layers sit between the convolution blocks. Let's look at the following code. I show and explain individual blocks of code and then put it all together to give you the full program.

The code in the following sections is heavily inspired by Google's Keras/TensorFlow example for style transfer. You can look this code up in your TensorFlow installation or on GitHub at https://github.com/keras-team/keras/blob/master/examples/neural_style_transfer.py.

There is also an excellent Medium post by Raymond Yuan that covers this in detail at https://medium.com/tensorflow/neural-style-transfer-creating-art-with-deep-learning-using-tf-keras-and-eager-execution-7d541ac31398.

Let's start with Listing 6.1. We will import a VGG19 model pretrained on the ImageNet data. We will also turn on eager execution for TensorFlow so that it does not build computation graphs but directly executes the code, giving us results immediately.
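The listing itself is not reproduced here, but a minimal sketch of the idea, assuming TensorFlow 1.x with the tf.keras API, might look like this:

import tensorflow as tf

# Execute operations immediately instead of building a graph first.
tf.enable_eager_execution()
print("Eager execution: %s" % tf.executing_eagerly())

# VGG19 pretrained on ImageNet; we only want its convolutional features,
# so we drop the classifier head and freeze the weights.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False
vgg.summary()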

Here are the results:

Eager execution: True
_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
input_3 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 20,024,384
Trainable params: 0
Non-trainable params: 20,024,384 

Next, we select certain layers as our style and content layers. These layers will be used to extract the features learned by the VGG19 model from the images, which give us an idea of both the content and the style of the respective images. As stated earlier, our objective is to minimize the style and content distances (also referred to as costs) since we will be doing optimization. Let's select some of these layers by the names shown in the model summary. You can experiment with different layers. We will use a convolution layer in block5 to compare content and multiple convolution layers for the style comparison, as shown in Listing 6.2. Using these feature layers, we will build a new model called style_model that returns only these layers' outputs. We are no longer interested in the predictions made by the model.
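A sketch of that selection, using the layer names from the summary above (the exact layers picked in the book's listing may differ), could look like this:

# One deeper layer for content, several shallower layers for style.
content_layers = ['block5_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']

# Model that outputs only these intermediate feature maps.
outputs = [vgg.get_layer(name).output for name in style_layers + content_layers]
style_model = tf.keras.Model(inputs=vgg.input, outputs=outputs)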

Next, we download two images—one for content and one for style (see Figure 6.3). We convert these into arrays and display them, as shown in Listing 6.3.
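Something along these lines would do the job; the file names are placeholders for the two images you download, and the exact preprocessing in the book's listing may differ:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image as kp_image

def load_img(path, max_dim=512):
    # Load a photo, shrink it so its longest side is max_dim pixels,
    # and add a batch dimension so it matches the model's input shape.
    img = kp_image.load_img(path)
    scale = max_dim / max(img.size)
    img = img.resize((round(img.size[0] * scale), round(img.size[1] * scale)))
    return np.expand_dims(kp_image.img_to_array(img), axis=0)

content_img = load_img('content_photo.jpg')   # placeholder file name
style_img = load_img('starry_night.jpg')      # placeholder file name

plt.subplot(1, 2, 1); plt.imshow(content_img[0] / 255.0); plt.title('Content')
plt.subplot(1, 2, 2); plt.imshow(style_img[0] / 255.0); plt.title('Style')
plt.show()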


Figure 6.3: Style and content images we will use for this demo

Listing 6.4 shows some helper functions that will be used to calculate the loss for content and style and the gradients that we will use for optimization.
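The heart of those helpers looks roughly like the following sketch; it assumes eager execution is on so that tf.GradientTape can differentiate the loss with respect to the generated image.

def content_loss(content_feat, target_feat):
    # Mean squared difference between the content features of the two images.
    return tf.reduce_mean(tf.square(content_feat - target_feat))

def gram_matrix(feat):
    # Correlations between the filters of a layer; this captures the style.
    channels = int(feat.shape[-1])
    a = tf.reshape(feat, [-1, channels])
    n = tf.shape(a)[0]
    return tf.matmul(a, a, transpose_a=True) / tf.cast(n, tf.float32)

def style_loss(style_feat, target_feat):
    return tf.reduce_mean(tf.square(gram_matrix(style_feat) - gram_matrix(target_feat)))

def compute_grads(total_loss_fn, generated_img):
    # Gradient of the combined loss with respect to the image being generated.
    with tf.GradientTape() as tape:
        loss = total_loss_fn(generated_img)
    return tape.gradient(loss, generated_img), loss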

Next, we define the main function that we will call to do the style transfer optimization. We specify the number of iterations and provide weights for the content and style. See Listing 6.5.
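A stripped-down version of that driver function might look like this; total_style_loss and total_content_loss stand in for assumed helpers that run the image through style_model and sum the per-layer losses from the previous sketch, and the learning rate and weights are just plausible defaults.

def run_style_transfer(content_path, style_path, num_iterations=1000,
                       content_weight=1e3, style_weight=1e-2):
    # total_style_loss / total_content_loss are assumed helpers that compare
    # the generated image's features (from style_model) against the target
    # features of the style and content images, computed once up front.
    generated = tf.Variable(load_img(content_path), dtype=tf.float32)
    optimizer = tf.train.AdamOptimizer(learning_rate=5.0)

    for step in range(num_iterations):
        grads, loss = compute_grads(
            lambda img: style_weight * total_style_loss(img) +
                        content_weight * total_content_loss(img),
            generated)
        optimizer.apply_gradients([(grads, generated)])
        if step % 100 == 0:
            print('Iteration %d, total loss %.2f' % (step, loss))
    return generated.numpy()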

Finally, we run the code to do the actual optimization and see how our original content photo gets transformed, as shown in Listing 6.6. We will pause every few iterations and see how the modified image looks. See Figure 6.4.
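Invoking it is then a short call plus a plot, again using the placeholder file names from earlier:

best_img = run_style_transfer('content_photo.jpg', 'starry_night.jpg',
                              num_iterations=1000)
plt.imshow(np.clip(best_img[0], 0, 255).astype('uint8'))
plt.title('Content photo redrawn in the style of The Starry Night')
plt.show()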


Figure 6.4: Results of neural style transfer

There you have it: we took an image and applied the style of a famous painting to it. You can modify your images with different paintings to get some cool effects. Or you could download an app like Prisma and see this effect in action. Or why don't you code up a Prisma-type app of your own?

This particular example and its code are available as a Google Colab Notebook at this link:

https://colab.research.google.com/drive/1_tHUYgO_fIBU1JXdn_mXWCDD6njLyNSu

Next, let's look at another interesting application of Deep Learning. You probably have heard of this one a lot in the news recently—using neural networks to create photos.

Generating Images Using AI

One of the big AI news items of 2018 was a new algorithm developed by researchers at NVIDIA that could generate fake celebrity photos. These photos were so realistic that they could fool any human into thinking they were real. However, they were all fake photos generated by a very smart AI algorithm that learned patterns from real photos. Algorithms of this type are called generative models; they learn the probability distribution of the input data and then generate new data from it.

We will use a popular type of generative model called a generative adversarial network (GAN) to generate new images. Before we talk about GANs, remember that a neural network, whether shallow or deep, learns to encode an image array into a vector of limited dimensions. This vector can be seen as a compressed encoding of our original image. This is shown in Figure 6.5.


Figure 6.5: Neural network captures encoding of image

Now let's talk about GANs. The concept of how a GAN works is illustrated in Figure 6.6 with an art forger versus art inspector analogy. We have two neural networks: a generator (G) and a discriminator (D). The generator creates images, starting from a random encoding vector; it performs the reverse of the encoding process shown in Figure 6.5, generating an image from an encoding vector. This is analogous to an art forger who produces forgeries of paintings.


Figure 6.6: Art‐forger analogy for generative adversarial networks

Next, we have a discriminator network that is analogous to an art inspector who checks whether an image is genuine or fake. This network takes one image at a time, drawn from either the real or the generated set, and learns to classify it as real or fake. If an image generated by G is accepted by D as real, then G gets rewarded. If D catches a fake, then D gets the reward. The two networks are competing against each other, which is why the approach is called adversarial. Over time, as both networks train, G gets good at generating fakes that look identical to the real images. That's what we are looking for. This concept is illustrated in Figure 6.6.

Let's see this in action with a simple example using a very simple dataset. We will use the fashion items dataset that is provided with Keras. This is a set of grayscale images of fashion items, each 28×28 pixels, covering 10 classes of objects like coats, T-shirts, shoes, etc. (see Figure 6.7). First, we load the needed libraries, then we load the dataset and show some sample images to explore it. See Listing 6.7.
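A minimal sketch of that setup, assuming the standalone Keras package and scaling the pixels to the range -1 to 1 (which pairs well with a tanh output on the generator), could be:

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import fashion_mnist

# 60,000 grayscale training images, each 28x28 pixels.
(x_train, y_train), (_, _) = fashion_mnist.load_data()
x_train = (x_train.astype('float32') - 127.5) / 127.5   # scale to [-1, 1]
x_train = x_train.reshape(-1, 784)                      # flatten for dense layers

# Show a few sample items.
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()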


Figure 6.7: Displaying the fashion items dataset

Next, we build the two neural networks: one for the generator (G) and the other for the discriminator (D). G takes a random encoding vector as input and generates a 28×28 image for us. D takes a 28×28 image and gives us a single result: true for a real image or false for a generated one. You can see this in Listing 6.8.
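The layer sizes in the summary below pin down the architecture fairly well (a 128-dimensional encoding vector going in, 784 pixels coming out), so a sketch of the two networks, with the output activations being my own assumption, looks like this:

from keras.models import Sequential
from keras.layers import Dense, ReLU

latent_dim = 128   # size of the random encoding vector fed to G

# Generator: 128-dim noise vector -> 784 pixel values (a 28x28 image).
G = Sequential([
    Dense(256, input_dim=latent_dim), ReLU(),
    Dense(512), ReLU(),
    Dense(1024), ReLU(),
    Dense(784, activation='tanh'),     # assumed activation
])

# Discriminator: 784 pixel values -> probability that the image is real.
D = Sequential([
    Dense(1024, input_dim=784), ReLU(),
    Dense(512), ReLU(),
    Dense(256), ReLU(),
    Dense(1, activation='sigmoid'),    # assumed activation
])
D.compile(loss='binary_crossentropy', optimizer='adam')

print('------ GENERATOR ------'); G.summary()
print('------ DISCRIMINATOR ------'); D.summary()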

Here are the results:

------ GENERATOR ------
_________________________________________________________________
Layer (type)          Output Shape       Param # 
=================================================================
dense_1 (Dense)       (None, 256)        33024 
_________________________________________________________________
re_lu_1 (ReLU)        (None, 256)        0 
_________________________________________________________________
dense_2 (Dense)       (None, 512)        131584 
_________________________________________________________________
re_lu_2 (ReLU)        (None, 512)        0 
_________________________________________________________________
dense_3 (Dense)       (None, 1024)       525312 
_________________________________________________________________
re_lu_3 (ReLU)        (None, 1024)       0 
_________________________________________________________________
dense_4 (Dense)       (None, 784)        803600 
=================================================================
Total params: 1,493,520
Trainable params: 1,493,520
Non‐trainable params: 0
 
 
 
------ DISCRIMINATOR ------
_________________________________________________________________
Layer (type)          Output Shape       Param # 
=================================================================
dense_5 (Dense)       (None, 1024)       803840 
_________________________________________________________________
re_lu_4 (ReLU)        (None, 1024)       0 
_________________________________________________________________
dense_6 (Dense)       (None, 512)        524800 
_________________________________________________________________
re_lu_5 (ReLU)        (None, 512)        0 
_________________________________________________________________
dense_7 (Dense)       (None, 256)        131328 
_________________________________________________________________
re_lu_6 (ReLU)        (None, 256)        0 
_________________________________________________________________
dense_8 (Dense)       (None, 1)          257 
=================================================================
Total params: 1,460,225
Trainable params: 1,460,225
Non‐trainable params: 0 

Now we will write two functions: one to plot the images created by G during training, and the other to perform the actual training by feeding real and fake images to the model. Then we run the training and, after every epoch, show a sample of the images that were created. You can see this in Listing 6.9.
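A condensed sketch of those two functions, building on the G, D, x_train, and latent_dim defined in the previous sketches (the book's listing will differ in the details), is shown here:

from keras.models import Model
from keras.layers import Input

def plot_generated(generator, n=10):
    # Draw a strip of images from the current state of the generator.
    noise = np.random.normal(0, 1, (n, latent_dim))
    imgs = generator.predict(noise).reshape(n, 28, 28)
    for i in range(n):
        plt.subplot(1, n, i + 1)
        plt.imshow(imgs[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Stacked model used to train G: D is frozen here so only G's weights move.
z = Input(shape=(latent_dim,))
D.trainable = False
gan = Model(z, D(G(z)))
gan.compile(loss='binary_crossentropy', optimizer='adam')

epochs, batch_size = 200, 128
batches = x_train.shape[0] // batch_size    # 60000 // 128 = 468
print('Epochs:', epochs)
print('Batch size:', batch_size)
print('Batches per epoch:', batches)

for epoch in range(epochs):
    for _ in range(batches):
        # Train D on half real, half generated images.
        real = x_train[np.random.randint(0, x_train.shape[0], batch_size)]
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        fake = G.predict(noise)
        D.train_on_batch(real, np.ones((batch_size, 1)))
        D.train_on_batch(fake, np.zeros((batch_size, 1)))
        # Train G (through the frozen D) to get its fakes labeled as real.
        gan.train_on_batch(noise, np.ones((batch_size, 1)))
    plot_generated(G)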

Here are the results:

Epochs: 200
Batch size: 128
Batches per epoch: 468 

The images generated by this training process are shown in Figure 6.8. As we train for more epochs, the generated images get closer to the intended target and we start to see recognizable fashion objects. We can keep training to improve the images and make them sharper.


Figure 6.8: Results from GAN trained to generate fashion images

The NVIDIA researchers used celebrity photos to help their GAN model learn from known faces. After a few hours of training, the model was able to capture the patterns that form faces. It could then output new faces that looked very much like known celebrities but belonged to people who do not exist.

Credit Card Fraud Detection with Autoencoders

The previous two examples used unstructured data in the form of images. Now let's look at an example with structured, tabular data. We will look at a dataset of financial transactions made using credit cards and try to identify patterns of fraudulent transactions. This use case is extremely common in the financial world. Perhaps you have received a call from your credit card issuer about a suspicious transaction, asking you to verify that it was actually made by you. That transaction was most likely flagged by some sort of ML model.

Traditionally, banks have used predefined rules for flagging suspicious transactions. For example, there could be a rule that if there is a sudden transaction from a different country, flag that for your approval. Or, if there is a purchase from a store that is not one you usually visit, flag that. Setting fixed rules to cover all sorts of cases for all individuals is extremely difficult and it's possible to get lots of false positives. Hence, modern systems rely on ML to find patterns of fraudulent transactions and predict if a transaction is fraudulent or normal.

We will explore an unsupervised learning method for analyzing this data, called an autoencoder. First, let's look at the dataset. The dataset is structured and tabular: a list of transactions with time, amount, and details such as customer account, vendor account, government taxes, etc. For this example, we will use a dataset that is generously made available in the public domain by the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles). This dataset was generated as part of a research study by Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi.

The dataset is available as a CSV file called creditcard.csv. It contains transactions made by European cardholders using credit cards in September 2013. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions. Three of the columns are directly interpretable: Time, Amount, and Class. The Time feature contains the seconds elapsed between each transaction and the first transaction in the dataset. The Amount feature is the transaction amount, and the Class feature is the response variable, with 1 indicating fraud and 0 a normal transaction.

The dataset also has 28 columns named V1, V2, V3, and so on up to V28. These represent the customer and vendor details for each transaction, but they have been transformed with a dimensionality reduction technique called Principal Component Analysis (PCA), so we are given only these 28 anonymous V-features. This also hides the customer and vendor details in the interest of privacy. We can assume these 28 features carry the important information and start analyzing the data. Figure 6.9 shows the data loaded in Excel.


Figure 6.9: Credit card transaction dataset with details hidden in V‐features

We will use a special type of neural network to solve this problem, called an autoencoder. This is an unsupervised learning network that basically tries to reproduce the inputs given to it. The idea is to read the input vector and encode it, using an encoder network, into a smaller-dimensional vector called the encoding. A decoder network then reconstructs the input vector from this encoding. In effect, the input is compressed and stored as a small encoding vector; this method has also been applied to data compression.

There is some information loss when you encode a larger dimension input vector into the smaller encoding. The idea is for the model to learn to encode so well that all the important patterns in data will be captured in the encoding. This concept is explained in Figure 6.10.


Figure 6.10: Concept of an autoencoder neural network

Let's look at the code to build the autoencoder and then use it to detect anomalies in the credit card transaction data. First, we will load the CSV file and prepare the training dataset. The key thing for the autoencoder is that the input (X) and output (Y) data are the same. Hence, it will learn in an unsupervised manner and try to re-create the input fed into it. Let's prepare the training data in Listing 6.10.
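A sketch of that step (the file path is a placeholder for wherever you saved creditcard.csv) might be:

import pandas as pd

df = pd.read_csv('creditcard.csv')   # placeholder path
print(df.head())
print('Fraud cases:', df[df.Class == 1].shape[0], 'of', df.shape[0], 'transactions')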

Here are the results:

TIME V1 V2 V3 V4 V8 V9 V25 V26 V27 V28 AMOUNT CLASS   (columns V5 through V7 and V10 through V24 are omitted from this display)
0 0.0 −1.359807 −0.072781  2.536347  1.378155  0.098698  0.363787  0.128539 −0.189115  0.133558 −0.021053 149.62 0
1 0.0  1.191857  0.266151  0.166480  0.448154  0.085102 −0.255425  0.167170  0.125895 −0.008983  0.014724 2.69 0
2 1.0 −1.358354 −1.340163  1.773209  0.379780  0.247676 −1.514654 −0.327642 −0.139097 −0.055353 −0.059752 378.66 0
3 1.0 −0.966272 −0.185226  1.792993 −0.863291  0.377436 −1.387024  0.647376 −0.221929  0.062723  0.061458 123.50 0
4 2.0 −1.158233  0.877737  1.548718  0.403034 −0.270533  0.817739 −0.206010  0.502292  0.219422  0.215153 69.99 0

First, we will concern ourselves only with high-value transactions, say any amount above $200. We will use Scikit-Learn's built-in methods to scale the values in the data frame. Then we will build our training and validation arrays from normal transactions only; the fraud cases are kept aside for testing. Keep in mind that we only need x_train and x_val arrays since we are using unsupervised learning. Our expected Y values will be the X values themselves. You can see this code in Listing 6.11.
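One way to sketch that preparation is shown below; I am assuming StandardScaler for the scaling and a roughly 90/10 split, so the exact shapes will differ slightly from the output that follows.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Keep only high-value transactions and separate out the Class label.
data = df[df.Amount > 200].copy()
labels = data.pop('Class')

# Scale every feature column.
data[data.columns] = StandardScaler().fit_transform(data[data.columns])

normal = data[labels == 0].values    # normal transactions only
fraud = data[labels == 1].values     # held out for testing later
print('Number of fraud transactions =', len(fraud))

# Train and validate on normal transactions; no y arrays are needed.
x_train, x_val = train_test_split(normal, test_size=0.1, random_state=42)
print('X Training array shape =', x_train.shape)
print('X Validation array shape =', x_val.shape)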

Here are the results:

Normal transactions array shape = (28752, 31)
Fraud transactions array shape = (85, 31)
 
Number of fraud transactions = 85
 
X Training array shape = (25800, 30)
X Validation array shape = (2867, 30) 

Now we will build the autoencoder model. As we saw, this model will have an encoder and decoder part. The encoder takes a high‐dimensional vector and generates a low‐dimensional encoding. We have an input vector of size 30 and we will use an encoding size of 15. You can change this and see if you get better results. You can see this code in Listing 6.12.
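Given the 30-to-15-to-30 shape in the summary below, a sketch of the model is quite short; the activations and the loss function here are my assumptions.

from keras.models import Model
from keras.layers import Input, Dense

input_dim = 30      # features per transaction
encoding_dim = 15   # size of the compressed encoding

inputs = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(inputs)    # encoder
decoded = Dense(input_dim, activation='linear')(encoded)    # decoder

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error',
                    metrics=['accuracy'])
autoencoder.summary()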

Here are the results:

Layer (type)                 Output Shape       Param # 
=================================================================
input_51 (InputLayer)        (None, 30)         0 
_________________________________________________________________
dense_119 (Dense)            (None, 15)         465 
_________________________________________________________________
dense_120 (Dense)            (None, 30)         480 
=================================================================
Total params: 945
Trainable params: 945
Non‐trainable params: 0 

Now let's train the model on our x_train and x_val arrays. Notice that we don't have y_train and y_val arrays. We use the input as the expected output. You can see this code in Listing 6.13.
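The training call is essentially one line; note that x_train appears as both the input and the target. The batch size here is a guess, while the 25 epochs match the output below.

history = autoencoder.fit(x_train, x_train,
                          epochs=25, batch_size=32, shuffle=True,
                          validation_data=(x_val, x_val))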

Here are the results:

Train on 25800 samples, validate on 2867 samples
Epoch 1/25
25800/25800 [==============================] ‐ 3s 131us/step ‐ loss:
1.7821 ‐ acc: 0.3620 ‐ val_loss: 1.8113 ‐ val_acc: 0.5225
Epoch 2/25
25800/25800 [==============================] ‐ 1s 46us/step ‐ loss:
1.5699 ‐ acc: 0.5834 ‐ val_loss: 1.7444 ‐ val_acc: 0.6264
Epoch 3/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.5282 ‐ acc: 0.6578 ‐ val_loss: 1.7110 ‐ val_acc: 0.6983
Epoch 4/25
25800/25800 [==============================] ‐ 1s 47us/step ‐ loss:
1.5010 ‐ acc: 0.7069 ‐ val_loss: 1.6911 ‐ val_acc: 0.7203
Epoch 5/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.4760 ‐ acc: 0.7460 ‐ val_loss: 1.6697 ‐ val_acc: 0.7719
Epoch 6/25
25800/25800 [==============================] ‐ 1s 47us/step ‐ loss:
1.4617 ‐ acc: 0.7763 ‐ val_loss: 1.6483 ‐ val_acc: 0.7733
Epoch 7/25
25800/25800 [==============================] ‐ 1s 47us/step ‐ loss:
1.4521 ‐ acc: 0.7834 ‐ val_loss: 1.6391 ‐ val_acc: 0.7939
Epoch 8/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.4463 ‐ acc: 0.7956 ‐ val_loss: 1.6355 ‐ val_acc: 0.8036
Epoch 9/25
25800/25800 [==============================] ‐ 1s 57us/step ‐ loss:
1.4430 ‐ acc: 0.8025 ‐ val_loss: 1.6298 ‐ val_acc: 0.8033
Epoch 10/25
25800/25800 [==============================] ‐ 1s 55us/step ‐ loss:
1.4407 ‐ acc: 0.8062 ‐ val_loss: 1.6350 ‐ val_acc: 0.8022
Epoch 11/25
25800/25800 [==============================] ‐ 1s 49us/step ‐ loss:
1.4398 ‐ acc: 0.8091 ‐ val_loss: 1.6290 ‐ val_acc: 0.8099
Epoch 12/25
25800/25800 [==============================] ‐ 1s 49us/step ‐ loss:
1.4384 ‐ acc: 0.8114 ‐ val_loss: 1.6273 ‐ val_acc: 0.8036
Epoch 13/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.4379 ‐ acc: 0.8126 ‐ val_loss: 1.6258 ‐ val_acc: 0.8183
Epoch 14/25
25800/25800 [==============================] ‐ 1s 51us/step ‐ loss:
1.4374 ‐ acc: 0.8140 ‐ val_loss: 1.6267 ‐ val_acc: 0.8204
Epoch 15/25
25800/25800 [==============================] ‐ 1s 49us/step ‐ loss:
1.4368 ‐ acc: 0.8144 ‐ val_loss: 1.6257 ‐ val_acc: 0.8186
Epoch 16/25
25800/25800 [==============================] ‐ 2s 59us/step ‐ loss:
1.4363 ‐ acc: 0.8164 ‐ val_loss: 1.6260 ‐ val_acc: 0.8141
Epoch 17/25
25800/25800 [==============================] ‐ 1s 53us/step ‐ loss:
1.4358 ‐ acc: 0.8174 ‐ val_loss: 1.6253 ‐ val_acc: 0.8190
Epoch 18/25
25800/25800 [==============================] ‐ 1s 53us/step ‐ loss:
1.4356 ‐ acc: 0.8160 ‐ val_loss: 1.6243 ‐ val_acc: 0.8183
Epoch 19/25
25800/25800 [==============================] ‐ 1s 50us/step ‐ loss:
1.4353 ‐ acc: 0.8169 ‐ val_loss: 1.6257 ‐ val_acc: 0.8137
Epoch 20/25
25800/25800 [==============================] ‐ 1s 54us/step ‐ loss:
1.4351 ‐ acc: 0.8186 ‐ val_loss: 1.6245 ‐ val_acc: 0.8134
Epoch 21/25
25800/25800 [==============================] ‐ 1s 56us/step ‐ loss:
1.4347 ‐ acc: 0.8198 ‐ val_loss: 1.6237 ‐ val_acc: 0.8116
Epoch 22/25
25800/25800 [==============================] ‐ 1s 52us/step ‐ loss:
1.4346 ‐ acc: 0.8181 ‐ val_loss: 1.6255 ‐ val_acc: 0.8193
Epoch 23/25
25800/25800 [==============================] ‐ 1s 51us/step ‐ loss:
1.4343 ‐ acc: 0.8194 ‐ val_loss: 1.6232 ‐ val_acc: 0.8148
Epoch 24/25
25800/25800 [==============================] ‐ 1s 54us/step ‐ loss:
1.4342 ‐ acc: 0.8189 ‐ val_loss: 1.6230 ‐ val_acc: 0.8155
Epoch 25/25
25800/25800 [==============================] ‐ 1s 56us/step ‐ loss:
1.4340 ‐ acc: 0.8216 ‐ val_loss: 1.6265 ‐ val_acc: 0.8123 

We will plot the accuracy and loss values for training and validation datasets. You can see this code in Listing 6.14.
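A sketch of that plotting code reads the curves from the History object returned by fit(); note that older Keras versions store accuracy under the key 'acc'.

import matplotlib.pyplot as plt

plt.plot(history.history['acc'], label='train')
plt.plot(history.history['val_acc'], label='validation')
plt.title('Model accuracy'); plt.xlabel('epoch'); plt.legend(); plt.show()

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.title('Model loss'); plt.xlabel('epoch'); plt.legend(); plt.show()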

The results are two charts, as shown in Figures 6.11 and 6.12.


Figure 6.11: Model accuracy plot for autoencoder


Figure 6.12: Model loss plot for autoencoder

Now we will make a prediction with the trained autoencoder on the testing dataset. We will compare the input values with predictions and calculate the reconstruction error for each data point. Since we trained on normal transactions, these should have a low reconstruction error. Fraudulent transactions will have different data distributions and should give us a higher reconstruction error. You can see this code in Listing 6.15.
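A sketch of that evaluation, reusing the x_val and fraud arrays from the earlier data-preparation sketch; the threshold value is purely illustrative and should be tuned by looking at the plot.

import numpy as np
import matplotlib.pyplot as plt

# Mix held-out normal transactions with the fraud cases.
x_test = np.vstack([x_val, fraud])
y_test = np.hstack([np.zeros(len(x_val)), np.ones(len(fraud))])

preds = autoencoder.predict(x_test)
reconstruction_error = np.mean(np.square(x_test - preds), axis=1)

threshold = 5.0   # illustrative cut-off, not the book's value
flagged = reconstruction_error > threshold
print('Fraud cases flagged: %d of %d' % (flagged[y_test == 1].sum(), int(y_test.sum())))

# Normal points in one color, fraud in another, threshold as a line.
idx = np.arange(len(reconstruction_error))
plt.scatter(idx[y_test == 0], reconstruction_error[y_test == 0], s=3, label='Normal')
plt.scatter(idx[y_test == 1], reconstruction_error[y_test == 1], s=3, label='Fraud')
plt.axhline(threshold, color='red', linestyle='--', label='Threshold')
plt.legend(); plt.show()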

The result is shown in Figure 6.13.


Figure 6.13: Predictions on testing data using autoencoder

The chart in Figure 6.13 tells a good story. The reconstruction error is high for the fraudulent transactions, shown in orange. For the normal transactions, shown in blue, most points fall below our defined threshold. We don't catch all the fraudulent transactions, but we do catch more than 75% of them, which is very good. You can explore modifying hyperparameters like the number of layers and neurons to see if you get better results. Hopefully, this code shows you the power of Deep Learning to find patterns in data and detect anomalies. Since this is unsupervised learning, we did not provide labeled outputs, and you can apply this approach to data from pretty much any domain.

Summary

In this chapter, we looked at some unique applications of Deep Learning technology. We saw how the neural style transfer method can transfer the style of a painting onto our own images. Then we looked at generative adversarial networks and created new data points that closely resemble real data. Finally, we used a special type of network called an autoencoder that learns to find anomalies in data using unsupervised learning. These methods are relatively new and were proposed by researchers in recent publications. The Deep Learning community is truly awesome and shares valuable content with everyone. You can explore new papers as they are published on the arXiv site hosted by Cornell University (arxiv.org) to learn about new solutions as they are developed. I also highly encourage you to contribute your own papers there so everyone can benefit from your knowledge!
