CHAPTER 6
Cutting‐Edge Deep Learning Projects

Deep Learning is revolutionizing our world, delivering amazing results by processing images, text, speech, and video and extracting knowledge and insights from unstructured data. We saw examples of processing image and text data using Deep Learning in Chapter 5. In this chapter, we take that understanding to the next level and look at some interesting projects. These are innovative solutions that people have developed and shared with the community. They have become very popular due to their unique nature, and you may have read news articles promoting AI that featured them. We will see cool projects like repainting photos in the styles of famous painters and generating fake images that look indistinguishable from real ones. We will also see an example of detecting fraud in credit card transactions using unsupervised Deep Learning. Although the outcomes here are unique, the underlying Deep Learning techniques and concepts remain the same. As long as you have followed the concepts in the previous chapters, you should understand these well. Maybe reading about these projects will trigger an innovative spark in your mind and you will come up with the next big AI solution. Here's hoping for that!

Neural Style Transfer

One of the big AI headlines of 2018 was a painting created entirely by Artificial Intelligence that sold for about $400,000. Many researchers are actively evaluating algorithms that learn the patterns in existing art and use them to build new paintings. It's fascinating and a lot of fun. Let me show you an example. This example learns patterns from a famous painting and applies them to a photo we supply. To be specific, we will copy the style of a famous painting and redraw our own content, a photo, in that style. This is called neural style transfer. The topic has been very popular among computer vision researchers and many methods have evolved to solve this problem. There are a few websites, and also a mobile app called Prisma, that do this in real time on your photos. Let's see how this works.

We know that Deep Learning involves building deep neural networks that extract high-level features from low-level ones, such as pixel intensity arrays. As the model learns to identify patterns from image data, it learns many aspects of the picture, like the way pixels arrange themselves to form edges, curves, and surfaces. Now if we train the network on a digitized image of a painting, there is a good chance that the network will learn features like the brushstrokes the painter used to create it. This is the idea behind neural style transfer. In a nutshell, the process can be described as shown in Figure 6.1. This figure is from the wonderful paper describing this approach, entitled "A Neural Algorithm of Artistic Style," by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge.


Figure 6.1: General idea of how neural style transfer works

Figure 6.1 is from the original paper published by Gatys, Ecker, and Bethge. Here we see two images. One is the content image, a photo of buildings. The other is the style image, the famous painting by Vincent van Gogh called The Starry Night. The initial layers of the Convolutional Neural Network (CNN) have fewer filters and bigger pixel arrays. As we move deeper into the network and reduce the spatial size of the feature maps using pooling layers, the number of filters increases, and hence the depth of each layer's volume increases. The deeper layers learn higher-level feature sets from the images. At the same time, if we analyze the correlations between the filters within a layer, we get the style information of the image. Hence, as we go deeper into the network, the style information that is captured also increases.

The method we will use takes two inputs: a style image, which will be a famous painting, and a content image, which will be the photo we want to process. We will define a style distance and a content distance. These are both loss functions that we will try to minimize. The overall concept, with an example, is shown in Figure 6.2.


Figure 6.2: Example of neural style transfer

The general idea is to calculate the style distance and content distance between two images using certain feature layers of a deep network like a CNN. We will use the popular VGG19 model trained on ImageNet data. VGG19 is a standard Deep Learning architecture with 16 convolution layers and three fully connected layers. These are the weight layers; a few pooling layers sit between the convolution blocks. Let's look at the following code. I show and explain individual blocks of code and then put it all together to give you the full program.

The code in the following sections is heavily inspired by Google's Keras/TensorFlow example for style transfer. You can look this code up in your TensorFlow installation or on GitHub at https://github.com/keras-team/keras/blob/master/examples/neural_style_transfer.py.

There is also an excellent Medium post by Raymond Yuan that covers this in detail at https://medium.com/tensorflow/neural-style-transfer-creating-art-with-deep-learning-using-tf-keras-and-eager-execution-7d541ac31398.

Let's start with Listing 6.1. We will import a VGG19 model pretrained on the ImageNet data. We will also turn on eager execution for TensorFlow so that it does not build computation graphs but directly executes the code, giving us results immediately.
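The listing itself is not reproduced here, but a minimal sketch of the idea, assuming TensorFlow 1.x with the tf.keras API, might look like this:

import tensorflow as tf

# Execute operations immediately instead of building a graph first.
tf.enable_eager_execution()
print("Eager execution: %s" % tf.executing_eagerly())

# VGG19 pretrained on ImageNet; we only want its convolutional features,
# so we drop the classifier head and freeze the weights.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False
vgg.summary()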

Here are the results:

Eager execution: True
_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
input_3 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 20,024,384
Trainable params: 0
Non-trainable params: 20,024,384 

Next, we select certain layers as our style and content layers. These layers will be used to extract the features learned by the VGG19 model from the images, which give us an idea of both the content and the style of the respective images. As stated earlier, our objective is to minimize the style and content distances (also referred to as costs) since we will be doing optimization. Let's select some of these layers by the names shown in the model summary. You can experiment with different layers. We will use a convolution layer in block5 to compare content and multiple convolution layers for the style comparison, as shown in Listing 6.2. Using these feature layers, we will build a new model called style_model that returns only these layers' outputs. We are no longer interested in the predictions made by the model.
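A sketch of that selection, using the layer names from the summary above (the exact layers picked in the book's listing may differ), could look like this:

# One deeper layer for content, several shallower layers for style.
content_layers = ['block5_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']

# Model that outputs only these intermediate feature maps.
outputs = [vgg.get_layer(name).output for name in style_layers + content_layers]
style_model = tf.keras.Model(inputs=vgg.input, outputs=outputs)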

Next, we download two images—one for content and one for style (see Figure 6.3). We convert these into arrays and display them, as shown in Listing 6.3.
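Something along these lines would do the job; the file names are placeholders for the two images you download, and the exact preprocessing in the book's listing may differ:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image as kp_image

def load_img(path, max_dim=512):
    # Load a photo, shrink it so its longest side is max_dim pixels,
    # and add a batch dimension so it matches the model's input shape.
    img = kp_image.load_img(path)
    scale = max_dim / max(img.size)
    img = img.resize((round(img.size[0] * scale), round(img.size[1] * scale)))
    return np.expand_dims(kp_image.img_to_array(img), axis=0)

content_img = load_img('content_photo.jpg')   # placeholder file name
style_img = load_img('starry_night.jpg')      # placeholder file name

plt.subplot(1, 2, 1); plt.imshow(content_img[0] / 255.0); plt.title('Content')
plt.subplot(1, 2, 2); plt.imshow(style_img[0] / 255.0); plt.title('Style')
plt.show()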


Figure 6.3: Style and content images we will use for this demo

Listing 6.4 shows some helper functions that will be used to calculate the loss for content and style and the gradients that we will use for optimization.
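The heart of those helpers looks roughly like the following sketch; it assumes eager execution is on so that tf.GradientTape can differentiate the loss with respect to the generated image.

def content_loss(content_feat, target_feat):
    # Mean squared difference between the content features of the two images.
    return tf.reduce_mean(tf.square(content_feat - target_feat))

def gram_matrix(feat):
    # Correlations between the filters of a layer; this captures the style.
    channels = int(feat.shape[-1])
    a = tf.reshape(feat, [-1, channels])
    n = tf.shape(a)[0]
    return tf.matmul(a, a, transpose_a=True) / tf.cast(n, tf.float32)

def style_loss(style_feat, target_feat):
    return tf.reduce_mean(tf.square(gram_matrix(style_feat) - gram_matrix(target_feat)))

def compute_grads(total_loss_fn, generated_img):
    # Gradient of the combined loss with respect to the image being generated.
    with tf.GradientTape() as tape:
        loss = total_loss_fn(generated_img)
    return tape.gradient(loss, generated_img), loss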

Next, we define the main function that we will call to do the style transfer optimization. We specify the number of iterations and provide weights for the content and style. See Listing 6.5.
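A stripped-down version of that driver function might look like this; total_style_loss and total_content_loss stand in for assumed helpers that run the image through style_model and sum the per-layer losses from the previous sketch, and the learning rate and weights are just plausible defaults.

def run_style_transfer(content_path, style_path, num_iterations=1000,
                       content_weight=1e3, style_weight=1e-2):
    # total_style_loss / total_content_loss are assumed helpers that compare
    # the generated image's features (from style_model) against the target
    # features of the style and content images, computed once up front.
    generated = tf.Variable(load_img(content_path), dtype=tf.float32)
    optimizer = tf.train.AdamOptimizer(learning_rate=5.0)

    for step in range(num_iterations):
        grads, loss = compute_grads(
            lambda img: style_weight * total_style_loss(img) +
                        content_weight * total_content_loss(img),
            generated)
        optimizer.apply_gradients([(grads, generated)])
        if step % 100 == 0:
            print('Iteration %d, total loss %.2f' % (step, loss))
    return generated.numpy()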

Finally, we run the code to do the actual optimization and see how our original content photo gets transformed, as shown in Listing 6.6. We will pause every few iterations and see how the modified image looks. See Figure 6.4.
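Invoking it is then a short call plus a plot, again using the placeholder file names from earlier:

best_img = run_style_transfer('content_photo.jpg', 'starry_night.jpg',
                              num_iterations=1000)
plt.imshow(np.clip(best_img[0], 0, 255).astype('uint8'))
plt.title('Content photo redrawn in the style of The Starry Night')
plt.show()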


Figure 6.4: Results of neural style transfer

There you have it: we took an image and applied the style of a famous painting to it. You can modify your images with different paintings to get some cool effects. Or you could download an app like Prisma and see this effect in action. Or why don't you code up a Prisma-type app of your own?

This particular example and its code are available as a Google Colab Notebook at this link:

https://colab.research.google.com/drive/1_tHUYgO_fIBU1JXdn_mXWCDD6njLyNSu

Next, let's look at another interesting application of Deep Learning. You probably have heard of this one a lot in the news recently—using neural networks to create photos.

Generating Images Using AI

One of the big AI news items of 2018 was a new algorithm developed by researchers at NVIDIA that could generate fake celebrity photos. These photos were so realistic that they could fool any human into thinking they were real. However, they were all fake photos generated by a very smart AI algorithm that learned patterns from real photos. Algorithms of this type are called generative models; they learn the probability distribution of the input data and then generate new data from it.

We will use a popular type of generative model called a generative adversarial network (GAN) to generate new images. Before we talk about GANs, remember that a neural network, whether shallow or deep, learns to encode an image array into a vector of limited dimensions. This vector can be seen as a compressed encoding of our original image. This is shown in Figure 6.5.


Figure 6.5: Neural network captures encoding of image

Now let's talk about GANs. The concept of how a GAN works is illustrated in Figure 6.6 with an art forger versus art inspector analogy. We have two neural networks: a generator (G) and a discriminator (D). The generator creates images, starting from a random encoding vector; it performs the reverse of the encoding process shown in Figure 6.5, generating an image from an encoding vector. This is analogous to an art forger who produces forgeries of paintings.


Figure 6.6: Art‐forger analogy for generative adversarial networks

Next, we have a discriminator network that is analogous to an art inspector who checks whether an image is genuine or fake. This network takes one image at a time, drawn from either the real or the generated set, and learns to classify it as real or fake. If an image generated by G is accepted by D as real, then G gets rewarded. If D catches a fake, then D gets the reward. The two networks are competing against each other, which is why the approach is called adversarial. Over time, as both networks train, G gets good at generating fakes that look identical to the real images. That's what we are looking for. This concept is illustrated in Figure 6.6.

Let's see this in action with a simple example using a very simple dataset. We will use the fashion items dataset that is provided with Keras. This is a set of grayscale images of fashion items, each 28×28 pixels, covering 10 classes of objects like coats, T-shirts, shoes, etc. (see Figure 6.7). First, we load the needed libraries, then we load the dataset and show some sample images to explore it. See Listing 6.7.
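A minimal sketch of that setup, assuming the standalone Keras package and scaling the pixels to the range -1 to 1 (which pairs well with a tanh output on the generator), could be:

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import fashion_mnist

# 60,000 grayscale training images, each 28x28 pixels.
(x_train, y_train), (_, _) = fashion_mnist.load_data()
x_train = (x_train.astype('float32') - 127.5) / 127.5   # scale to [-1, 1]
x_train = x_train.reshape(-1, 784)                      # flatten for dense layers

# Show a few sample items.
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()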


Figure 6.7: Displaying the fashion items dataset

Next, we build the two neural networks: one for the generator (G) and the other for the discriminator (D). G takes a random encoding vector as input and generates a 28×28 image for us. D takes a 28×28 image and gives us a single result: true for a real image or false for a generated one. You can see this in Listing 6.8.
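The layer sizes in the summary below pin down the architecture fairly well (a 128-dimensional encoding vector going in, 784 pixels coming out), so a sketch of the two networks, with the output activations being my own assumption, looks like this:

from keras.models import Sequential
from keras.layers import Dense, ReLU

latent_dim = 128   # size of the random encoding vector fed to G

# Generator: 128-dim noise vector -> 784 pixel values (a 28x28 image).
G = Sequential([
    Dense(256, input_dim=latent_dim), ReLU(),
    Dense(512), ReLU(),
    Dense(1024), ReLU(),
    Dense(784, activation='tanh'),     # assumed activation
])

# Discriminator: 784 pixel values -> probability that the image is real.
D = Sequential([
    Dense(1024, input_dim=784), ReLU(),
    Dense(512), ReLU(),
    Dense(256), ReLU(),
    Dense(1, activation='sigmoid'),    # assumed activation
])
D.compile(loss='binary_crossentropy', optimizer='adam')

print('------ GENERATOR ------'); G.summary()
print('------ DISCRIMINATOR ------'); D.summary()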

Here are the results:

------ GENERATOR ------
_________________________________________________________________
Layer (type)          Output Shape       Param # 
=================================================================
dense_1 (Dense)       (None, 256)        33024 
_________________________________________________________________
re_lu_1 (ReLU)        (None, 256)        0 
_________________________________________________________________
dense_2 (Dense)       (None, 512)        131584 
_________________________________________________________________
re_lu_2 (ReLU)        (None, 512)        0 
_________________________________________________________________
dense_3 (Dense)       (None, 1024)       525312 
_________________________________________________________________
re_lu_3 (ReLU)        (None, 1024)       0 
_________________________________________________________________
dense_4 (Dense)       (None, 784)        803600 
=================================================================
Total params: 1,493,520
Trainable params: 1,493,520
Non‐trainable params: 0
 
 
 
------ DISCRIMINATOR ------
_________________________________________________________________
Layer (type)          Output Shape       Param # 
=================================================================
dense_5 (Dense)       (None, 1024)       803840 
_________________________________________________________________
re_lu_4 (ReLU)        (None, 1024)       0 
_________________________________________________________________
dense_6 (Dense)       (None, 512)        524800 
_________________________________________________________________
re_lu_5 (ReLU)        (None, 512)        0 
_________________________________________________________________
dense_7 (Dense)       (None, 256)        131328 
_________________________________________________________________
re_lu_6 (ReLU)        (None, 256)        0 
_________________________________________________________________
dense_8 (Dense)       (None, 1)          257 
=================================================================
Total params: 1,460,225
Trainable params: 1,460,225
Non‐trainable params: 0 

Now we will write two functions: one to plot the images created by G during training, and the other to perform the actual training by feeding real and fake images to the model. Then we run the training and, after every epoch, show a sample of the images that were created. You can see this in Listing 6.9.
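A condensed sketch of those two functions, building on the G, D, x_train, and latent_dim defined in the previous sketches (the book's listing will differ in the details), is shown here:

from keras.models import Model
from keras.layers import Input

def plot_generated(generator, n=10):
    # Draw a strip of images from the current state of the generator.
    noise = np.random.normal(0, 1, (n, latent_dim))
    imgs = generator.predict(noise).reshape(n, 28, 28)
    for i in range(n):
        plt.subplot(1, n, i + 1)
        plt.imshow(imgs[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Stacked model used to train G: D is frozen here so only G's weights move.
z = Input(shape=(latent_dim,))
D.trainable = False
gan = Model(z, D(G(z)))
gan.compile(loss='binary_crossentropy', optimizer='adam')

epochs, batch_size = 200, 128
batches = x_train.shape[0] // batch_size    # 60000 // 128 = 468
print('Epochs:', epochs)
print('Batch size:', batch_size)
print('Batches per epoch:', batches)

for epoch in range(epochs):
    for _ in range(batches):
        # Train D on half real, half generated images.
        real = x_train[np.random.randint(0, x_train.shape[0], batch_size)]
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        fake = G.predict(noise)
        D.train_on_batch(real, np.ones((batch_size, 1)))
        D.train_on_batch(fake, np.zeros((batch_size, 1)))
        # Train G (through the frozen D) to get its fakes labeled as real.
        gan.train_on_batch(noise, np.ones((batch_size, 1)))
    plot_generated(G)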

Here are the results:

Epochs: 200
Batch size: 128
Batches per epoch: 468 

The images generated by this training process are shown in Figure 6.8. As we train for more epochs, the generated images get closer to the intended target and we start to see recognizable fashion objects. We can keep training to improve the images and make them sharper.


Figure 6.8: Results from GAN trained to generate fashion images

The NVIDIA researchers used celebrity photos to help their GAN model learn from known faces. After a few hours of training, the model was able to capture the patterns that form faces. It could then output new faces that looked very much like known celebrities but belonged to people who do not exist.

Credit Card Fraud Detection with Autoencoders

The previous two examples used unstructured data in the form of images. Now let's look at an example with structured, tabular data. We will look at a dataset of financial transactions made using credit cards and try to identify patterns of fraudulent transactions. This use case is extremely common in the financial world. Perhaps you have received a call from your credit card issuer about a suspicious transaction, asking you to verify that it was actually made by you. That transaction was most likely flagged by some sort of ML model.

Traditionally, banks have used predefined rules for flagging suspicious transactions. For example, there could be a rule that if there is a sudden transaction from a different country, flag that for your approval. Or, if there is a purchase from a store that is not one you usually visit, flag that. Setting fixed rules to cover all sorts of cases for all individuals is extremely difficult and it's possible to get lots of false positives. Hence, modern systems rely on ML to find patterns of fraudulent transactions and predict if a transaction is fraudulent or normal.

We will explore an unsupervised learning method for analyzing this data, called an autoencoder. First, let's look at the dataset. The dataset is structured and tabular: a list of transactions with time, amount, and details such as customer account, vendor account, government taxes, etc. For this example, we will use a dataset that is generously made available in the public domain by the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles). This dataset was generated as part of a research study by Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi.

The dataset is available as a CSV file called creditcard.csv. It contains transactions made by European cardholders using credit cards in September 2013. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions. Three of the columns are directly interpretable: Time, Amount, and Class. The Time feature contains the seconds elapsed between each transaction and the first transaction in the dataset. The Amount feature is the transaction amount, and the Class feature is the response variable, with 1 indicating fraud and 0 a normal transaction.

The dataset also has 28 columns named V1, V2, V3, and so on up to V28. These represent the customer and vendor details for each transaction, but they have been transformed with a dimensionality reduction technique called Principal Component Analysis (PCA), so we are given only these 28 anonymous V-features. This also hides the customer and vendor details in the interest of privacy. We can assume these 28 features carry the important information and start analyzing the data. Figure 6.9 shows the data loaded in Excel.


Figure 6.9: Credit card transaction dataset with details hidden in V‐features

We will use a special type of neural network to solve this problem, called an autoencoder. This is an unsupervised learning network that basically tries to reproduce the inputs given to it. The idea is to read the input vector and encode it, using an encoder network, into a smaller-dimensional vector called the encoding. A decoder network then reconstructs the input vector from this encoding. In effect, the input is compressed and stored as a small encoding vector; this method has also been applied to data compression.

There is some information loss when you encode a larger dimension input vector into the smaller encoding. The idea is for the model to learn to encode so well that all the important patterns in data will be captured in the encoding. This concept is explained in Figure 6.10.


Figure 6.10: Concept of an autoencoder neural network

Let's look at the code to build the autoencoder and then use it to detect anomalies in the credit card transaction data. First, we will load the CSV file and prepare the training dataset. The key thing for the autoencoder is that the input (X) and output (Y) data are the same. Hence, it will learn in an unsupervised manner and try to re-create the input fed into it. Let's prepare the training data in Listing 6.10.
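A sketch of that step (the file path is a placeholder for wherever you saved creditcard.csv) might be:

import pandas as pd

df = pd.read_csv('creditcard.csv')   # placeholder path
print(df.head())
print('Fraud cases:', df[df.Class == 1].shape[0], 'of', df.shape[0], 'transactions')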

Here are the results:

TIME V1 V2 V3 V4 V8 V9 V25 V26 V27 V28 AMOUNT CLASS   (columns V5 through V7 and V10 through V24 are omitted from this display)
0 0.0 −1.359807 −0.072781  2.536347  1.378155  0.098698  0.363787  0.128539 −0.189115  0.133558 −0.021053 149.62 0
1 0.0  1.191857  0.266151  0.166480  0.448154  0.085102 −0.255425  0.167170  0.125895 −0.008983  0.014724 2.69 0
2 1.0 −1.358354 −1.340163  1.773209  0.379780  0.247676 −1.514654 −0.327642 −0.139097 −0.055353 −0.059752 378.66 0
3 1.0 −0.966272 −0.185226  1.792993 −0.863291  0.377436 −1.387024  0.647376 −0.221929  0.062723  0.061458 123.50 0
4 2.0 −1.158233  0.877737  1.548718  0.403034 −0.270533  0.817739 −0.206010  0.502292  0.219422  0.215153 69.99 0

First, we will concern ourselves only with high-value transactions, say any amount above $200. We will use Scikit-Learn's built-in methods to scale the values in the data frame. Then we will build our training and validation arrays from normal transactions only; the fraud cases are kept aside for testing. Keep in mind that we only need x_train and x_val arrays since we are using unsupervised learning. Our expected Y values will be the X values themselves. You can see this code in Listing 6.11.
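One way to sketch that preparation is shown below; I am assuming StandardScaler for the scaling and a roughly 90/10 split, so the exact shapes will differ slightly from the output that follows.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Keep only high-value transactions and separate out the Class label.
data = df[df.Amount > 200].copy()
labels = data.pop('Class')

# Scale every feature column.
data[data.columns] = StandardScaler().fit_transform(data[data.columns])

normal = data[labels == 0].values    # normal transactions only
fraud = data[labels == 1].values     # held out for testing later
print('Number of fraud transactions =', len(fraud))

# Train and validate on normal transactions; no y arrays are needed.
x_train, x_val = train_test_split(normal, test_size=0.1, random_state=42)
print('X Training array shape =', x_train.shape)
print('X Validation array shape =', x_val.shape)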

Here are the results:

Normal transactions array shape = (28752, 31)
Fraud transactions array shape = (85, 31)
 
Number of fraud transactions = 85
 
X Training array shape = (25800, 30)
X Validation array shape = (2867, 30) 

Now we will build the autoencoder model. As we saw, this model will have an encoder and decoder part. The encoder takes a high‐dimensional vector and generates a low‐dimensional encoding. We have an input vector of size 30 and we will use an encoding size of 15. You can change this and see if you get better results. You can see this code in Listing 6.12.
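Given the 30-to-15-to-30 shape in the summary below, a sketch of the model is quite short; the activations and the loss function here are my assumptions.

from keras.models import Model
from keras.layers import Input, Dense

input_dim = 30      # features per transaction
encoding_dim = 15   # size of the compressed encoding

inputs = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(inputs)    # encoder
decoded = Dense(input_dim, activation='linear')(encoded)    # decoder

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error',
                    metrics=['accuracy'])
autoencoder.summary()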

Here are the results:

Layer (type)                 Output Shape       Param # 
=================================================================
input_51 (InputLayer)        (None, 30)         0 
_________________________________________________________________
dense_119 (Dense)            (None, 15)         465 
_________________________________________________________________
dense_120 (Dense)            (None, 30)         480 
=================================================================
Total params: 945
Trainable params: 945
Non‐trainable params: 0 

Now let's train the model on our x_train and x_val arrays. Notice that we don't have y_train and y_val arrays. We use the input as the expected output. You can see this code in Listing 6.13.
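The training call is essentially one line; note that x_train appears as both the input and the target. The batch size here is a guess, while the 25 epochs match the output below.

history = autoencoder.fit(x_train, x_train,
                          epochs=25, batch_size=32, shuffle=True,
                          validation_data=(x_val, x_val))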

Here are the results:

Train on 25800 samples, validate on 2867 samples
Epoch 1/25
25800/25800 [==============================] ‐ 3s 131us/step ‐ loss:
1.7821 ‐ acc: 0.3620 ‐ val_loss: 1.8113 ‐ val_acc: 0.5225
Epoch 2/25
25800/25800 [==============================] ‐ 1s 46us/step ‐ loss:
1.5699 ‐ acc: 0.5834 ‐ val_loss: 1.7444 ‐ val_acc: 0.6264
Epoch 3/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.5282 ‐ acc: 0.6578 ‐ val_loss: 1.7110 ‐ val_acc: 0.6983
Epoch 4/25
25800/25800 [==============================] ‐ 1s 47us/step ‐ loss:
1.5010 ‐ acc: 0.7069 ‐ val_loss: 1.6911 ‐ val_acc: 0.7203
Epoch 5/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.4760 ‐ acc: 0.7460 ‐ val_loss: 1.6697 ‐ val_acc: 0.7719
Epoch 6/25
25800/25800 [==============================] ‐ 1s 47us/step ‐ loss:
1.4617 ‐ acc: 0.7763 ‐ val_loss: 1.6483 ‐ val_acc: 0.7733
Epoch 7/25
25800/25800 [==============================] ‐ 1s 47us/step ‐ loss:
1.4521 ‐ acc: 0.7834 ‐ val_loss: 1.6391 ‐ val_acc: 0.7939
Epoch 8/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.4463 ‐ acc: 0.7956 ‐ val_loss: 1.6355 ‐ val_acc: 0.8036
Epoch 9/25
25800/25800 [==============================] ‐ 1s 57us/step ‐ loss:
1.4430 ‐ acc: 0.8025 ‐ val_loss: 1.6298 ‐ val_acc: 0.8033
Epoch 10/25
25800/25800 [==============================] ‐ 1s 55us/step ‐ loss:
1.4407 ‐ acc: 0.8062 ‐ val_loss: 1.6350 ‐ val_acc: 0.8022
Epoch 11/25
25800/25800 [==============================] ‐ 1s 49us/step ‐ loss:
1.4398 ‐ acc: 0.8091 ‐ val_loss: 1.6290 ‐ val_acc: 0.8099
Epoch 12/25
25800/25800 [==============================] ‐ 1s 49us/step ‐ loss:
1.4384 ‐ acc: 0.8114 ‐ val_loss: 1.6273 ‐ val_acc: 0.8036
Epoch 13/25
25800/25800 [==============================] ‐ 1s 48us/step ‐ loss:
1.4379 ‐ acc: 0.8126 ‐ val_loss: 1.6258 ‐ val_acc: 0.8183
Epoch 14/25
25800/25800 [==============================] ‐ 1s 51us/step ‐ loss:
1.4374 ‐ acc: 0.8140 ‐ val_loss: 1.6267 ‐ val_acc: 0.8204
Epoch 15/25
25800/25800 [==============================] ‐ 1s 49us/step ‐ loss:
1.4368 ‐ acc: 0.8144 ‐ val_loss: 1.6257 ‐ val_acc: 0.8186
Epoch 16/25
25800/25800 [==============================] ‐ 2s 59us/step ‐ loss:
1.4363 ‐ acc: 0.8164 ‐ val_loss: 1.6260 ‐ val_acc: 0.8141
Epoch 17/25
25800/25800 [==============================] ‐ 1s 53us/step ‐ loss:
1.4358 ‐ acc: 0.8174 ‐ val_loss: 1.6253 ‐ val_acc: 0.8190
Epoch 18/25
25800/25800 [==============================] ‐ 1s 53us/step ‐ loss:
1.4356 ‐ acc: 0.8160 ‐ val_loss: 1.6243 ‐ val_acc: 0.8183
Epoch 19/25
25800/25800 [==============================] ‐ 1s 50us/step ‐ loss:
1.4353 ‐ acc: 0.8169 ‐ val_loss: 1.6257 ‐ val_acc: 0.8137
Epoch 20/25
25800/25800 [==============================] ‐ 1s 54us/step ‐ loss:
1.4351 ‐ acc: 0.8186 ‐ val_loss: 1.6245 ‐ val_acc: 0.8134
Epoch 21/25
25800/25800 [==============================] ‐ 1s 56us/step ‐ loss:
1.4347 ‐ acc: 0.8198 ‐ val_loss: 1.6237 ‐ val_acc: 0.8116
Epoch 22/25
25800/25800 [==============================] ‐ 1s 52us/step ‐ loss:
1.4346 ‐ acc: 0.8181 ‐ val_loss: 1.6255 ‐ val_acc: 0.8193
Epoch 23/25
25800/25800 [==============================] ‐ 1s 51us/step ‐ loss:
1.4343 ‐ acc: 0.8194 ‐ val_loss: 1.6232 ‐ val_acc: 0.8148
Epoch 24/25
25800/25800 [==============================] ‐ 1s 54us/step ‐ loss:
1.4342 ‐ acc: 0.8189 ‐ val_loss: 1.6230 ‐ val_acc: 0.8155
Epoch 25/25
25800/25800 [==============================] ‐ 1s 56us/step ‐ loss:
1.4340 ‐ acc: 0.8216 ‐ val_loss: 1.6265 ‐ val_acc: 0.8123 

We will plot the accuracy and loss values for training and validation datasets. You can see this code in Listing 6.14.
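A sketch of that plotting code reads the curves from the History object returned by fit(); note that older Keras versions store accuracy under the key 'acc'.

import matplotlib.pyplot as plt

plt.plot(history.history['acc'], label='train')
plt.plot(history.history['val_acc'], label='validation')
plt.title('Model accuracy'); plt.xlabel('epoch'); plt.legend(); plt.show()

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.title('Model loss'); plt.xlabel('epoch'); plt.legend(); plt.show()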

The results are two charts, as shown in Figures 6.11 and 6.12.


Figure 6.11: Model accuracy plot for autoencoder


Figure 6.12: Model loss plot for autoencoder

Now we will make a prediction with the trained autoencoder on the testing dataset. We will compare the input values with predictions and calculate the reconstruction error for each data point. Since we trained on normal transactions, these should have a low reconstruction error. Fraudulent transactions will have different data distributions and should give us a higher reconstruction error. You can see this code in Listing 6.15.
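A sketch of that evaluation, reusing the x_val and fraud arrays from the earlier data-preparation sketch; the threshold value is purely illustrative and should be tuned by looking at the plot.

import numpy as np
import matplotlib.pyplot as plt

# Mix held-out normal transactions with the fraud cases.
x_test = np.vstack([x_val, fraud])
y_test = np.hstack([np.zeros(len(x_val)), np.ones(len(fraud))])

preds = autoencoder.predict(x_test)
reconstruction_error = np.mean(np.square(x_test - preds), axis=1)

threshold = 5.0   # illustrative cut-off, not the book's value
flagged = reconstruction_error > threshold
print('Fraud cases flagged: %d of %d' % (flagged[y_test == 1].sum(), int(y_test.sum())))

# Normal points in one color, fraud in another, threshold as a line.
idx = np.arange(len(reconstruction_error))
plt.scatter(idx[y_test == 0], reconstruction_error[y_test == 0], s=3, label='Normal')
plt.scatter(idx[y_test == 1], reconstruction_error[y_test == 1], s=3, label='Fraud')
plt.axhline(threshold, color='red', linestyle='--', label='Threshold')
plt.legend(); plt.show()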

The result is shown in Figure 6.13.


Figure 6.13: Predictions on testing data using autoencoder

The chart in Figure 6.13 tells a good story. The reconstruction error is high for the fraudulent transactions, shown in orange. For the normal transactions, shown in blue, most points fall below our defined threshold. We don't catch all the fraudulent transactions, but we do catch more than 75% of them, which is very good. You can explore modifying hyperparameters like the number of layers and neurons to see if you get better results. Hopefully, this code shows you the power of Deep Learning to find patterns in data and detect anomalies. Since this is unsupervised learning, we did not provide labeled outputs, and you can apply this approach to data from pretty much any domain.

Summary

In this chapter, we looked at some unique applications of Deep Learning technology. We saw how the neural style transfer method can transfer the style of a painting onto our own images. Then we looked at generative adversarial networks and created new data points that closely resemble real data. Finally, we used a special type of network called an autoencoder that learns to find anomalies in data using unsupervised learning. These methods are relatively new and were proposed by researchers in recent publications. The Deep Learning community is truly awesome and shares valuable content with everyone. You can explore new papers as they are published on the arXiv site hosted by Cornell University (arxiv.org) to learn about new solutions as they are developed. I also highly encourage you to contribute your own papers there so everyone can benefit from your knowledge!
