10

Considering the Ramifications of Deepfakes

Deepfakes have received more than a little press over the last few years because everyone thinks it’s pretty amazing when Diep Nep becomes Morgan Freeman (https://www.youtube.com/watch?v=oxXpB9pSETo). Yes, deepfakes (also known as synthetic reality) have some fantastic uses for both work and pleasure. However, the technology also represents one of the most terrifying security issues for users of deep learning today. How can you trust anything said by anyone during a video chat when you can’t even be sure you’re talking with the person in question, rather than a hacker? That video security system with facial recognition that your organization just purchased is completely worthless when it comes to deepfakes because it’s now possible to create fake faces (https://www.theverge.com/tldr/2019/2/15/18226005/ai-generated-fake-people-portraits-thispersondoesnotexist-stylegan). Any thoughts you had of using biometric authentication are a waste of time because, with the right deepfake, it’s possible to create fake biometrics (https://www.wired.com/story/deepmasterprints-fake-fingerprints-machine-learning/). It gets worse, and in this chapter, you will become aware of all the details.

There is no magic involved in deepfakes, as you will see in this chapter with the use of autoencoder and generative adversarial network (GAN) examples. There are many other ways to create a deepfake, but these two methods are quite illustrative and relatively fast at creating them (with relatively being the operative term). Running the examples and seeing how they progress over time will help you understand that deepfakes rely on stable technology that nearly anyone can employ to make security for your organization a true nightmare. The only thing that is preventing the rampant use of deepfakes today is that they’re time-consuming to create and the person creating them has to have the correct skills (both of which are topics that this chapter covers as well). With these issues in mind, this chapter discusses the following topics:

  • Defining a deepfake
  • Understanding autoencoders
  • Understanding GANs

Technical requirements

This chapter requires that you have access to either Google Colab or Jupyter Notebook to work with the example code. Refer to Chapter 1, Defining Machine Learning Security, which provides additional details on how to set up and configure your programming environment.

You really do benefit from having a graphics processing unit (GPU) to run the examples in this chapter. They will run without a GPU, but expect to take long coffee breaks while you wait for the code to finish running. Using a GPU means choosing Runtime | Change Runtime Type in Google Colab, then selecting GPU in the Hardware Accelerator dropdown. Desktop users will want to review the Checking for a GPU section of this chapter for desktop system instructions.

Set up your system to run TensorFlow. Google Colab users should read https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb. The Installing TensorFlow on a desktop system section of this chapter provides details on how to perform the software installation on desktop systems, where you’ll generally get far better performance than when using Google Colab. Your system must meet the following minimum requirements:

  • Operating system:
    • Ubuntu 16.04 or higher (64-bit)
    • macOS 10.12.6 (Sierra) or higher (64-bit) (no GPU support)
    • Windows Native – Windows 7 or higher (64-bit)
    • Windows WSL2 – Windows 10 19044 or higher (64-bit)
  • Python 3.7 through Python 3.10
  • Pip version 19.0 or higher, or 20.3 or higher on macOS
  • Visual Studio 2015, 2017, or 2019 on Windows systems
  • NVidia GPU Support (optional):
    • GPU drivers version 450.80.02 or higher
    • Compute Unified Device Architecture (CUDA) Toolkit version 11.2
    • cuDNN SDK version 8.1.0
    • TensorRT (optional)

When testing the code, use a test site, test data, and test APIs to avoid damaging production setups and to improve the reliability of the testing process. Testing over a non-production network is highly recommended. Using the downloadable source is always highly recommended. You can find the downloadable source on the Packt GitHub site at https://github.com/PacktPublishing/Machine-Learning-Security-Principles or on my website at http://www.johnmuellerbooks.com/source-code/.

Defining a deepfake

A deepfake (sometimes deep fake) is an application of deep learning to images, sound, video, and other forms of generally non-textual information to make one thing look or sound like something else. The idea is to deceive someone into thinking a thing is something that it’s not.

This chapter doesn’t mean to imply that the use of deepfakes will always deceive others in a bad way. For example, it’s perfectly acceptable to take a family picture, then put it through an autoencoder or a GAN and make it look like a Renoir painting. In fact, some deepfakes are amusing or even educational. The point at which a deepfake becomes a problem is when it’s used to bypass security or perform other seemingly impossible tasks. In a court of law, a deepfake video could convince a jury to convict someone who is innocent. Throughout the following sections, you will learn more about deepfakes from an ML security perspective.

Identifying deepfakes

It’s still possible to detect deepfakes in a number of ways. Images created using deepfake technology often contain artifacts (anomalies that present information that shouldn’t be there), even though those artifacts are getting smaller and harder to detect. Voices are still very tough to reproduce, so they’re often a dead giveaway that something is faked. Deepfakes usually don’t do well with side views, so simply watching when someone turns their head will likely give the deepfake away (as will a person never turning their head to the side, which is unnatural). Differences in mannerisms and all sorts of other subtle clues often give a deepfake away. However, deepfakes are becoming more realistic as technology advances, so new ways of detecting them will have to be invented.

Modifying media

Media modifications can serve all sorts of purposes. For example, someone could use media modifications to change security camera footage or to make it appear that someone has said or done something that they really haven’t. Reading Deep Fakes’ Greatest Threat Is Surveillance Video (https://www.forbes.com/sites/kalevleetaru/2019/08/26/deep-fakes-greatest-threat-is-surveillance-video/?sh=5cf6b7c44550) will give you a better idea of just how big a threat this sort of modification can be.

It’s important to realize that deepfakes can modify any type of media. Seeing the results of deepfakes on YouTube is entertaining, but such videos are hardly the tip of the iceberg of what deepfakes can do. For example, there isn’t any reason to believe that a deepfake can’t modify sensor data from various kinds of inputs such as cameras, microphones, temperature sensors, and so on. A terrorist could easily modify such inputs to create an emergency or keep someone from detecting one. The problem gets worse. A field commander might end up misdirecting troops based on bad footage from a drone camera whose output was overridden and a deepfake supplied in its place.

Common deepfake types

Some deepfake types are relatively common today and it pays to spend time learning about them. These kinds of deepfakes may not directly affect your organization, but they do provide a basis for understanding the potential for misusing deepfakes. The following list provides you with some examples of commonly misused deepfakes:

  • Fake news: People base their actions on information they receive from various sources, including the news. It’s possible to use fake news, which is a form of documentation for something that never really happened, to modify human behavior and possibly use that modification to their advantage. Fake news generators (see an example of this at https://www.thefakenewsgenerator.com/) make it very tough to differentiate real news from fake news. Deepfakes make it possible to place a trusted news commentator on screen with a fake news story that they never filmed.
  • Hoaxes: In a deepfake hoax, the perpetrator attempts to scam a group into believing something is true for personal gain. One of the most interesting current scams is deepfaked online interviews used to secure remote employment.
  • Voice impersonation: A deepfake voice impersonation falls into a special category because people will often react to a phone call or other vocal communication that occurs remotely without even thinking about it. They hear what they think is the person’s voice and then do what the voice says. So, it’s possible that a deepfake could do things such as telling someone to stop payment on an important shipment of goods that manufacturing needs to complete a sale, giving a competitor an edge. Voice impersonation can also make it possible to bypass biometric security.
  • Fake people: Creating fake people using deepfake techniques is the practice of using deep learning to make people who look real, but aren’t. What you need to think about is the potential for scams. For example, fake people would make perfect candidates for government programs. With all of the right documentation (deepfaked, of course), a person could apply for just about anything and get it as long as a personal meeting isn’t required.
  • Pornography: One of the first uses of deepfake technology to gain public notoriety was the proliferation of fake pornographic videos and images. Whether of public figures or private citizens, the use of non-consensual sexual images is harmful to those depicted. Unscrupulous hackers may use deepfakes to manufacture blackmail material.

This list of the uses of deepfakes by hackers and scammers doesn’t even begin to tell you about the many ways in which deepfakes are used today. It’s now possible to spend days reading articles about all of the ways in which deepfakes have been used to scam people. An issue here is that the technology is still in its infancy. As it matures, you can expect to find deepfakes everywhere, all the time.

The history of deepfakes

Deepfakes have been around for a while. The basis of what you see today about deepfakes started in a 1997 paper, Video Rewrite: Driving Visual Speech with Audio, written by Christoph Bregler, Michele Covell, and Malcolm Slaney (http://chris.bregler.com/videorewrite/). The essence of the technique is to automate some of the graphic effects that movie studios use to create movies. Of course, deepfake technology was built on previous technology that worked with audio and facial expressions in a 3D space.

Early videos had some serious problems with regard to the uncanny valley: people could tell something was off with the facial expressions because they weren’t quite right and often felt creepy. Further work starting in 2000 looked into making faces more lifelike. One of the best examples of this focus is the Active appearance models whitepaper, which is available at https://ieeexplore.ieee.org/document/927467.

However, the first time deepfake technology gained mainstream attention was the 2018 deepfake video of performer and filmmaker Jordan Peele impersonating President Obama (https://www.youtube.com/watch?v=cQ54GDm1eL0). The deepfake is good enough that you don’t really know it’s a deepfake until Jordan Peele reveals himself near the end. Even though the video really was convincing, experts were finally able to detect it was a deepfake based on how the facial expressions were presented. However, this won’t be a problem for long because the technology continues to advance. For example, current technologies address flaws such as moving eyebrows when the person stops talking.

Now that you have some idea of what deepfakes are all about, it’s time to prepare to create one. The next section of this chapter discusses the specialized setup used to work with both autoencoders and GANs. This special setup is necessary because the computing power needed to create a deepfake is significant.

Creating a deepfake computer setup

Creating a deepfake requires building serious models, using specialized software on systems that have more than a little computing horsepower. The system used for testing and in the screenshots for this chapter is more modest. It has an Intel i7 processor, 24 GB of RAM, and an NVidia GeForce GTX 1660 Super GPU. This system is used to ensure that the examples will run in a reasonable amount of time, with reasonable being defined as building a model in about half an hour or less. The example as a whole will require more time, likely in the hour range. The following sections will help you install a TensorFlow setup that you can use for autoencoder and GAN development without too many problems, and help you test your setup to ensure it actually works.

Installing TensorFlow on a desktop system

Desktop developers may already have TensorFlow installed, but if you’re not sure then you likely don’t. The technique for creating the advanced models in this chapter relies on using TensorFlow (https://www.tensorflow.org/). The basic reason for going with this route is that the development process is easier and you can create a relatively simple example sooner so that you can see how such a model would work. In order to use TensorFlow, you must install the required support. You can verify that you have the required support installed by opening an Anaconda prompt and typing the following:

conda list tensorflow

Alternatively, you can enter the following code:

pip show tensorflow

If you have TensorFlow installed, it will show up as one of the installed packages on your system. This section assumes that you have the Conda utility available on your desktop system. If you installed the Anaconda suite on your machine, then you have Conda available by default. Otherwise, you need to install Miniconda using the instructions at https://docs.conda.io/en/latest/miniconda.html for your platform. If your system is too old to support the current version of Miniconda, then you can’t run the examples in this chapter.
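
As an additional check, you can confirm that the Python interpreter in the active environment can actually import TensorFlow. This is a minimal sketch; if the import fails, the package isn’t available where your notebooks will look for it, and the version number shown in the comment is just an example:

import tensorflow as tf

print(tf.__version__)                      # for example, 2.9.1
print(tf.config.list_physical_devices())   # lists CPUs and any visible GPUs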

In addition to ensuring you have access to the Conda utility on Windows systems, you must also have Visual Studio 2015, Visual Studio 2017, or Visual Studio 2019 installed before you do anything else. You can obtain a free copy of Visual Studio 2019 Community Edition at https://learn.microsoft.com/en-us/visualstudio/releases/2019/release-notes.

Once you know you have a version of the Conda utility available, you can use these steps to set up the prerequisite tools for the examples in this chapter. The following steps will work for most platforms. However, if you encounter problems, you can also find a set of steps at https://www.tensorflow.org/install/pip. Select your platform at the top of the instruction list:

  1. Type conda install nb_conda_kernels and press Enter. Type y and press Enter when asked.
  2. Type conda create --name tf python=3.9 and press Enter to create a clean environment for your TensorFlow installation. Type y and press Enter when asked.
  3. Type conda activate tf and press Enter.

Deactivation

Type conda deactivate and press Enter to deactivate the TensorFlow environment when you no longer need to keep the environment running.

  4. Type conda install ipykernel and press Enter. Type y and press Enter when asked.
  5. Download and install the NVIDIA driver for your platform and GPU type from https://www.nvidia.com/Download/index.aspx.
  6. Test the installation by typing nvidia-smi and pressing Enter. You should see a listing of device specifics, along with the processes that are currently using the GPU.
  7. Type conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0 and press Enter to install the CUDA toolkit and cuDNN SDK. Type y and press Enter when asked. Note that this step can take a while to complete.
  8. Depending on your platform and how you have the CUDA toolkit installed, you need to add a path statement to your platform so that it can find the CUDA toolkit. The paths commonly needed are as follows:
    NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin
    NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64
    NVIDIA GPU Computing Toolkit\CUDA\v11.3\include
  9. Type pip install --upgrade pip and press Enter to verify that you have the latest version of pip installed.
  10. Type pip install tensorflow and press Enter. Type y and press Enter when asked.
  11. Type ipython kernel install --user --name=tf and press Enter. This step will make the environment appear in the menus so that you can access it.

Now that you have TensorFlow installed, you can verify that your installation will work correctly using the instructions in the next section. These instructions ensure that you can access your GPU and that your GPU is of the correct type.

Checking for a GPU

Training a model with a GPU might only require half an hour, but training it without one could easily cost you six or more hours. You want to make sure that your GPU is actually working, accessible, and of the right type before you start working with the example code. Otherwise, you have long waits and sometimes inexplicable errors to deal with when the code itself is fine.
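
If you want to see the difference on your own machine, the following rough sketch times the same matrix multiplication on the CPU and, when one is available, on the GPU. The first GPU call includes some one-time warm-up cost, so run the cell twice for a fairer comparison:

import time
import tensorflow as tf

x = tf.random.normal((4000, 4000))

with tf.device('/CPU:0'):
    start = time.perf_counter()
    _ = tf.matmul(x, x).numpy()   # .numpy() forces the work to finish
    print("CPU:", time.perf_counter() - start, "seconds")

if tf.config.list_physical_devices('GPU'):
    with tf.device('/GPU:0'):
        start = time.perf_counter()
        _ = tf.matmul(x, x).numpy()
        print("GPU:", time.perf_counter() - start, "seconds")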

The MLSec; 10; Check for GPU Support.ipynb file contains source code and some detailed instructions for configuring and checking your GPU setup on Windows. The source code works equally well for Linux and Windows systems, so Linux users can test their setup too. When using the downloadable source, you must set the kernel to use your tf environment by choosing Kernel | Change Kernel | Python [conda env:tf]. Otherwise, even if you have a successful TensorFlow installation, Jupyter Notebook won’t use it. The following steps test the setup from scratch:

  1. Create a new Python file by choosing the tf environment option from the New drop-down list in place of the usual Python 3 (ipykernel) option, as shown in Figure 10.1.

Figure 10.1 – The menu for selecting which environment to use

  2. Type !conda env list in the first cell. Click Run. You will see output similar to Figure 10.2. The output will vary by your platform and Conda setup. Notice that the tf environment has an asterisk next to it, indicating that it’s the one in use.

Figure 10.2 – A list of the available Conda environments

  3. Type the following code to determine the selected environment using the operating system environment variables instead. If the method in step 2 doesn’t work, this one always does, but it provides less information:
    import os
    print(os.environ['CONDA_DEFAULT_ENV'])

The output of tf shows that the tf environment is selected as the Conda default environment for scripts you create and run in Jupyter Notebook.

  4. Type the following code to obtain a list of computing devices on your system, which will include your GPU if TensorFlow is correctly installed:
    from tensorflow.python.client import device_lib
    device_lib.list_local_devices()

The output shown in Figure 10.3 is representative of the output you will see when you run this cell. However, the specifics will vary by GPU and your system may actually have multiple GPUs installed, which would mean that the list would include all of them.

Figure 10.3 – A list of the processing devices on the local system


  5. Type the following code to verify that the GPU is recognized as a GPU:
    import tensorflow as tf
    tf.config.list_physical_devices('GPU')

The output shown in Figure 10.4 is typical. It tells you the device name and the device type. Only a GPU with the correct hardware support will show up. Consequently, if you don’t see your GPU, it’s too old to use with TensorFlow or it lacks the appropriate drivers.

Figure 10.4 – A description of a GPU on the local system


  6. Verify that the GPU will actually perform the required math when interacting with TensorFlow using the following code. Note that this requires the creation of tensors using tf.constant():
    tf.debugging.set_log_device_placement(True)
    a = tf.constant([[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0],
        [3.0, 4.0],
        [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)

The output shown in Figure 10.5 is precisely what you should see. This code creates two matrices that it then multiplies together using matmul().

Figure 10.5 – The result of matrix multiplication using TensorFlow


At this point, you know that your setup will work with the code in the chapter, so you can proceed to the next section about autoencoders with confidence. When creating the examples in this chapter, always remember to use the tf environment to start a new code file or modify the kernel using the Kernel | Change Kernel menu command.

Now that you have a deepfake setup to use, it’s time to employ it by creating an example. The autoencoder is the simplest of the deepfake technologies to understand, so we will cover it in the next section.

Understanding autoencoders

An autoencoder encodes data and compresses it, then decodes data and decompresses it, which doesn’t seem like a very helpful thing to do. However, it’s what happens during the encoding and decoding process that makes autoencoders useful. For example, during this process, the autoencoder can remove noise from a picture, sound, or video, thus cleaning it up. Autoencoders are simpler than GANs and they’re commonly used today for the following important tasks (in order of relevance):

  • Data de-noising
  • Data dimensionality reduction
  • Teaching how more complex techniques work
  • Detail context matching (where the autoencoder receives a small high-resolution piece of an image as input and is able to find it in a lower-resolution target image)
  • Toy tasks, such as jigsaw puzzle solving
  • Simple image generation

The third use means that anyone taking a class on more advanced machine learning techniques will likely encounter autoencoders before GANs because autoencoders are definitely simpler. However, from a hacker’s perspective, the data de-noising use is probably more important because a model doesn’t actually know what de-noising is. Instead, it’s simply an algorithm removing unwanted patterns from data and replacing those elements with wanted patterns. So, a hacker could de-noise data in a manner that benefits them and would likely be unnoticeable to the end user, even though it’s quite noticeable to any software encountering the data. In the following sections, you will see more details about autoencoders and how to create one.
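
To make the de-noising point concrete before the formal example, here is a minimal sketch (it isn’t part of the chapter’s downloadable source) of how a de-noising autoencoder is trained. The inputs are deliberately corrupted copies of the targets, so the network learns whatever cleanup the training pairs imply, which is exactly the behavior a hacker could subvert by supplying poisoned pairs:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(clean_images, _), _ = keras.datasets.fashion_mnist.load_data()
clean_images = clean_images.astype('float32') / 255.
clean_images = clean_images.reshape((len(clean_images), 784))

# Corrupt the inputs with random noise; the clean images remain the targets.
noisy_images = np.clip(
    clean_images + 0.3 * np.random.normal(size=clean_images.shape), 0., 1.)

inputs = keras.Input(shape=(784,))
encoded = layers.Dense(64, activation='relu')(inputs)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
denoiser = keras.Model(inputs, decoded)
denoiser.compile(optimizer='adam', loss='binary_crossentropy')

# The model learns to map noisy inputs back to the clean originals.
denoiser.fit(noisy_images, clean_images, epochs=5, batch_size=256)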

Not just videos

It’s essential to remember that deepfakes can be images, sounds, videos, and any other sort of media you can think of. For example, it might be possible to deepfake smells or sensations (not that I’ve personally encountered either, but it’s possible). Sites such as https://www.tamikothiel.com/cgi-bin/LendMeYourFace.cgi make it possible to insert a deepfake of your face into an image. If you need a fake face, you can create one at https://generated.photos/. For a security professional, thinking of deepfakes only as videos will get you into trouble because deepfakes can take all sorts of forms. Consider this: what happens when someone deepfakes the badges used to gain access to your organization? This chapter uses images for deepfakes because they provide a good starting point for infinitely more complex videos. Think about a video as simply being a series of images (which it is) and you’ll appreciate why starting with images is a good idea.

Defining the autoencoder

An autoencoder is categorized as a form of self-supervised learning. In other words, there are still labels involved, but the targets are generated by the model itself. The target function, the part of the model that is involved in selecting and creating labels, must be designed in such a way that it recognizes useful features in the training data. Part of this process is to provide rules that prioritize functionality; for instance, it’s critical to consider visual macro structure (elements needed to provide the minimum connectivity required to tell a coherent world story) as being more important than pixel-level details (providing a precise match between elements). The concepts vary between autoencoders depending on the purpose the autoencoder serves.

The decoding process is also critical. It can generate output that modifies the input in a specific manner, such as removing noise. A hacker can modify the decoding process by sending bad inputs that modify the manner in which the decoder does things, such as removing noise. If a hacker has direct access to the application, it’s possible to modify the decoder in specific ways that modify the output at the pixel level so that humans can’t see the changes, but the software does. Consequently, the training of the decoder often involves some trial and error to obtain the specific effect required in the output. You must test decoding periodically to ensure that the decoder continues to work as specified. There are a number of characteristics, besides being self-supervised, that make autoencoders a specific kind of deep learning technology:

  • They are data specific, which means that they don’t work well on every kind of data in a particular category, only the data that they’re trained to work with. As a consequence, you can’t use them in a general way, such as compressing sound in the same manner that MPEG-2 Audio Layer III (MP3) does.
  • The process is lossy, which means that even in the best situation, the output will show some loss of detail.
  • An autoencoder is trained with data, not engineered to perform encoding and decoding in a particular way.

Creating an autoencoder requires these three parts:

  • Encoder: A neural network designed to compress the data using specific algorithms, but the compression relies on the data used to train the neural network.
  • Decoder: A neural network designed to decompress the data using algorithms that match the compression process provided by the encoder.
  • Loss function: The method of determining the amount of compression versus the amount of loss that occurs. The loss function defines a balance between how much compression can occur for a given amount of loss; the two characteristics have an inverse relationship.

Working with an autoencoder example

This section looks at a simple autoencoder that has multiple layers, but this autoencoder simply compresses and then decompresses a series of images. The idea is that you will see that the compression and decompression process is learned by a neural network, rather than hardwired like a coder-decoder (CODEC) is. Figure 10.6 gives you an idea of what this autoencoder looks like graphically.

Figure 10.6 – A block diagram of the example autoencoder


The autoencoder consists of two separate programming entities, the encoder and the decoder. Later, you will see in the code that the loss function is included as part of the autoencoder. You can find the source code for this example in the MLSec; 10; Autoencoder.ipynb file in the downloadable source.

Now that you have some idea of what the code will do for this example, it’s time to look at the various pieces shown in Figure 10.6. The following sections detail the items in the block diagram so you can see how they work in Python.

Obtain the Fashion-MNIST dataset

This example can work with a number of different image datasets. However, in this case, you see it used with the common Fashion-MNIST dataset because it offers a level of detail that some other datasets don’t provide, and it works well in shades of gray. The deepfake aspect of this particular dataset is that you can use it to see how images react to compression and decompression using a trained model, rather than some other process. What you need to think about is what sorts of modifications could be made at each layer to make the image look like something it isn’t. The following steps show how to obtain and configure the dataset in the example:

  1. Import the required packages:
    from tensorflow.keras.datasets import fashion_mnist
    import matplotlib.pyplot as plt
  2. Divide the dataset into training and testing datasets:
    (x_train, _), (x_test, _) = fashion_mnist.load_data()
    x_train = x_train.astype('float32') / 255.
    x_test = x_test.astype('float32') / 255.
    print(x_train.shape)
    print(x_test.shape)

When you perform this step, you will see the output shown in Figure 10.7. Unlike other splitting methods shown so far in the book, this one is automatic. However, you could also split it manually if desired. Notice that the shape shows the size of each image is 28x28 pixels.

Figure 10.7 – The output showing the split between training and testing datasets


  3. Create the showFigures() function to provide a means of displaying any number of the dataset images:
    def showFigures(dataset, n_items=10, title="Test"):
        fig, axs = plt.subplots(1, n_items,
            constrained_layout=True)
        fig.suptitle(title, y=0.65, fontsize=16,)
        for i in range(n_items):
            plt.gray()
            axs[i].imshow(dataset[i])
            axs[i].get_xaxis().set_visible(False)
            axs[i].get_yaxis().set_visible(False)
        plt.show()
  4. Test the showFigures() function:
    showFigures(x_train, title="Training Data")

The output shown in Figure 10.8 demonstrates that this is a good dataset to choose for looking at the level of detail. In addition, the images do look good in shades of gray.

Figure 10.8 – Ten of the images from the training dataset


Now that you have a dataset to use, it’s time to start working with it using an autoencoder. The next section addresses the encoder part of the autoencoder.

Build an encoder

The first part of an autoencoder is the encoder. This is the code that does something to compress or otherwise manipulate the data as part of the input stream. This example exclusively uses the Dense layers, but there are a great many more layer types listed at https://keras.io/api/layers/ that perform tasks other than simple compression. For example, instead of flattening the image outside of the model, you could use the Flatten layer to flatten it inside the model. The following steps show how to create the encoder for this example:

  1. Import the required packages:
    from tensorflow.keras import layers
    from tensorflow import keras
    import tensorflow as tf
    import numpy as np
  2. Reshape the training and testing datasets for use with the model:
    x_train = x_train.reshape((len(x_train),
        np.prod(x_train.shape[1:])))
    x_test = x_test.reshape((len(x_test),
        np.prod(x_test.shape[1:])))
    print(x_train.shape)
    print(x_test.shape)

The output from this step, shown in Figure 10.9, indicates that each image is now a 784-element vector, rather than a 28x28 matrix.

Figure 10.9 – Each image is now a 784-element vector

  3. Create the encoder starting with an Input layer and then adding three Dense layers to progressively compress the image. All three of the Dense layers use the rectified linear activation unit (ReLU) activation function, which outputs the value directly when it is positive and outputs 0 otherwise (https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/).
    encoder_inputs = keras.Input(shape=(784,))
    encoded = layers.Dense(
        128, activation="relu")(encoder_inputs)
    encoded = layers.Dense(
        64, activation="relu")(encoded)
    encoded = layers.Dense(
        32, activation="relu")(encoded)

Notice that the code describes the shape of each layer and the activation function. A Dense layer offers a lot more in the way of configuration options, as explained at https://keras.io/api/layers/core_layers/dense/. The value in parentheses after the call to Dense() is a linking value. It links the new layer to the previous layers. So, the first Dense layer is linked to encoder_inputs, while the second Dense layer is linked to the first Dense layer. The next step is to build the decoder, which of course links to the encoder.

Build a decoder

The decoder in this example also consists of Dense layers. However, instead of making each succeeding layer smaller, as the encoder does, it makes each layer larger, as shown in the following code:

decoded = layers.Dense(64, activation="relu")(encoded)
decoded = layers.Dense(128, activation="relu")(decoded)
decoded = layers.Dense(784, activation="sigmoid")(decoded)

The decoder follows the same pattern as the encoder, but the layers go up in size, as shown in Figure 10.6. Notice that the first Dense layer is linked to the final encoded layer. The final Dense layer uses a sigmoid activation function, which has a tendency to smooth the output (https://machinelearningmastery.com/a-gentle-introduction-to-sigmoid-function/) and ensures that the output value is always between 0 and 1. The encoder and decoder are essentially separate right now as shown in Figure 10.6, so it’s time to put them into the autoencoder box so that they can work together as described in the next section.

Build the autoencoder

What this section is really about is putting things into boxes, starting with the autoencoder box. These boxes are models that are used to allow the autoencoder to perform useful work. You’ll see that they work together to create a specific type of neural network that can manipulate data in various ways, depending on the layers you use. The following steps take you through the process of building the autoencoder, which involves creating several models:

  1. Create the autoencoder model and display its structure:
    autoencoder = keras.Model(encoder_inputs, decoded)
    autoencoder.summary()

As shown in Figure 10.10, the autoencoder model incorporates all of the layers of the encoder and decoder we constructed in earlier sections.

Figure 10.10 – The structure of the autoencoder as a whole


  2. Compile the autoencoder, which includes adding a loss function and an optimizer so that the autoencoder works efficiently:
    autoencoder.compile(optimizer='adam',
        loss='binary_crossentropy')
  3. Create the encoder model and display its structure:
    encoder = keras.Model(encoder_inputs,
        encoded, name="encoder")
    print(encoder.summary())

The output shown in Figure 10.11 demonstrates that this model is the first half of the autoencoder.

Figure 10.11 – The structure of the encoder model


  4. Create the decoder model and display its structure:
    encoded_input = keras.Input(shape=(32,))
    decoder_layer_1 = autoencoder.layers[-3](encoded_input)
    decoder_layer_2 = autoencoder.layers[-2](decoder_layer_1)
    decoder_layer_3 = autoencoder.layers[-1](decoder_layer_2)
    decoder = keras.Model(encoded_input, decoder_layer_3,
        name="decoder")
    print(decoder.summary())

This code requires a little more explanation than the encoder model. First, encoded_input has a shape of 32 because the data is now compressed. Second, each of the decoder layers comes from the compiled autoencoder, which is why they’re referred to as autoencoder.layers[-3], autoencoder.layers[-2], and autoencoder.layers[-1]. If you count the autoencoder model layers shown in Figure 10.10, you will see that this arrangement begins with the first level of the decoder and works down from there. As with the encoder, you must also connect the layers together as shown. Figure 10.12 shows the structure of the decoder model.

Figure 10.12 – The structure of the decoder model

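If the negative layer indexes seem opaque, a quick optional loop (not part of the downloadable source) shows how each positive position maps to its negative equivalent, so you can confirm that [-3], [-2], and [-1] really are the three decoder layers:

for index, layer in enumerate(autoencoder.layers):
    # Print the positive index, the equivalent negative index, and the layer name.
    print(index, index - len(autoencoder.layers), layer.name)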

It’s time to create and train the model as a whole by fitting it to the data. This process tracks the learning curve of the neural network so you can see how it works.

Create and train a model from the encoder and decoder

You still have to fit the autoencoder to the data. Part of this process is optional. The following example uses TensorBoard (https://www.tensorflow.org/tensorboard) to track how the learning process goes and to provide other information covered in the next section:

  1. Import the required packages:
    from keras.callbacks import TensorBoard
  2. Use magics to load TensorBoard into memory. It isn’t included by default:
    %load_ext tensorboard
  3. Fit the autoencoder model to the data:
    autoencoder.fit(
        x_train, x_train, epochs=50, batch_size=256,
        shuffle=True, validation_data=(x_test, x_test),
        callbacks=[TensorBoard(log_dir='autoencoder')])

Defining the number of epochs determines how long to train the model. batch_size determines how many samples are used for each training update within an epoch. Setting shuffle to True means that the dataset is shuffled before every epoch to promote better training. Unlike other models that you’ve worked with, this one automatically validates the data against the test set specified by validation_data. Finally, the callbacks argument tells the fit() function to send learning data to TensorBoard and also tells TensorBoard where to store the data. When you run this cell, you will see epoch data similar to that shown in Figure 10.13 (shortened for the book).

Figure 10.13 – Epoch data for each epoch of model training

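Fifty epochs is a reasonable choice for this dataset, but you don’t have to guess. An optional variation (not part of the downloadable source) adds an EarlyStopping callback and captures the return value of fit() in a history variable, which is used again later in this chapter:

from keras.callbacks import EarlyStopping, TensorBoard

# Stop training when the validation loss hasn't improved for three epochs.
history = autoencoder.fit(
    x_train, x_train, epochs=50, batch_size=256,
    shuffle=True, validation_data=(x_test, x_test),
    callbacks=[TensorBoard(log_dir='autoencoder'),
               EarlyStopping(monitor='val_loss', patience=3,
                             restore_best_weights=True)])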

At this point, you can start to see how the model learned by examining the statistics shown in the next section.

Obtaining and graphing model statistics

The process for viewing the statistics is relatively easy. All you need to do is start TensorBoard using the following code:

%tensorboard --logdir 'autoencoder'

The output is a relatively complex-looking display containing all sorts of interesting graphs, as shown in Figure 10.14. To get the full benefit from them, consult the guide at https://www.tensorflow.org/tensorboard/get_started.

Figure 10.14 – One of many TensorBoard statistical outputs


In this case, what you see is the effect of the model steadily learning. The loss decreases during each epoch, so it ends up being quite small overall. Hovering your mouse over the graph shows data point information with precise values.

TensorBoard! Why won’t you just die?

This particular note is designed to save you from pulling out your hair given the advice that you’ll likely receive online if you’re using a Windows system. Restarting Windows won’t do anything for you, so don’t waste your time. Windows doesn’t support the kill command, so adding !kill <PID> to your code isn’t going to produce anything but an error message if you try it after seeing the Reusing TensorBoard on port 6006... message. You may get lucky and find that using the tasklist command (https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/tasklist) will display the TensorBoard.exe application, but it’s unlikely in many cases because it isn’t actually running. If you do see TensorBoard.exe, you can use the taskkill command to stop it. However, what you’ll end up doing in most cases is to halt and close your Notebook, then stop the Jupyter Notebook server as you normally do at the end of a session. To fix the problem, start by deleting the TensorBoard log files that were created during the fitting process. Next, locate the Users\<Username>\AppData\Local\Temp\.tensorboard-info directory on your system and delete it. At this point, you can restart Jupyter Notebook and go on your merry way, hair intact.
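
If TensorBoard proves more trouble than it’s worth, you can graph the same loss curves with Matplotlib instead. This sketch assumes you assigned the return value of autoencoder.fit() to a variable named history (as in the optional EarlyStopping variation shown earlier); the chapter’s own fit() call discards it:

import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Binary cross-entropy loss')
plt.legend()
plt.show()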

Testing the model

This section answers the question of whether the model will compress and decompress data with minimal loss. The following steps show you how to test this:

  1. Perform the required prediction:
    encoded_imgs = encoder.predict(x_test)
    decoded_imgs = decoder.predict(encoded_imgs)

The prediction process isn’t a single step in this case. So, what you see is the output shown in Figure 10.15, which helps you track the prediction process.

Figure 10.15 – Tracking the prediction process


  2. Display the input and output figures:
    showFigures(x_test.reshape(10000,28,28),
        title="Original Testing Data")
    showFigures(decoded_imgs.reshape(10000,28,28),
        title="Modified Testing Data")

Part of this process reshapes the data so that you can display it on screen. The output shows ten of the images for side-by-side comparison, as shown in Figure 10.16.

Figure 10.16 – A side-by-side comparison of input to output


The loss of detail should tell you something about the potential security issues with autoencoders. Because any data manipulation you perform is likely to cause some type of degradation, it pays to choose your models and configuration carefully. Otherwise, it becomes very hard to determine whether a particular issue is the result of hacker activity or simply due to a bad model.

Seeing the effect of bad data

At the outset of this example, you discovered that autoencoders learn how to transform specific data. That is, if you feed the autoencoder what amounts to bad data, even if that data isn’t from a hacker, then the results are going to be less useful. This section puts that theory to the test using the following steps. What is important to note is that the model isn’t trained again; you’re using the same model as before to simulate the introduction of unwanted data:

  1. Import the required packages:
    from tensorflow.keras.datasets import mnist
  2. Split the data into training and testing sets:
    (x_train, _), (x_test, _) = mnist.load_data()
    x_train = x_train.astype('float32') / 255.
    x_test = x_test.astype('float32') / 255.
    print (x_train.shape)
    print (x_test.shape)
  3. Reshape the training and testing data:
    x_train = x_train.reshape((
        len(x_train), np.prod(x_train.shape[1:])))
    x_test = x_test.reshape((
        len(x_test), np.prod(x_test.shape[1:])))
    print(x_train.shape)
    print(x_test.shape)
  4. Perform a prediction:
    encoded_imgs = encoder.predict(x_test)
    decoded_imgs = decoder.predict(encoded_imgs)
  5. Compare the input and output:
    showFigures(x_test.reshape(10000,28,28),
        title="Original Testing Data")
    showFigures(decoded_imgs.reshape(10000,28,28),
        title="Modified Testing Data")

Figure 10.17 shows the results of the comparison. The results, needless to say, are disappointing.

Figure 10.17 – A side-by-side comparison of using the model with the wrong data


Remember that this is a basic example where you’re in full control of everything that happens. From a security perspective, you need to consider what would happen if a hacker fed your autoencoder bad data without you knowing it. Suddenly, you might start seeing unexpected results and may not be able to track them down very easily.
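
One practical defense is to monitor reconstruction error: data that resembles the training set reconstructs with a small loss, while unfamiliar or tampered data doesn’t. This is a minimal sketch you can run after the previous steps (at this point, x_test holds the MNIST digits, not the Fashion-MNIST data used for training):

import numpy as np

# Mean squared reconstruction error per image.
errors = np.mean(np.square(x_test - decoded_imgs), axis=1)
print("Mean reconstruction error:", errors.mean())
print("99th percentile error:", np.quantile(errors, 0.99))

Running the same check against the Fashion-MNIST test data from the earlier steps gives you a baseline, and inputs whose error falls well above that baseline are worth investigating.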

Understanding CNNs and implementing GANs

Convolutional neural networks (CNNs) are great for computer vision tasks. For example, you might partly depend on facial recognition techniques to secure your computing devices, buildings, or other infrastructure. By adding facial recognition to names and passwords (or other biometrics), you provide a second level of protection. However, as shown in the Seeing adversarial attacks in action section of Chapter 3, Mitigating Inference Risk by Avoiding Adversarial Machine Learning Attacks, it’s somewhat easy to fool the facial recognition application.

The problem isn’t the facial recognition application but rather the underlying model, which has been trained with good pictures of the various employees. The way around this problem is to create a dataset that contains both real and fake images of the employees so that the CNN learns to recognize the difference. Figure 10.18 shows a potential setup for training purposes.

Figure 10.18 – A Pix2Pix GAN used to supplement model training

Using a Pix2Pix GAN is the approach suggested in Developing a Robust Defensive System against Adversarial Examples Using Generative Adversarial Networks (https://www.mdpi.com/2504-2289/4/2/11/pdf). Of course, now the issue is finding a way to generate fake images of real employees. That’s where a Pix2Pix GAN comes into play. You can feed it pictures of the real employee and then have the Pix2Pix GAN create any number of fake images that are tampered with in specific ways. The CNN will learn to differentiate between real and fake images based on the Pix2Pix output. From a hacker’s perspective, trying to fool security cameras (as an example) now becomes a lot harder because the security camera software is trained to recognize fake faces.
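
If you wanted to act on that advice, the training data for the recognition CNN might be organized as a labeled dataset of genuine photographs plus Pix2Pix output. The directory name below is a placeholder and the loader shown is available in recent TensorFlow versions; treat this as a sketch rather than part of the chapter’s example:

import tensorflow as tf

# Hypothetical layout: training_faces/real/*.jpg and training_faces/fake/*.jpg,
# where the fake folder holds Pix2Pix-generated images of the same employees.
face_dataset = tf.keras.utils.image_dataset_from_directory(
    'training_faces', labels='inferred', label_mode='binary',
    image_size=(256, 256), batch_size=32)

A CNN trained on this mixed dataset learns the artifacts that separate generated faces from real ones, which is the essence of the defense described in the paper.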

An overview of a Pix2Pix GAN

Phillip Isola et al. originally presented the idea of a Pix2Pix GAN in the Image-to-Image Translation with Conditional Adversarial Networks whitepaper (https://arxiv.org/abs/1611.07004) in 2016. You’d follow essentially the same process to perform the task for your ML application:

  1. Choose an employee picture to modify.
  2. Choose a model.
  3. Perform the translation.
  4. Add the result to a dataset containing both real pictures and fake pictures.

This Pix2Pix GAN example relies on a combination of a U-Net generator and a PatchGAN discriminator, as shown in Figure 10.19.

Figure 10.19 – Diagram of a Pix2Pix GAN with U-Net generator and PatchGAN discriminator


Each time the generator creates a new image, the discriminator tests it. If the discriminator can accurately determine whether the image is fake or real, then the generator is updated with new weights so it can produce better output. When the discriminator is unable to tell the fake images from the real ones, the discriminator weights are updated instead. In this way, the two models keep working against each other until the generator can produce a usable output.
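
The following sketch shows what that adversarial pressure looks like in code. It assumes that generator, discriminator, generator_optimizer, and discriminator_optimizer objects already exist (none of them are defined at this point in the chapter), and it follows the common practice of updating both networks on every batch rather than waiting for one side to win outright:

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(input_image, real_image):
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_image = generator(input_image, training=True)
        real_score = discriminator([input_image, real_image], training=True)
        fake_score = discriminator([input_image, fake_image], training=True)
        # The generator is rewarded when the discriminator scores its output as real.
        gen_loss = loss_fn(tf.ones_like(fake_score), fake_score)
        # The discriminator is rewarded for scoring real as real and fake as fake.
        disc_loss = (loss_fn(tf.ones_like(real_score), real_score) +
                     loss_fn(tf.zeros_like(fake_score), fake_score))
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(
        zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(
        zip(disc_grads, discriminator.trainable_variables))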

Obtaining and viewing the images

Finding a Pix2Pix GAN example that’s specific enough for use in a security setup is hard, which is why this chapter goes into more detail about creating one. This process begins by obtaining a dataset containing the images needed to train the GAN from https://www.kaggle.com/datasets/balraj98/facades-dataset. Once you download the dataset, unarchive it in the facades subdirectory of the example code. The following steps will get you started on manipulating the images:

  1. Import the required packages:
    import tensorflow as tf
    from matplotlib import pyplot as plt
  2. Define the image size and location:
    IMG_WIDTH = 256
    IMG_HEIGHT = 256
    PATH = 'facades/'
  3. Create a function for loading the images:
    def load(image_file):
        raw_image = tf.io.read_file(image_file)
        decode_image = tf.image.decode_jpeg(raw_image)
        image = tf.cast(decode_image, tf.float32)
        return image
  4. Separate two images from the rest:
    real_image = load(PATH+'trainA/40_A.jpg')
    input_image = load(PATH+'trainB/40_B.jpg')
  5. View the images:
    fig, axes = plt.subplots(nrows=1, ncols=2)
    axes[0].imshow(input_image/255.0)
    axes[0].set_title("Input Image")
    axes[1].imshow(real_image/255.0)
    axes[1].set_title("Real Image")

Figure 10.20 shows the Input image (semantic labels) on the left and the Real image (ground truth) on the right.

Figure 10.20 – Input image (semantic labels) and real image (ground truth) used as GAN input


Notice how the input image mimics the real image, using colors to label the input image so that the generator has an easier time creating a believable output. The colors relate to features in the real image. There is no actual guide on which colors to use; the colors simply serve to delineate various features. Usually, a human hand creates the input image using techniques such as those discussed at https://ml4a.github.io/guides/Pix2Pix/. Distortions in the input image will modify the output, as you see later in the example. A hacker could contaminate the input image database in a manner that modifies the output in specific ways that are to the hacker’s advantage. The next section tells you about the image manipulation requirements.

Manipulating the images

The images aren’t very useful in their original form, so it’s important to know how to make the images appear in the way that you need them to appear, without losing any data. The following steps show you how to achieve this:

  1. Create a function to resize the images to 286x286x3 to allow for random cropping, resulting in a final size of 256x256x3:
    def resize(input_image, real_image, height, width):
        input_image = tf.image.resize(
            input_image, [height, width],
            method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
        real_image = tf.image.resize(
            real_image, [height, width],
            method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
        return input_image, real_image
  2. Create a function to crop the images to a 256x256x3 size in a random manner:
    @tf.autograph.experimental.do_not_convert
    def random_crop(input_image, real_image):
        stacked_image = tf.stack(
            [input_image, real_image], axis=0)
        cropped_image = tf.image.random_crop(
            stacked_image,
            size=[2, IMG_HEIGHT, IMG_WIDTH, 3])
        return cropped_image[0], cropped_image[1]
  3. Define a function to add jitter to the image:
    @tf.function()
    def random_jitter(input_image, real_image):
        input_image, real_image = resize(
            input_image, real_image, 286, 286)
        input_image, real_image = random_crop(
            input_image, real_image)
        if tf.random.uniform(()) > 0.5:
            input_image = tf.image.flip_left_right(input_image)
            real_image = tf.image.flip_left_right(real_image)
        return input_image, real_image
  4. Create a plot to show four image pairs consisting of an input image and a real image:
    fig, axes = plt.subplots(nrows=2, ncols=4,
        figsize=(12, 6))
    fig.tight_layout(pad=2)
    for i in range(2):
        for j in range(0, 4):
            if j%2 == 0:
                changed_input_image, changed_real_image = \
                    random_jitter(input_image, real_image)
                axes[i, j].imshow(changed_input_image/255.0)
                axes[i, j].set_title("Input Image")
                axes[i, j + 1].imshow(changed_real_image/255.0)
                axes[i, j + 1].set_title("Real Image")

Figure 10.21 shows the typical output at this step, with the various modifications labeled.

Figure 10.21 – The output of the image modification tests


  5. Define a normalizing function. The act of normalization prevents training problems that can occur due to differences in the various images:
    def normalize(input_image, real_image):
        input_image = (input_image / 127.5) - 1
        real_image = (real_image / 127.5) - 1
        return input_image, real_image
  6. Perform the image normalization:
    normal_input_image, normal_real_image = normalize(
        input_image, real_image)
    print(normal_input_image)

This final step tests the normalization process. Figure 10.22 shows the typical output. The actual output is significantly longer than shown.

Figure 10.22 – The normalized façade image output


Note that while it’s easy to determine that the data is acceptable in form, it’s not possible to tell that it’s the right data. Hacker modifications would be impossible to detect at this point, which is why you need to perform testing and verification of data while it’s still in a form in which you can detect modifications. Now that everything is in place to manipulate the images, it’s time to create the actual datasets.
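
As a small example of the kind of form-level verification you can automate, the following sketch (not part of the downloadable source) checks the shape and value range of the normalized tensor produced in step 6:

check_image = normal_input_image.numpy()
print("Shape:", check_image.shape)
print("Value range:", check_image.min(), "to", check_image.max())
# Normalization should always produce values between -1 and 1.
assert check_image.min() >= -1.0 and check_image.max() <= 1.0, \
    "Normalization produced out-of-range values"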

Developing datasets from the modified images

As with all of the machine learning examples in the book so far, you need a training dataset and a testing dataset to use with the model. The following steps show how to create the required image datasets:

  1. Import the required packages:
    import os
  2. Create a function to load a training image. The training images have to be randomized for the model to work well. However, the testing images should appear as normal to truly test the model in a real-world setting:
    @tf.autograph.experimental.do_not_convert
    def load_image_train(files):
        input_image = load(files[0])
        real_image = load(files[1])
        input_image, real_image = random_jitter(
            input_image, real_image)
        input_image, real_image = normalize(
            input_image, real_image)
        return input_image, real_image
  3. Create a function to load a testing image. Note that the images must be resized to 256x256 so that they appear the same as in the original dataset:
    @tf.autograph.experimental.do_not_convert
    def load_image_test(files):
        input_image = load(files[0])
        real_image = load(files[1])
        input_image, real_image = resize(input_image,
            real_image, IMG_HEIGHT, IMG_WIDTH)
        input_image, real_image = normalize(input_image,
            real_image)
        return input_image, real_image
  4. Specify the dataset parameters. The BATCH_SIZE value serves the same purpose as the batch_size setting used when fitting the autoencoder in the previous example:
    BUFFER_SIZE = 200
    BATCH_SIZE = 1
  5. Create a list of files to process for the training dataset:
    real_files = os.listdir(PATH+'trainA')
    real_files = [PATH+'trainA/'+file for file in real_files]
    input_files = os.listdir(PATH+'trainB')
    input_files = [PATH+'trainB/'+file for file in input_files]
    file_list = list(map(list, zip(input_files, real_files)))
    for file in file_list:
        print(file)

The input and real files appear in two different directories. Yet, these files are actually paired with each other. Consequently, what this code does is create a list of lists, where each pair appears in its own list. Figure 10.23 shows the output from this step.

Figure 10.23 – The pairings of input and real files used to create the training dataset


  6. Load the training dataset. The first step actually creates the dataset from the file_list prepared in the previous step. This series of filenames is used to load the images using the load_image_train() function defined in step 2. Because the dataset is currently in a specific order, the shuffle() function randomizes the image order. Finally, the dataset is split into batches of BATCH_SIZE images:
    train_dataset = tf.data.Dataset.from_tensor_slices(
        file_list)
    train_dataset = train_dataset.map(load_image_train,
        num_parallel_calls=4)
    train_dataset = train_dataset.shuffle(BUFFER_SIZE)
    train_dataset = train_dataset.batch(BATCH_SIZE)
  7. View the result. The dataset includes both features and labels:
    features, label = next(iter(train_dataset))
    print("Example features:", features[0])
    print("Example label:", label[0])

Figure 10.24 shows a very short example of what you’ll see as output.

Figure 10.24 – A list of tensors based on the input and real images


  8. Create a list of files to process for the testing dataset. This is basically a repetition of the process for the training dataset:
    real_files = os.listdir(PATH+'testA')
    real_files = [PATH+'testA/'+file for file in real_files]
    input_files = os.listdir(PATH+'testB')
    input_files = [PATH+'testB/'+file for file in input_files]
    file_list = list(map(list, zip(input_files, real_files)))
  9. Load the testing dataset:
    test_dataset = tf.data.Dataset.from_tensor_slices(
        file_list)
    test_dataset = test_dataset.map(load_image_test)
    test_dataset = test_dataset.batch(BATCH_SIZE)

The datasets are finally ready to use. Now it’s time to create the generator part of the Pix2Pix GAN.

Creating the generator

A U-Net generator gets its name from the U shape formed by the path it uses to process data. The process consists of downsampling (encoding), which compresses the data, and upsampling (decoding), which decompresses the data. Figure 10.25 shows the U-Net for this example.

Figure 10.25 – A diagram of a U-Net generator

If you think the graphic in Figure 10.25 looks sort of like a fancy version of the autoencoder in Figure 10.6, you’d be right in a way. The U-Net generator compresses and decompresses data like the autoencoder, but it does so in a smarter way so that it can generate a new image from the existing one. Unfortunately, the model is susceptible to the same forms of hacking as an autoencoder. Sending bad inputs will affect this model just as much as it affects an autoencoder, so you need to exercise care in keeping your model free from hacker activity. Of course, there is a lot more going on than just compression and decompression. The following sections build each element of the U-Net generator in turn.

Defining the downsampling code

Downsampling relies on a number of layers to compress the data. These layers accomplish the following purposes:

  • Conv2D: Applies a convolution to the layer inputs to produce a tensor of output values. Essentially, this is the part that compresses the data.
  • BatchNormalization: Performs a transformation to keep the output mean close to 0 and the standard deviation close to 1.
  • LeakyReLU: Provides the layer’s activation using a leaky version of ReLU, which allows a small, non-zero gradient when the input is negative.

Now that you have a better idea of what the downsampling layers mean, it’s time to look at the required code. The following steps show how to build this part of the U-Net generator:

  1. Create a downsample() function:
    OUTPUT_CHANNELS = 3
    def downsample(filters, size):
        initializer = tf.random_normal_initializer(
            0., 0.02)
        result = tf.keras.Sequential()
        result.add(
            tf.keras.layers.Conv2D(filters, size,
                strides=2, padding='same',
                kernel_initializer=initializer,
                use_bias=False))
        result.add(
            tf.keras.layers.BatchNormalization())
        result.add(tf.keras.layers.LeakyReLU())
        return result
  2. View the downsample() function results. This code tests the downsample() function using just one image:
    down_model = downsample(3, 4)
    down_result = down_model(tf.expand_dims(input_image, 0))
    print(down_result.shape)

The output is the shape of the tested image, as shown in Figure 10.26.

Figure 10.26 – Output of the initial downsample() function test

When you look at the output in Figure 10.26, you will see that the batch size is reflected in the first value, the size of the image in the second and third values, and the number of filters in the fourth value. Because this is the first step of compression, the original 256x256 input image is reduced by half, so the shape reads (1, 128, 128, 3).

Defining the upsampling code

Upsampling relies on a number of layers to decompress the data. These layers accomplish the following purposes:

  • Conv2DTranspose: Applies a transposed convolution to the layer inputs to produce a tensor of output values. Essentially, this is the part that decompresses the data.
  • BatchNormalization: Performs a transformation to keep the output mean close to 0 and the standard deviation close to 1.
  • ReLU: Provides activation for the layer.

As you can see, the upsampler follows a process similar to the downsampler, just in reverse. The following steps describe how to create the upsampler and test it:

  1. Create the upsample() function:
    def upsample(filters, size):
        initializer = tf.random_normal_initializer(
            0., 0.02)
        result = tf.keras.Sequential()
        result.add(
            tf.keras.layers.Conv2DTranspose(filters,
                size, strides=2, padding='same',
                kernel_initializer=initializer,
                use_bias=False))
        result.add(
            tf.keras.layers.BatchNormalization())
        result.add(tf.keras.layers.ReLU())
        return result
  2. View the upsample() function results:
    up_model = upsample(3, 4)
    up_result = up_model(down_result)
    print(up_result.shape)

This code upsamples the downsampled image, so Figure 10.27 shows that the shape is now (1, 256, 256, 3); the image is 256x256 again.

Figure 10.27 – The upsampled result of the downsampled image

As with the autoencoder example, you now have a downsampler (which is akin to the encoder) and an upsampler (which is akin to the decoder). However, you don’t have an entire U-Net generator yet. The next section takes these two pieces and puts them together to create the generator depicted in Figure 10.25.

Putting the generator together

You now have everything needed to create a U-Net like the one depicted in Figure 10.25 (referencing the figure helps explain the code). The following code puts everything together. The comments tell you about the size changes that occur as the images are downsampled, then upsampled:

def Generator():
    inputs = tf.keras.layers.Input(shape=[256, 256, 3])
    down_stack = [
        downsample(64, 4),   # 128, 128, 64
        downsample(128, 4),  # 64, 64, 128
        downsample(256, 4),  # 32, 32, 256
        downsample(512, 4),  # 16, 16, 512
        downsample(512, 4),  # 8, 8, 512
        downsample(512, 4),  # 4, 4, 512
        downsample(512, 4),  # 2, 2, 512
        downsample(512, 4),  # 1, 1, 512
    ]
    up_stack = [
        upsample(512, 4),     # 2, 2, 1024
        upsample(512, 4),     # 4, 4, 1024
        upsample(512, 4),     # 8, 8, 1024
        upsample(512, 4),     # 16, 16, 1024
        upsample(256, 4),     # 32, 32, 512
        upsample(128, 4),     # 64, 64, 256
        upsample(64, 4),      # 128, 128, 128
    ]
    initializer = tf.random_normal_initializer(0., 0.02)
    # 256, 256, 3
    last = tf.keras.layers.Conv2DTranspose(
        OUTPUT_CHANNELS, 4, strides=2, padding='same',
        kernel_initializer=initializer, activation='tanh')
    x = inputs
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    skips = reversed(skips[:-1])
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = tf.keras.layers.Concatenate()([x, skip])
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)
generator = Generator()

The code begins, just as the autoencoder did, with the creation of an Input layer. Next comes a series of downsampling and upsampling steps. The process ends with a final Conv2DTranspose layer, stored in last, which returns the image to its former size of 256x256x3. So, just as with the autoencoder, you have an input, compression stages, decompression stages, and an output layer. The main difference, in this case, is that the model uses the tanh activation, which is similar to the sigmoid activation used for the autoencoder, except that tanh produces both positive and negative values.

As shown in Figure 10.25, this model uses skips, where the output of each downsampling layer goes directly to the corresponding upsampling layer in addition to passing through the deeper layers. This approach helps the generator create a better result because fine detail that would otherwise be lost during compression is carried straight across to the decompression side. The final step is to actually create the generator with a call to Generator().
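
If you want to confirm that the assembled model really has the U shape shown in Figure 10.25, Keras can describe it for you. The following optional check is a sketch rather than part of the chapter’s example; summary() needs nothing extra, while plot_model() also requires the pydot and graphviz packages to be installed:

# Optional sanity check of the assembled generator (not part of the
# main example). summary() lists the layers and output shapes, while
# plot_model() draws the U shape but needs pydot and graphviz.
generator.summary()
tf.keras.utils.plot_model(generator, show_shapes=True, dpi=64)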

Defining the generator loss function

The loss function helps optimize the generator weights. As the generator produces images, the loss measures how far each generated image is from the target, and that measurement determines how the weights change so that the generator output better matches the original image. The following steps show how to create the generator loss function for this example:

  1. Create the required loss_object function, which uses cross-entropy to determine the difference between true labels and predicted labels:
    loss_object = tf.keras.losses.BinaryCrossentropy(
        from_logits=True)
  2. Specify the LAMBDA value, which controls how heavily the L1 term is weighted in the total loss and so acts as a form of regularization for the model:
    LAMBDA = 100
  3. Define the loss function, which includes calculating the initial loss (gan_loss) and the L1 loss (l1_loss). Then, use them to create a total_gen_loss for the generator as a whole:
    def generator_loss(disc_generated_output,
            gen_output, target):
        gan_loss = loss_object(
            tf.ones_like(disc_generated_output),
            disc_generated_output)
        l1_loss = tf.reduce_mean(
            tf.abs(target - gen_output))
        total_gen_loss = gan_loss + (LAMBDA * l1_loss)
        return total_gen_loss, gan_loss, l1_loss

The example uses the BinaryCrossentropy() loss function as a starting point for creating a hand-tuned loss function for the U-Net. It then calculates an additional L1 loss value, which is the mean of the absolute differences between the true values and the predicted values. The total loss is the cross-entropy loss plus the L1 loss multiplied by the LAMBDA constant, which puts the two terms on a comparable scale. Empirically, this combination of losses has been shown to help the network converge faster and more stably.
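
To see how the pieces combine, here is a minimal sketch that calls generator_loss() with made-up tensors; the numbers are purely illustrative and assume the loss_object and LAMBDA definitions from the previous steps:

# Illustration only: dummy tensors standing in for the discriminator's
# verdict, the generated image, and the target image.
fake_verdict = tf.zeros([1, 30, 30, 1])        # discriminator not fooled
gen_output = tf.fill([1, 256, 256, 3], 0.2)    # made-up generated pixels
target = tf.fill([1, 256, 256, 3], 0.5)        # made-up target pixels
total, gan, l1 = generator_loss(fake_verdict, gen_output, target)
# gan is about 0.69 (cross-entropy of logits of 0 against labels of 1),
# l1 is 0.3 (the mean absolute pixel difference), so total is roughly
# 0.69 + 100 * 0.3 = 30.69 and the L1 term dominates early in training.
print(total.numpy(), gan.numpy(), l1.numpy())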

Creating the discriminator

The PatchGAN is a type of discriminator that, rather than producing a single real/fake verdict for a whole image, produces a grid of verdicts (30x30x1 in this case), each covering a patch of the input. This example begins with two 256x256x3 images: the first is the input image (the same one fed to the generator), while the second is the target image from the dataset.

To create the patch output, the images are first downsampled (compressed). After that, the compressed representation passes through zero padding, a convolution, batch normalization, and a Leaky ReLU activation before a final convolution produces the patch. Figure 10.28 shows the process graphically.

Figure 10.28 – A diagram of the PatchGAN discriminator

As with the generator, the discriminator consists of a number of pieces that are best understood when discussed individually. The following sections tell you about them.

Putting the discriminator together

Even though Figure 10.28 may look a little complex, the actual coding doesn’t require much effort after working through the intricacies of the U-Net generator. In fact, the code is rather short, as shown here. The comments tell you about the downsampling process so that you can compare it with Figure 10.28. There are similarities between this model and the autoencoder from earlier, but in this case, you can see there are two inputs instead of one, so you need to concatenate them:

def Discriminator():
    initializer = tf.random_normal_initializer(0., 0.02)
    inp = tf.keras.layers.Input(
        shape=[256, 256, 3], name='input_image')
    tar = tf.keras.layers.Input(
        shape=[256, 256, 3], name='target_image')
    # 256, 256, channels*2
    x = tf.keras.layers.concatenate([inp, tar])
    down1 = downsample(64, 4)(x)      # 128, 128, 64
    down2 = downsample(128, 4)(down1) # 64, 64, 128
    down3 = downsample(256, 4)(down2) # 32, 32, 256
    # 34, 34, 256
    zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)
    # 31, 31, 512
    conv = tf.keras.layers.Conv2D(
        512, 4, strides=1,
        kernel_initializer=initializer,
        use_bias=False)(zero_pad1)
    batchnorm1 = tf.keras.layers.BatchNormalization()(conv)
    leaky_relu = tf.keras.layers.LeakyReLU()(batchnorm1)
    # 33, 33, 512
    zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)
    # 30, 30, 1
    last = tf.keras.layers.Conv2D(
        1, 4, strides=1,
        kernel_initializer=initializer)(zero_pad2)
    return tf.keras.Model(inputs=[inp, tar], outputs=last)
discriminator = Discriminator()

The goal of this code is to create a model that takes the two images as input and provides a 30x30 patch of verdicts as output, so the images are judged patch by patch rather than as a whole. It uses these steps to make the determination between fake and real images:

  1. Obtain the input and target images sized at 256x256x3.
  2. Concatenate the images for processing. Each image should be a separate channel.
  3. Downsample the images contained in x (the two concatenated channels) to 32x32x256.
  4. Add zeros around the entire image so that each image is now 34x34.
  5. Convolve the data to 31x31x512.
  6. Apply batch normalization to the result.
  7. Apply the Leaky ReLU activation function.
  8. Add zeros around the entire image so that each image is now 33x33.
  9. Create a patch of 30x30x1.
  10. Return the model consisting of the original input data and the patch.

The patch is run convolutionally across the image and the results are averaged to determine whether the image as a whole is fake or real.
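
One way to make the patch idea concrete is to run the untrained generator and discriminator on a single pair from test_dataset and look at the resulting verdict map. This is an optional sketch, not part of the chapter’s example, and it assumes matplotlib.pyplot is imported as plt, as it is elsewhere in the chapter:

# Optional sketch: inspect the 30x30 patch verdict for one test pair.
# The outputs are logits, so positive values lean toward "real" and
# negative values lean toward "fake".
for inp, tar in test_dataset.take(1):
    gen_out = generator(inp, training=False)
    patch_verdict = discriminator([inp, gen_out], training=False)
    print(patch_verdict.shape)   # expected: (1, 30, 30, 1)
    plt.imshow(patch_verdict[0, ..., -1], cmap='RdBu_r')
    plt.colorbar()
    plt.show()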

Defining the discriminator loss

As with the generator loss function, the discriminator loss function helps optimize weights, but in this case it works with the discriminator rather than the generator. So, there are two loss functions: one for the generator and another for the discriminator. Here is the code used for the discriminator loss:

def discriminator_loss(disc_real_output,
        disc_generated_output):
    real_loss = loss_object(
        tf.ones_like(disc_real_output), disc_real_output)
    generated_loss = loss_object(
        tf.zeros_like(disc_generated_output),
        disc_generated_output)
    total_disc_loss = real_loss + generated_loss
    return total_disc_loss

There are a number of steps in calculating the loss:

  1. Create a real image loss using labels of 1s because the real image should be classified as real.
  2. Create a generated image loss using labels of 0s because the generated image should be classified as fake.
  3. Sum the two losses; the total is small only when the discriminator sees the real image as real and the fake image as fake.

As you can see, the discriminator loss differs from the generator loss in that it rewards the discriminator for telling fake images apart from real ones, rather than rewarding the generator for fooling the discriminator.
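
As a minimal sketch of what this means in numbers (the logits below are made up purely for illustration), a confident, correct discriminator produces a total loss near zero, while a completely fooled one produces a large loss:

# Illustration only: made-up patch logits for a confident discriminator.
confident_real = tf.fill([1, 30, 30, 1], 5.0)    # strongly "real"
confident_fake = tf.fill([1, 30, 30, 1], -5.0)   # strongly "fake"
# Correct verdicts on both images: total loss is about 0.01.
print(discriminator_loss(confident_real, confident_fake).numpy())
# Fooled on both images (verdicts swapped): total loss is about 10.0.
print(discriminator_loss(confident_fake, confident_real).numpy())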

Performing optimization of both generator and discriminator

To make the process of generating and discriminating images faster and better, the code relies on the Adam() optimizer for both the generator and the discriminator. Adam() is an extension of stochastic gradient descent that adapts the step size using estimates of the gradient’s moments. The primary reason to use it in this case is that it’s computationally efficient and doesn’t require a lot of memory. A secondary reason is that it works well with noisy data, which is likely with images. The following code shows the calls for using the Adam() optimizer:

generator_optimizer = tf.keras.optimizers.Adam(
    2e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(
    2e-4, beta_1=0.5)

The first argument determines the learning rate for the generator and discriminator. You normally want these values to be the same or the model may not work as intended. The beta_1 value determines the exponential decay rate for the first moment (the mean, rather than the uncentered variance, which is determined by beta_2). You can read more about this function at https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam.
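
For reference, the call below is a spelled-out version of the generator optimizer defined above, with the remaining Adam arguments shown at their Keras defaults; it behaves the same as the shorter form used in the example:

# The same optimizer with every argument written out. Only the learning
# rate and beta_1 differ from an out-of-the-box Adam instance; beta_2
# and epsilon are the Keras defaults.
generator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=2e-4, beta_1=0.5, beta_2=0.999, epsilon=1e-07)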

The next step is to perform the required training. However, because this process takes so long, you want to monitor it so that you can stop training and make adjustments as needed.

Monitoring the training process

As you train your model, you will want to see how the results change over time. The following code outputs three images: the input image (semantic labels), the ground truth (original image), and the predicted (generated) image:

def generate_images(model, test_input, tar):
    prediction = model(test_input, training=True)
    plt.figure(figsize=(15, 15))
    display_list = [test_input[0], tar[0], prediction[0]]
    title = ['Input Image', 'Ground Truth',
        'Predicted Image']
    for i in range(3):
        plt.subplot(1, 3, i+1)
        plt.title(title[i])
        plt.imshow(display_list[i] * 0.5 + 0.5)
        plt.axis('off')
    plt.show()

The following code provides a quick check of three images:

for example_input, example_target in test_dataset.take(1):
    generate_images(generator, example_input,
        example_target)

Figure 10.29 shows the typical results for an untrained model.

Figure 10.29 – The output of the generate_images() function using an untrained model

Now that you have the means for monitoring the training, it’s time to do some actual training of the model, as described in the next section. One thing to consider is that monitoring also provides a method of detecting potential hacker activity. If you have set everything up correctly and your model still isn’t converging toward the ground truth, you might have problems with the training data, especially the input images, because small modifications would be tough to locate.

Training the model

Training proceeds in what are termed epochs; during each epoch, the model processes every training pair, one training step at a time. It helps to review how the training will occur by viewing Figure 10.19 as an overview. Figure 10.25 provides details of the generator and Figure 10.28 provides details of the discriminator. The following code finally puts together what these figures have been showing you. It outlines a single training step:

EPOCHS = 24
@tf.function
@tf.autograph.experimental.do_not_convert
def train_step(input_image, target, epoch):
    with tf.GradientTape() as gen_tape, \
            tf.GradientTape() as disc_tape:
        gen_output = generator(
            input_image, training=True)
        disc_real_output = discriminator(
            [input_image, target], training=True)
        disc_generated_output = discriminator(
            [input_image, gen_output], training=True)
        gen_total_loss, gen_gan_loss, gen_l1_loss = \
            generator_loss(disc_generated_output,
                gen_output, target)
        disc_loss = discriminator_loss(
            disc_real_output, disc_generated_output)
    generator_gradients = gen_tape.gradient(
        gen_total_loss, generator.trainable_variables)
    discriminator_gradients = disc_tape.gradient(
        disc_loss, discriminator.trainable_variables)
    generator_refittings = 3
    for _ in range(generator_refittings):
        generator_optimizer.apply_gradients(zip(
            generator_gradients,
            generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(
        discriminator_gradients,
        discriminator.trainable_variables))

Each call to train_step() follows these steps:

  1. Create two GradientTape() objects (see https://www.tensorflow.org/api_docs/python/tf/GradientTape to record operations for automatic differentiation): one for the generator (gen_tape) and one for the discriminator (disc_tape). Think of a tape used for making backups or to record other kinds of information, except that this one is recording operations.
  2. Generate an image using the generator.
  3. Run the discriminator on the input image paired with the real target image to obtain the real output.
  4. Run the discriminator on the input image paired with the generated image to obtain the predicted output.
  5. Calculate the generator loss.
  6. Calculate the discriminator loss.
  7. Determine how much to change each model after each training cycle and place this result in generator_gradients for the generator and discriminator_gradients for the discriminator.
  8. Apply the changes to each model, optimizing the result in each case, using generator_optimizer.apply_gradients() for the generator and discriminator_optimizer.apply_gradients() for the discriminator.

Note that you can change the EPOCHS setting as needed for your system. The more epochs you use, the better the model, but each epoch takes a significant amount of time. The next step is to define the fitting function.

Specifying how to train the model

It’s important to remember that train_step() defines just one training step; an epoch runs that step once for every image pair in the training set. The example performs 24 epochs to obtain a reasonable result. However, many Pix2Pix GANs go through 150 or more epochs to obtain a production-level result. During testing of the example, it became evident that the generator wasn’t being worked hard enough, so the example applies the generator gradients three times for every one discriminator update, producing a better result in a shorter time. As you work through your own code, you’ll likely find that you need to make tweaks like this to obtain a better result with an eye toward efficiency.

Defining the fitting function

Now that the steps to perform for each epoch are defined, it’s time to perform the fitting process. Fitting trains the generator and discriminator to produce the desired output. The following steps show the fitting process for this example:

  1. Import the required packages:
    from IPython import display
  2. Create the fit() function to perform the fitting:
    def fit(train_ds, epochs, test_ds):
        for epoch in range(epochs):
            display.clear_output(wait=True)
            for example_input, example_target in \
                    test_ds.take(1):
                generate_images(generator,
                    example_input,
                    example_target)
            print("Epoch: ", epoch)
            for n, (input_image, target) in \
                    train_ds.enumerate():
                print('.', end='')
                train_step(input_image, target, epoch)
            print()

The fit() function is straightforward. All it does is fit the two models (generator and discriminator) to the images one at a time, make adjustments, and then move on to the next epoch. The next section performs the actual fitting process.

Performing the fitting

So far, no one has really hit the run button. Everything is in place, but now it’s time to actually run it, which is the purpose of the code in the following steps:

  1. Perform the actual fitting task:
    fit(train_dataset, EPOCHS, test_dataset)

Figure 10.30 shows the output for a single epoch.

Figure 10.30 – The output from a single training epoch

  2. Check the results:
    for example_input, example_target in \
            test_dataset.take(5):
        generate_images(generator, example_input,
            example_target)

The output will show five random images from the dataset. Looking at the single image shown in Figure 10.31, you can see the semantic-label input, the original (ground truth) image, and the predicted output that the generator produced from the input. The prediction isn’t perfect at this point because the model requires more training.

Figure 10.31 – Sample output from the trained Pix2Pix GAN

The essential thing to remember about a Pix2Pix GAN is that it’s a complex model that requires a large dataset, which gives hackers plenty of opportunity to skew your model. As shown in Figure 10.31 (and any other Pix2Pix GAN example you want to review), it would be very hard, if not impossible, for a human to detect that the output has been skewed. What a human would see is that the image has been modified, hopefully in the right direction. A smart hacker could modify the model using any of a number of methods, with incorrect data being the easiest and most probable, to make it output just about anything.

Summary

This chapter has provided you with the barest of overviews of deepfakes and the technologies used to create them: autoencoders and GANs. What you should take away from this chapter is the knowledge that these technologies are simply tools that someone can use for good or evil intent. From a security perspective, using deepfakes can help harden your surveillance technologies and help you implement better facial recognition strategies. Of course, you also have to be wary of hackers who modify your models, damage your data, or try to sway the output of your models in a way that is beneficial to them using other methods.

Chapter 11 is going to move further into the security realm of GANs by looking at ways in which they are used by hackers to gain entry into your systems, or by you to thwart hacker advances. The fact that GANs can learn from each experience means that the wall-building security strategies of the past have taken on a new aspect. The machines are no longer merely hosts for attacks; they have become generators of automated attacks, with the parameters of those attacks being controlled by one side or the other.

