© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
T. Sarkar, Productive and Efficient Data Science with Python, https://doi.org/10.1007/978-1-4842-8121-5_5

5. Modular and Productive Deep Learning Code

Tirthajyoti Sarkar, Fremont, CA, USA
 

In the previous chapter, I explored the idea that most data scientists come from a background that is quite far removed from traditional computer science/software engineering. Consequently, they produce code that is perfectly suitable for great exploratory data analysis, statistical modeling, or innovative ML experiments but not robust enough for the production phase of a large business platform. Data scientists often think in terms of the next analysis script, not the next software module that integrates into a larger system.

Scripting is (mostly) the code you write for yourself. Software is the assemblage of code you (and other teammates) write for others. It is an undeniable fact that most data scientists, not having a traditional software development background and training, tend to write AI/ML analysis code mostly for themselves.

They just want to get to the heart of the pattern hidden in the data. Fast. Without thinking deeply about normal mortals (users). They write a block of code to produce a rich and beautiful plot. But they don’t create a function out of it to use later. They import lots of methods and classes from standard libraries. But they don’t create a subclass of their own by inheritance and add methods to it for extending the functionality.

In the previous chapter, you explored some of these issues through scikit-learn code and a typical classical ML task, fitting a logistic regression model. In this chapter, you will explore how similar principles can help you write better code for deep learning tasks with some hands-on examples using Keras/TensorFlow.

Modular Code and Object-Oriented Style for Productive DL

Functions, inheritance, methods, classes: they are at the heart of robust object-oriented programming (OOP). But you may not want to delve deeply into them if all you want to do is create a Jupyter notebook with your exploratory data analysis and plots.

You can avoid the initial pain of using OOP principles, but this almost always renders your notebook code non-reusable and non-extensible. More precisely, that piece of code serves only you (until you forget what exact logic you coded) and no one else. But readability (and, thereby, reusability) is critically important for any good software product/service. That is the true test of the merit of what you produce. Not for yourself. But for others.

Data science involving deep learning models and code is no exception. These days, powerful and flexible frameworks like TensorFlow or PyTorch make the actual coding of a complex neural network architecture relatively simple and brief. However, if the overall DS code is not modularized and well-organized (following much of the style discussed in Chapter 4 in the section “Why (and How) to Modularize Code for Machine Learning”), then it is plagued by the same issues of non-reproducibility and non-reusability. Let’s see some examples of how you can organize and modularize DL code in your data science work.

Example of a Productive DL Task Flow

Deep learning makes it easy to train ML models for highly nonlinear (and even noisy) datasets and phenomena. Modern frameworks like Keras/TensorFlow/PyTorch offer powerful and flexible APIs to build these models with relative ease and a surprisingly small amount of code. However, an end-to-end DS flow can be made much more productive if you follow some simple guidelines on how you build, manage, and utilize DL code. An approach of building compact modules and a systematic flow (shown in Figure 5-1) can help. Some examples of related guidelines are discussed below in the form of questions.
  • One of the most common and repetitive tasks in DL analysis is building out a deep neural network (DNN) object. Data scientists routinely use non-modularized code to add layers (e.g., from the Keras (https://keras.io/api/layers/) or PyTorch (https://pytorch.org/tutorials/recipes/recipes/defining_a_neural_network.html) APIs) and build the model as a local variable in their Jupyter notebook. Wouldn’t it be a much better idea to create a custom function for this task?

  • After building an (untrained) model, you must compile (set learning rate, batch size, etc.) and run the model with data. Would a custom function help make this task modularized as well?

  • When you make such a DNN builder function, which parameters will be passed in? Which ones can be optional? What are the default values? If you don’t know in advance how many parameters will need to be passed, are you using the *args and **kwargs that Python offers? (See the sketch after this list.)

  • Did you write a docstring for that function to let others know what the function does and what parameters it expects plus an example?

  • Can you also modularize the code used to create the visual analytics based on the output of those model functions?
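As a quick illustration of the *args/**kwargs and docstring points above, here is a hedged sketch (build_model is a hypothetical placeholder for a builder function defined elsewhere; any extra keyword arguments are simply forwarded to Keras' fit()):
def build_and_train(x_train, y_train, **kwargs):
    """Builds a DNN and trains it on the given data.

    Any extra keyword arguments (e.g., batch_size, epochs) are
    passed through unchanged to Keras' fit() method.

    Example:
        model = build_and_train(x_train, y_train, epochs=20)
    """
    model = build_model()  # hypothetical builder function, defined elsewhere
    model.fit(x_train, y_train, **kwargs)
    return model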

Figure 5-1

Deep learning task flow organized in modular fashion

Wrappers, Builders, Callbacks

Fundamentally, in the subsection above I described wrapping up the most essential tasks in a DL-based workflow inside custom functions and using them as the core building blocks of your data science code. Additionally, you can wrap up the tasks related to data formatting/transformation and prediction/inference in a similar fashion.

Note that wrapper functions for regression and classification tasks can have separate sets of architectures and parameters, so it makes sense to keep their builders customized. The choice of default parameter values in the wrapper functions is of critical importance, too.

Apart from a simple functional wrapper, you can also utilize a powerful construct called a callback that caters to the dynamic nature of DNN training. Essentially, a callback is an object that can perform actions at various stages of DNN training (e.g., at the start or end of an epoch or before starting a single batch). You can use callbacks for various scenarios, including but not limited to the following:
  • Early stopping based on some error or computation criterion

  • Periodically saving the model to disk (making the system robust against unexpected failure)

  • Obtaining an overview on various internal states and statistics of a model in mid-flight (i.e., while the training is going on)

Finally, if you want to extend this approach all the way to the full OOP paradigm, you can build classes and utility modules incorporating all these wrappers as special methods. The result is a DL utility module that you can call upon in any data science task where supervised ML modeling is needed.

Modular Code for Fast Experimentation

Let’s demonstrate the ideas discussed above using a simple case: a DL image classification problem with the Fashion MNIST (https://github.com/zalandoresearch/fashion-mnist) dataset. The core ML task is simple: build a classifier for this dataset, which is a funny spin on the original famous MNIST hand-written digit dataset. Fashion MNIST consists of 60,000 training images, each 28 x 28 pixels, of fashion-related objects (e.g., hats, shoes, trousers, t-shirts, and dresses). It also includes 10,000 test images for model validation and testing. A slice of the dataset is shown in Figure 5-2 for illustration.
Figure 5-2

A slice of the Fashion MNIST dataset

Business/Data Science Question

The basic ML task for this dataset seems straightforward. But what if there is a higher-order optimization or visual analytics question around this core ML task: how does the model architecture complexity impact the minimum epochs it takes to reach the desired accuracy?

It should be clear why we bother with such a question: it relates to the overall business optimization. Training a neural net is not a trivial computational matter (www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/). Therefore, it makes sense to investigate the minimum training effort that must be spent to achieve a target performance metric and how the choice of architecture impacts it.

The image classification accuracy could be related to a broader business outcome such as a fashion recommendation or clothing identification in a store. The core data science task helps optimize the cost of running that business task—to use the image database with the optimal expenditure of computing resources using the ML code as the underlying nuts and bolts.

In this example, you will not even use a convolutional neural network (CNN; https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53), the architecture most commonly used for image classification tasks. This is because, for this dataset, a simple densely connected neural net can achieve reasonably high accuracy, and, in fact, somewhat sub-optimal performance is needed to illustrate the main point of the higher-order optimization question posed above.

So, you must solve two problems:
  • What is the minimum number of epochs needed to reach the desired accuracy target, and how do you determine it?

  • How does the specific architecture of the model impact this number or training behavior?

To achieve the goals, you will use two simple OOP principles:
  • Creating an inherited class from a base class object

  • Creating utility functions and calling them from a compact code block that can be presented to an external user for higher-order optimization and analytics

Inherit from the Keras Callback

You inherit a Keras callback class (as the base) and write your own subclass by adding a method that checks the training accuracy and takes an action based on that value. The code snapshot and some explanations are shown in Figure 5-3. More details on this can be found in the official TensorFlow article “Writing your own callbacks” at www.tensorflow.org/guide/keras/custom_callback.
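A minimal sketch of such a subclass (the class name myCallback and the print_msg argument mirror how the callback is used later in this chapter; the exact body is an assumption):
from tensorflow import keras

class myCallback(keras.callbacks.Callback):
    """Stops training once the training accuracy crosses a target."""
    def __init__(self, acc_threshold=0.9, print_msg=True):
        super().__init__()
        self.acc_threshold = acc_threshold
        self.print_msg = print_msg

    def on_epoch_end(self, epoch, logs=None):
        # Check the training accuracy at the end of every epoch
        if logs.get('accuracy', 0) > self.acc_threshold:
            if self.print_msg:
                print("\nReached {:.0%} accuracy, stopping training".format(
                    self.acc_threshold))
            self.model.stop_training = True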
Figure 5-3

A custom class built on top of a Keras callback

Basically, this simple callback results in dynamic control of the epochs; the training stops automatically when the accuracy reaches the desired threshold. Figure 5-4 shows a snapshot of an example run.
Figure 5-4

Snapshot of an example run with the callback enabled

Model Builder and Compile/Train Functions

Next, you put the Keras model construction code in a utility function so that a model of an arbitrary number of layers and architecture (as long as they are densely connected) can be generated using simple user input in the form of some function arguments. The code snapshot and the associated explanations are shown in Figure 5-5.
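A minimal sketch of such a builder, consistent with the call build_model(num_layers=1, architecture=[...]) used later in this chapter (the remaining arguments and defaults are assumptions):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

def build_model(num_layers=1, architecture=(32,), act_func='relu',
                input_shape=(28, 28), num_classes=10):
    """Builds a densely connected network with num_layers hidden
    layers whose sizes are given by the architecture sequence."""
    model = Sequential()
    # Flatten the 28 x 28 Fashion MNIST images into a 784-vector
    model.add(Flatten(input_shape=input_shape))
    for i in range(num_layers):
        model.add(Dense(architecture[i], activation=act_func))
    # 10-class softmax output for Fashion MNIST
    model.add(Dense(num_classes, activation='softmax'))
    return model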
Figure 5-5

Snapshot of a model builder function

You also put the compilation and training code into a utility function to use those hyperparameters in a higher-order optimization loop conveniently. The code snapshot and the associated explanations are shown in Figure 5-6.
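A matching sketch for the compile/train wrapper (the signature mirrors the later call compile_train_model(model, x_train, y_train, callbacks=..., batch_size=32, epochs=30); the optimizer and loss choices are assumptions, suitable for integer-labeled Fashion MNIST):
from tensorflow.keras.optimizers import Adam

def compile_train_model(model, x_train, y_train, callbacks=None,
                        learning_rate=0.001, batch_size=32,
                        epochs=30, verbose=0):
    """Compiles the model and trains it, returning the trained model."""
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  # integer class labels, hence the sparse loss (an assumption)
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train,
              epochs=epochs, batch_size=batch_size,
              callbacks=[callbacks] if callbacks is not None else None,
              verbose=verbose)
    return model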
Figure 5-6

Snapshot of a model compiling and training function

Visualization Function

Next, it’s time for visualization. Generic plot functions take raw data as input. However, if you have a specific purpose of plotting the evolution of training set accuracy (and showing how it compares to the target), then your plot function should just accept the (trained) deep learning model as the input and generate the desired plot. The code snapshot and the associated explanations are shown in Figure 5-7.
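A sketch of such a plotting function (the signature matches the later call plot_loss_acc(model, target_acc=..., title=...); the exact layout is an assumption):
import matplotlib.pyplot as plt

def plot_loss_acc(model, target_acc=0.9, title=None):
    """Plots loss and accuracy over epochs for a trained Keras model."""
    # Keras attaches the History of the most recent fit() to the model
    history = model.history.history
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    axes[0].plot(history['loss'], color='red')
    axes[0].set(xlabel='Epochs', ylabel='Loss', title='Loss over epochs')
    axes[1].plot(history['accuracy'], color='blue')
    # Horizontal line marking the accuracy target
    axes[1].axhline(y=target_acc, color='green', linestyle='--')
    axes[1].set(xlabel='Epochs', ylabel='Accuracy',
                title='Accuracy over epochs')
    if title is not None:
        fig.suptitle(title)
    plt.show()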
Figure 5-7

Snapshot of the visualization function

A typical result (loss-accuracy plot) is shown in Figure 5-8.
Figure 5-8

A typical loss-accuracy plot from the trained DL model

Final Analytics Code, Compact and Simple

Thus far you have modularized the core DL code. Now you can take advantage of all the functions and classes you defined earlier and bring them together to accomplish the higher-order optimization task. Consequently, your final code will be highly compact, but it will generate the same interesting plots of loss and accuracy over epochs for a variety of accuracy threshold values and DNN architectures (neuron counts).

This will give you the ability to use a minimal amount of code to produce visual analytics about the choice of performance metric (classification accuracy in this case) and DNN architecture. This is the first step towards building an optimized machine learning system.

Generate a few cases for investigation:
from itertools import product
accuracy_desired = [0.85, 0.9, 0.95]
num_neurons = [16, 32, 64, 128]
cases = list(product(accuracy_desired, num_neurons))
print("So, the cases we are considering are as follows...")
for i, c in enumerate(cases):
    print("Accuracy target {}, number of neurons: {}".format(c[0], c[1]))
    if (i+1) % 4 == 0 and (i+1) != len(cases):
        print("-"*50)
This code generates the cases shown in Figure 5-9.
Figure 5-9

Some representative cases are generated for the optimization task

The final analytics/optimization code is succinct and easy to follow for a high-level user who does not need to know the complexity of Keras model building or callback classes. This is the core principle behind OOP, the abstraction of the layers of complexity, which you are able to accomplish for your deep learning task.

Note how you pass print_msg=False to the class instance. While you need basic printing of the status for the initial check/debug, you should execute the analysis silently for the optimization task. If you did not have this argument in your class definition, you would have no way to stop printing debugging messages:
for c in cases:
    # A myCallback instance with the specific accuracy target
    callbacks = myCallback(c[0], print_msg=False)
    # Build a model with a specific number of neurons
    model = build_model(num_layers=1, architecture=[c[1]])
    # Compile and train the model with the callback class.
    # Choose a suitable batch size and a max epoch limit
    model = compile_train_model(model, x_train, y_train, callbacks=callbacks,
                                batch_size=32, epochs=30)
    # A suitable title string
    title = ("Loss and accuracy over the epochs for accuracy threshold "
             "{} and number of neurons {}".format(c[0], c[1]))
    # Use the plotting function, pass on the accuracy target,
    # trained model, and the custom title string
    plot_loss_acc(model, target_acc=c[0], title=title)
Some representative results are shown in Figure 5-10; they are automatically generated by executing the code block above. It clearly shows how with a minimal amount of high-level code you can generate visual analytics to judge the relative performance of various neural architectures for various levels of performance metrics. This enables a user, without tweaking the lower-level functions, to easily make a judgment on the choice of a model as per the desired accuracy and complexity.
Figure 5-10

Representative results for various model architectures (neuron counts per hidden layer) and accuracy targets

Also, note the custom titles for each plot. These titles clearly enunciate the target performance and the complexity of the neural net, thereby making the analytics easy. It was a small addition to the plotting utility function, but it shows the need for careful planning while creating such functions. If you had not planned for such an argument to the function, it would not have been possible to generate a custom title for each plot. This careful planning of the API (application programming interface) is part and parcel of good OOP.

Turn the Scripts into a Utility Module

So far, you may have been working in a Jupyter notebook, but you may want to turn this exercise into a neat Python module that you can import from any time. Just as you write from matplotlib import pyplot, you can import these utility functions (Keras model build, train, and plotting) anywhere. The idea is shown in Figure 5-11.
Figure 5-11

Building a deep learning utility module (for your own use)

Summary of Good Practices

You just learned some good practices, borrowed from OOP, to apply to a DL analysis task. Almost all of them may seem trivial to seasoned software developers. However, this chapter is designed for budding data scientists who may not have that structured programming background but need to understand the importance of imbuing these good practices in their ML workflow.

At the risk of repeating myself, let me summarize the good practices here:
  • Whenever you get a chance, turn repetitive code blocks into utility functions.

  • Think very carefully about the API of the function (i.e., the minimal set of arguments required and how they will serve a purpose for a higher-level programming task).

  • Don’t forget to write a docstring for a function, even if it is a one-liner description.

  • If you start accumulating many utility functions related to the same object, consider turning that object into a class and the utility functions into methods.

  • Extend class functionality using inheritance whenever you get a chance, to accomplish more complex analyses.

  • Don’t stop at Jupyter notebooks. Turn them into executable scripts and put them in a small module. Build the habit of modularizing your work so that it can be easily reused and extended by anyone, anywhere.

In the next chapter, you will try your hand at building your own ML estimator class based on these principles. For a taste of DL utility functions and a neural net trainer class, please read “Deep learning with Python” at https://github.com/tirthajyoti/Deep-learning-with-Python/tree/master/utils.

Streamline Image Classification Task Flow

Image classification is one of the most common tasks in a data science workflow involving deep learning tools. Streamlining or automating such a task is, therefore, a prime example of the automation and modularization that I have been preaching thus far.

For this specific task, a data scientist may desire a single function to automatically pull images from a specified directory on the disk (or from a network address) and give back a fully trained neural net model, ready to be used for prediction. Therefore, in this section, you will explore how to use a couple of utility methods from the Keras (TensorFlow) API to streamline the training of such models (specifically for a classification task) with built-in data preprocessing.

Put simply, you want to
  • Grab some data.

  • Put it inside a directory/folder arranged by classes.

  • Train a neural net model with minimum code/fuss.

In the end, you aim to write a single utility function that can accept just the name/address of the folder where the training images are stored and give back a fully trained CNN model. The idea is visually illustrated in Figure 5-12.
Figure 5-12

Streamlining (and simplifying) the image classification task

The Dataset

Let’s use a dataset consisting of 4,000+ images of flowers for this demo. The dataset can be downloaded from the Kaggle website. The data collection is based on Flickr, Google, and Yandex images. The pictures are divided into five classes:
  • Daisy

  • Tulip

  • Rose

  • Sunflower

  • Dandelion

For each class, there are about 800 photos. The photos are not particularly high resolution (about 320 x 240 pixels each). They are not reduced to a single size since they have different proportions. However, they come organized neatly in five directories named with the corresponding class labels. You can take advantage of this organization and apply the Keras methods to streamline the training of your convolutional network.

The full Jupyter notebook is in the GitHub repository. I will use selected snapshots of the code in this section to show the important parts for illustration.

Should you use a GPU?

It is recommended that you run this script on a GPU. You will build a convolutional neural net (CNN) with five convolutional layers; consequently, the training process with thousands of images can be computationally intensive and slow if you are not using some sort of GPU. For the Flowers dataset, a single epoch took ~1 minute on my laptop with an Nvidia GTX 1060 Ti GPU (6GB video RAM), Core i-7 8770 CPU, and 16GB DDR4 RAM.

For illustration, Figure 5-13 shows how they are stored on a local hard disk. Some sample images are in Figure 5-14.
Figure 5-13

Stored Flowers image data

Figure 5-14

Sample flower images. Note the difference in shape and resolution

Building the Data Generator Object

This is where the actual magic happens. The official description of the ImageDataGenerator class says "Generate batches of tensor image data with real-time data augmentation. The data will be looped over (in batches)."

Basically, it can be used to augment image data with a lot of built-in preprocessing such as scaling, shifting, rotation, noise, whitening, etc. Right now, you’ll just use the rescale attribute to scale the image tensor values between 0 and 1. Here is a useful article on this aspect of the class: “How to increase your small image dataset using Keras ImageDataGenerator” (https://medium.com/@arindambaidya168/https-medium-com-arindambaidya168-using-keras-imagedatagenerator-b94a87cdefad).

But the real utility of this class for the current demonstration is the super useful method named flow_from_directory, which can pull image files one after another from the specified directory. Note that this directory must be the top-level directory where all the subdirectories of individual classes can be stored separately. The flow_from_directory method automatically scans through the subdirectories and sources the images along with their appropriate labels.

You can specify the class names (as you did here with the classes argument) but this is optional. However, you will later see how this can be useful for selective training from a large trove of data.

Another useful argument is the target_size, which lets you resize the source images to a uniform size of 200 x 200, no matter the original size of the image. This is some cool image-processing right there with a simple function argument.

You can also specify the batch size. If you leave batch_size unspecified, by default, it will be set to 32. Choose the class_mode as categorical since you are doing a multi-class classification here. Here is the code snippet:
batch_size = 128
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)
# Flow training images in batches of 128
# All images will be resized to 200 x 200
train_generator = train_datagen.flow_from_directory(
        '../Data/flowers-recognition',
        target_size=(200, 200),
        batch_size=batch_size,
        classes=['daisy', 'dandelion', 'rose', 'sunflower', 'tulip'],
        class_mode='categorical')
When you run this code, the Keras function scans through the top-level directory, finds all the image files, and automatically labels them with the proper class (based on the subdirectory they were in). The working of this utility is shown in Figure 5-15 with respect to the flowers’ dataset.
Figure 5-15

The ImageDataGenerator object working on the Flower dataset

What’s more interesting is that this is also a Python generator object (https://realpython.com/introduction-to-python-generators/). That means it will be used to yield data one by one during the training. This significantly reduces the problem of dealing with a very large dataset whose contents cannot be fitted into memory at one go.
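One small but useful detail: the generator exposes the number of images it found, which you can capture for computing steps per epoch (the variable name total_sample matches its use later in this section):
total_sample = train_generator.n  # number of training images found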

Building the Convolutional Neural Net Model

For the sake of brevity, I will not delve deep into the code behind the CNN model. In brief, it consists of five convolutional layers/max-pooling layers and 128 neurons at the end followed by a 5-neuron output layer with a SoftMax activation for the multi-class classification. You use the RMSprop optimizer with an initial learning rate of 0.001. The model summary is shown in Figure 5-16. It has in excess of 200,000 trainable parameters.
Figure 5-16

Summary of the CNN model used for flower classification

Training with the fit_generator Method

I discussed the cool things the train_generator object does with the flow_from_directory method and with its arguments. Let’s utilize this object in the fit_generator method of the CNN model, defined above.

Note the steps_per_epoch argument to fit_generator. Since train_generator is a generic Python generator, it never stops and therefore the fit_generator will not know where a particular epoch ends and the next one starts. You have to let it know the steps in a single epoch. This is, in most cases, the length of the total training sample divided by the batch size. In the previous section, you found out the total sample size as total_sample. Therefore, in this particular case, the steps_per_epoch is set to int(total_sample/batch_size), which is 34, so you will see 34 steps per epoch in the training log below.
history = model.fit_generator(
        train_generator,
        steps_per_epoch=int(total_sample/batch_size),
        epochs=epochs,
        verbose=1)
When you execute, the model trains and you can check the accuracy/loss with the usual plot code (Figure 5-17).
Figure 5-17

Representative loss/accuracy plots of the CNN training task

Encapsulate All of This in a Single Function

What have you accomplished so far?

You have been able to utilize the Keras ImageDataGenerator and fit_generator methods to pull images automatically from a single directory, label them, resize and scale them, and flow them one by one (in batches) for training a neural network.

Can you encapsulate all of this in a single function?

One of the central goals of making useful software/computing systems is abstraction (i.e., hiding the gory details of internal computation and data manipulation, and presenting a simple and intuitive working interface/API to the user). Towards that goal, let’s encapsulate the process you followed above into a single function. Figure 5-18 shows the idea.
Figure 5-18

Encapsulate the core components in a single function. The user supplies a directory name and gets back a trained model

When you are designing a high-level API, you should aim for more generalization than what is required for a particular demo. With that in mind, you can think of providing additional arguments to this function to make it applicable to other image classification cases (you will see an example soon).

Specifically, you provide the following arguments in the function:
  • train_directory: The directory where the training images are stored in separate folders. These folders should be named as per the classes.

  • target_size: Target size for the training images. A tuple such as (200,200).

  • classes: A Python list with the classes for which you want the training to happen. This forces the generator to choose specific files from the train_directory and not look at all the data.

  • batch_size: Batch size for training

  • num_epochs: Number of epochs for training

  • num_classes: Number of output classes to consider

  • verbose: Verbosity level of the training, passed to the fit_generator method

Of course, you could have provided additional arguments corresponding to the whole model architecture or optimizer settings. This chapter is not focused on such issues, so let’s keep it compact. The full code is in the GitHub repo. Figure 5-19 shows the docstring portion to emphasize the point of making it a flexible API.
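Since the full code lives in the repo, here is a hedged skeleton of what such a function might look like, tying together the generator and training steps covered above (build_CNN is a hypothetical helper standing in for the five-convolutional-layer model construction):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def train_CNN(train_directory, target_size=(200, 200), classes=None,
              batch_size=128, num_epochs=20, num_classes=5, verbose=0):
    """Trains a CNN on images found under train_directory and
    returns the fully trained model."""
    train_datagen = ImageDataGenerator(rescale=1/255)
    train_generator = train_datagen.flow_from_directory(
        train_directory, target_size=target_size, classes=classes,
        batch_size=batch_size, class_mode='categorical')
    total_sample = train_generator.n
    # Hypothetical helper that builds the 5-conv-layer model
    model = build_CNN(input_shape=target_size + (3,),
                      num_classes=num_classes)
    model.fit_generator(train_generator,
                        steps_per_epoch=int(total_sample / batch_size),
                        epochs=num_epochs, verbose=verbose)
    return model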
Figure 5-19

Snapshot of the single utility function that streamlines the classification task

Testing the Utility Function

You test the train_CNN function by simply supplying a folder/directory name and getting back a trained model that can be used for predictions. Suppose that you want to train only for daisy, rose, and tulip classes and ignore the other two flowers’ data. You simply pass on a list to the classes argument. In this case, you must set the num_classes argument to 3.

You will notice that the steps per epoch are automatically reduced to 20, since the number of training samples is smaller than in the case above. Also, note that verbose is set to 0 by default in the function above, so you need to explicitly pass verbose=1 if you want to monitor the progress of the training epoch by epoch.

Basically, you can get a fully trained CNN model with two lines of code now!
# Define the folder
train_directory = "../Data//flowers-recognition/"
# Get the model
trained_model=train_CNN(train_directory=train_directory,
                        classes=['daisy','rose','tulip'],
                        num_epochs=30,
                        num_classes=3,
                        verbose=1)

Does It Work (Readily) for Another Dataset?

This is an acid test for the utility of such a function: can you just take it and apply it to another dataset without much modification? Let’s find out.

A rich yet manageable image classification dataset is Caltech-101 (www.vision.caltech.edu/Image_Datasets/Caltech101/). By manageable, I mean not as large as the famous ImageNet (www.image-net.org/about.php) database, which requires massive hardware infrastructure to train (and is therefore out of bounds for testing ideas quickly on your laptop), yet diverse enough for practicing and learning the tricks of convolutional neural networks. It is an image dataset of diverse types of objects belonging to 101 categories. There are 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. Some categories are shown in Figure 5-20.
Figure 5-20

The Caltech-101 image dataset

Who built Caltech-101?

The Caltech-101 dataset was built by none other than the famous Stanford professor Dr. Fei-Fei Li (https://profiles.stanford.edu/fei-fei-li) and her colleagues (Marco Andreetto and Marc'Aurelio Ranzato) at Caltech in 2003, when she was a graduate student there. We can surmise, therefore, that Caltech-101 was a direct precursor to her work on ImageNet.

Download the dataset and uncompress the contents in the same Data folder as before. The directory should look like Figure 5-21.
Figure 5-21

Directory of the stored Caltech-101 images

So, you have what you want: a top-level directory with subdirectories containing training images. And then, the same two lines as before:
# Define the folder
train_directory = "../Data/101_ObjectCategories/"
# Get the model
model_caltech101 = train_CNN(train_directory=train_directory,
                              classes=['crab','cup'],
                              batch_size=4,
                              num_epochs=25,
                              num_classes=2,
                              verbose=1)

All you did was pass the address of this directory to the function and choose the categories of images you want to train the model on. Let’s say you want to train the model to classify between cup and crab. You can just pass their names as a list to the classes argument as before.

Also, note that you may have to reduce the batch_size significantly for this dataset as the total number of training images will be much lower compared to the Flowers dataset, and if the batch_size is higher than the total sample, you will have steps_per_epoch equal to 0 and that will create an error during training.

Voila! The function finds the relevant images (130 of them in total) and trains the model, 4 per batch, so 33 steps per epoch. The result is shown in Figure 5-22.
Figure 5-22

Training happening with Caltech-101 images (two classes, cup and crab)

You saw how easy it was to just pass on the training images’ directory address to the function and train a CNN model with your chosen classes. But is the model any good? Let’s find out by testing it with random pictures downloaded from the Internet. Let’s say you downloaded images of crabs and cups. You do some rudimentary image processing (resizing and dimension expansion) to match the model and get the output objects, img_crab and img_cup. Then you test the model with these images.
model_caltech101.predict(img_crab)
>> array([[1., 0.]], dtype=float32)

The model predicted the class correctly for the crab test image.

And for the cup image,
model_caltech101.predict(img_cup)
>> array([[0., 1.]], dtype=float32)

You can download any random image and test the performance of your model. If not satisfied, you should train the model by changing the architecture and hyperparameters using the modularized function.

The main point, however, is that you were able to train a CNN model with just the same two lines of code for a completely different dataset than you started with. This is the power of modularizing code and building a generic API that works with a wide variety of data sources. This saves valuable time and makes the code reusable. The edifice of productive data science stands on these foundational elements.

Other Extensions

So far, inside the fit_generator you only had a train_generator object for training. But what about a validation set? It follows exactly the same concept as the train_generator. You can randomly split off a validation set from your training images, set it aside in a separate directory (with the same subdirectory structure as the training directory), and pass it on to the fit_generator function.

Want to directly work with a pandas DataFrame that stores your image? No problem. There is a method called flow_from_dataframe for the ImageDataGenerator class where you can pass on the names of the image files as contained in a pandas DataFrame and the training can proceed.

You are strongly encouraged to check out and extend these ideas as you see fit for your applications.

Activation Maps in a Few Lines of Code

DL models use millions of parameters and create extremely complex and highly nonlinear internal representations of the images or datasets that are fed to them. They are, therefore, often called perfect black-box ML techniques (www.wired.com/story/inside-black-box-of-neural-network/) (Figure 5-23). We can get highly accurate predictions from them after we train them with large datasets, but we have little hope of understanding the internal features and representations (www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/) of the data that a model uses to classify a particular image into a category. In short, the black-box problem of deep learning is powerful predictive ability without an intuitive and easy-to-follow explanation.

This does not bode well because we humans are visual creatures (www.seyens.com/humans-are-visual-creatures/). Millions of years of evolution have gifted us an amazingly complex pair of eyes (www.relativelyinteresting.com/irreducible-complexity-intelligent-design-evolution-and-the-eye/) and an even more complex visual cortex (www.neuroscientificallychallenged.com/blog/know-your-brain-primary-visual-cortex), and we use these organs to make sense of the world. The scientific process starts with observation, and that is almost always synonymous with vision. In business, only what we can observe and measure can we control and manage effectively. Seeing/observing is how we start to make mental models (https://medium.com/personal-growth/mental-models-898f70438075) of worldly phenomena, classify objects around us, separate a friend from a foe, and so on.

Activation maps have been proposed to help visualize the inner workings of complex CNN models. Let’s talk about them.

Activation Maps

Several approaches for understanding and visualizing CNNs have been developed in the literature, partly as a response to the common criticism that the learned internal features in a CNN are not interpretable. The most straightforward visualization technique is to show the activations of the network during the forward pass.

At a simple level, activation functions help decide whether a neuron should be activated, that is, whether the information the neuron is receiving is relevant for the given input. The activation function is a non-linear transformation applied to an input signal, and the transformed output is sent to the next layer of neurons.

Activation maps are just a visual representation of these activation numbers at various layers of the network as a given image progresses through it as a result of various linear algebraic operations. One can deduce the workings of the network, and its design limitations, from these maps. For ReLU-based networks, the activations usually start out looking relatively blobby and dense, but as training progresses they usually become sparser and more localized. One design pitfall that can easily be caught with this visualization is an activation map that is all zero for many different inputs, which can indicate dead filters and can be a symptom of a high learning rate.

However, visualizing these activation maps is a non-trivial task, even after you have trained your neural net well and are making predictions out of it. How do you easily visualize and show these activation maps for a reasonably complicated CNN with just a few lines of code?

Activation Maps with a Few Lines of Code

In the previous section, I showed how to write a single compact function to obtain a fully trained CNN model by reading image files automatically from disk, one by one. Now you’ll use this function along with a nice little library called Keract, which makes the visualization of activation maps very easy. It is a high-level accessory library to Keras that shows useful heatmaps and activation maps at various layers of a neural network.

Therefore, for this code, you need a couple of utility functions from the module you built earlier: train_CNN_keras (to train the model) and preprocess_image (to make a random RGB image compatible with the model for generating the activation maps).

You’ll use the same Caltech-101 dataset discussed in the last section. However, you are training only with five categories of images: crab, cup, brain, camera, and chair.

Training

Training is done with a few lines of code only:
train_directory = "../Data/101_ObjectCategories/"
target_size=(512,512)
batch_size=4
classes = ['crab','cup','brain','camera','chair']
num_classes = len(classes)
num_epochs=10
model = train_CNN_keras(train_directory=train_directory,
                        num_epochs=num_epochs,
                        target_size=target_size,
                        classes = classes,
                        batch_size=batch_size,
                        num_classes=num_classes)

To generate the activations, you can choose a random image of a human brain from the Internet or any other source. Store the test image as the file brain-1.jpg.

Activation

Another couple of lines of code generate the activation:
from keract import get_activations
# The image path
img_path = '../images/brain-1.jpg'
# Preprocessing the image for the model
img = preprocess_image(img_path=img_path,
                     model=model,
                     resize=target_size)
# Generate the activations
activations = get_activations(model, img)
You get back a dictionary with layer names as the keys and NumPy arrays as the values, corresponding to the activations. Figure 5-24 shows how the activation arrays have varying lengths, corresponding to the size of the filter maps of that particular convolutional layer.
Figure 5-24

Activation map arrays are stored (the variable length corresponding to the size of the convolutional filter at that layer)

Thereafter, two lines of code for displaying the activation maps:
from keract import display_activations
display_activations(activations, save=False)
You get to see the activation maps layer by layer. Figure 5-25 shows the first convolutional layer (the 16 images corresponding to the 16 filters). Your actual image may look different based on what you use as the test image, but the idea of activation-layer visualization is clearly demonstrated.
Figure 5-25

Activation maps for the first convolution layer

Figure 5-26 shows layer number 2 (the 32 images corresponding to the 32 filters).
Figure 5-26

Activation maps for the second convolution layer

For this model, there are 5 convolutional layers (followed by max pooling layers), so you get back 10 sets of images. For brevity, I won’t show the rest, but you are encouraged to explore and see them by playing with the Jupyter notebook.

Another Library for Web-Based UI

Another beautiful library for activation visualization is called Quiver. However, this one is built on the Python microserver framework Flask and displays the activation maps on a browser port rather than inside your Jupyter Notebook. It also needs a fully trained Keras model as input. So, you can easily use the utility function described in the previous section and try this library for interactive visualization of activation maps.

How Is This Productive Data Science?

In this chapter, you learned how by using only a few lines of code (utilizing compact functions from a special module and a nice little accessory library to Keras) you can train a CNN, generate activation maps, and display them layer by layer—from scratch. This gives you the ability to train CNN models (simple to complex) from any image dataset (as long as you can arrange them in a simple directory format) and look inside their guts for any test image you want.

And once you build the necessary utility modules and the activation map scripts, you can reuse and apply them to a wide variety of image data. This leads to a fast and efficient exploration of a large set of images for all kinds of applications. This is why this kind of approach integrates with the story of productive and efficient data science.

Hyperparameter Search with Scikit-learn

Keras is one of the most popular go-to Python libraries/APIs for beginners and professionals in deep learning. Although it started as a stand-alone project by François Chollet, it has been integrated natively into TensorFlow starting in version 2.0 (read more at https://keras.io/about/). As per its own official documentation, it is “an API designed for human beings, not machines” that “follows best practices for reducing cognitive load.”

Now, hyperparameter tuning is one of the situations where the cognitive load is sure to increase. DL models have a great many hyperparameters to begin with: learning rate, decay rate, activation function, dropout rate, momentum, batch size, and more. Optimizing a DL model for best performance and computing cost depends critically on the right choice of these hyperparameters. Therefore, data scientists spend a lot of time and effort tuning them manually or via some automated script or optimization strategy/framework.

Although there are many supporting libraries and frameworks for handling it, for simple grid searches, Keras offers a beautiful API to integrate with our favorite scikit-learn library. In this section, we will talk about it.

Scikit-learn Enmeshes with Keras

Almost every Python machine-learning practitioner is intimately familiar with the scikit-learn library and its beautiful API with simple methods (www.tutorialspoint.com/scikit_learn/scikit_learn_estimator_api.htm) like fit, get_params, and predict. The library also offers extremely useful methods for cross-validation, model selection, pipelining, and grid search abilities. Data scientists use these tools for classical ML problems every day. But can you use the same APIs for a deep learning problem?

It turns out that Keras offers a couple of special wrapper classes, for both regression and classification problems, to utilize the full power of these APIs native to scikit-learn. In this section, you will work with simple k-fold cross-validation (https://medium.com/datadriveninvestor/k-fold-cross-validation-6b8518070833) and an exhaustive grid search with a Keras classifier (www.tensorflow.org/api_docs/python/tf/keras/wrappers/scikit_learn/KerasClassifier) model. It utilizes an implementation of the scikit-learn classifier API for Keras.

Data and (Preliminary) Keras Model

First, you create a simple function to synthesize and compile a Keras model with some tunable arguments built in:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=8, activation='relu'))
    model.add(Dense(15, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model

You tackle a simple binary classification task using the popular Pima Indians Diabetes dataset (www.kaggle.com/uciml/pima-indians-diabetes-database). This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases (www.niddk.nih.gov/). The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.

You do some minimal data preprocessing including scaling the feature data with MinMaxScaler from scikit-learn. You can pass this X_scaled vector to the special wrapper class you will create.
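A minimal sketch of that preprocessing (the CSV file name and column names are assumptions based on the Kaggle version of the dataset):
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('pima-indians-diabetes.csv')  # hypothetical file name
X = df.drop('Outcome', axis=1).values  # the 8 diagnostic features
Y = df['Outcome'].values               # binary diabetes label
X_scaled = MinMaxScaler().fit_transform(X)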

The KerasClassifier Class

This is the special wrapper class from Keras that enmeshes the scikit-learn classifier API with Keras parametric models. You can pass on various model parameters corresponding to the create_model function, and other hyperparameters like epochs and batch size to this class. Here is the code:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
model = KerasClassifier(build_fn=create_model,
                        epochs=10,
                        batch_size=32,
                        verbose=0)

Note how you pass on your model creation function as the build_fn argument. This is an example of using a function as a first-class object in Python (https://dbader.org/blog/python-first-class-functions) where you can pass on functions as regular parameters to other classes or functions.

For now, you have fixed the batch size and the number of epochs you want to run your model for because you just want to run cross-validation on this model. Later, you will treat them as hyperparameters and do a full grid search over them to find the best combination.

Cross-Validation with the Scikit-learn API

Here is the code to build a 10-fold cross-validation sweep with the Keras model. First, you must import the estimators from the model_selection module of scikit-learn. Thereafter, you simply run the model with this code, passing the KerasClassifier object you built earlier along with the feature and target vectors. The important parameter here is cv, where you pass the kfold object. This tells cross_val_score to run the Keras model with the data provided, in a 10-fold stratified cross-validation setting.
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
num_folds = 10
kfold = StratifiedKFold(n_splits=num_folds,
                        shuffle=True)
cv_results = cross_val_score(model,
                          X_scaled, Y,
                          cv=kfold,
                          verbose=2)

The output variable cv_results is a NumPy array consisting of all of the accuracy scores. Accuracy is the metric you coded in your model compiling process. Obviously, you could have chosen any other classification metric like precision or recall, and in that case, that metric would have been calculated and stored in the cv_results array.

You can easily calculate the average and standard deviation of the 10-fold CV run to estimate the stability of the model predictions. This is one of the primary utilities of a cross-validation run and now you can gauge the stability of any Keras model using this approach.
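For example (a one-liner, since cv_results is a plain NumPy array):
print("Accuracy: {:.3f} (+/- {:.3f})".format(cv_results.mean(),
                                             cv_results.std()))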

Grid Search with an Updated Model

In this example, you will search over the following hyperparameters:
  • Activation function

  • Optimizer type

  • Initialization method

  • Batch size

  • Number of epochs

However, for this to work, you must integrate the first three of these parameters into your model definition code:
def create_model_grid(activation='relu',
                      optimizer='rmsprop',
                      init='glorot_uniform'):
    # create model; the activation argument is applied
    # directly to both hidden layers
    model = Sequential()
    model.add(Dense(12, input_dim=8,
                    kernel_initializer=init, activation=activation))
    model.add(Dense(8, kernel_initializer=init, activation=activation))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
Then, you create the same KerasClassifier object as before but call it model_grid:
model_grid = KerasClassifier(build_fn=create_model_grid, verbose=0)

The exhaustive hyperparameter search space has size 3 × 3 × 3 × 3 × 3 = 243. Note that the actual number of Keras runs also depends on the number of cross-validation folds you choose, as cross-validation is run for each of these combinations. In total, there will be 729 fittings of the model: 3 cross-validation runs for each of the 243 parametric combinations. If you don’t like the full grid search, you can always try a randomized grid search.

Figure 5-27 shows the choices for this exhaustive grid search.
Figure 5-27

Exhaustive grid search options
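One illustrative set of choices, consistent with the 3 × 3 × 3 × 3 × 3 grid size computed above (the specific entries are assumptions, except that relu/tanh and adam/rmsprop appear in the discussion of results later), is:
# Illustrative choices; each list has 3 entries, so the grid is 3^5 = 243
activations = ['relu', 'tanh', 'sigmoid']
optimizers = ['rmsprop', 'adam', 'sgd']
initializers = ['glorot_uniform', 'normal', 'uniform']
epochs = [10, 20, 30]
batches = [16, 32, 64]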

You must create a dictionary of search parameters and pass it on to the scikit-learn GridSearchCV estimator:
from sklearn.model_selection import GridSearchCV
param_grid = dict(activation =  activations,
                  optimizer = optimizers,
                  epochs = epochs,
                  batch_size = batches,
                  init = initializers)
grid = GridSearchCV(estimator = model_grid,
                    param_grid = param_grid,
                    cv = 3,
                    verbose = 2,)

You set cv = 3 to reduce the run time. By default, scikit-learn sets it to 5 if you leave out that argument.

What verbosity levels to choose?

 It is advisable to set the verbosity of GridSearchCV to 2 to keep visual track of what’s going on. Remember to keep verbose=0 for the main KerasClassifier class, though, as you probably don't want to display all the gory details of training individual epochs.

After this, just fit with the scaled feature data and labels!
grid_result = grid.fit(X_scaled, Y)
How does the result look? It is just as expected from a standard scikit-learn estimator, with all the parameters internally stored for exploration (Figure 5-28).
Figure 5-28

Fitted grid search estimator with all the parameters

You can find out the best combination with the best_score_ and best_params_ attributes from the fitted estimator. A snapshot is shown in Figure 5-29.
Figure 5-29

Snapshot of best hyperparameter choice printed

You did the initial 10-fold cross-validation using ReLU activation and the Adam optimizer and got an average accuracy of 0.691. After doing an exhaustive grid search, you discover that a tanh activation and an rmsprop optimizer could have been better choices for this problem.

It is also quite straightforward to create a pandas DataFrame from the grid search results and analyze them further. You can include the mean and standard deviation scores in this table.
import pandas as pd
params = grid_result.cv_results_['params']
d = pd.DataFrame(params)
d['Mean'] = grid_result.cv_results_['mean_test_score']
d['Std. Dev'] = grid_result.cv_results_['std_test_score']
The DataFrame looks like Figure 5-30.
Figure 5-30

DataFrame created from the grid search parameters

You can create targeted visualizations from this dataset to examine which hyperparameters improve the performance and reduce the variation in the accuracy metric. Figure 5-31 shows some examples.
Figure 5-31

Visualizations of the grid search results

Summary

This chapter covered a variety of topics centered on the idea of making commonly used deep learning code and tasks more productive and efficient. I carried over the idea of modularizing the code from the previous chapter and showed hands-on examples with useful model building and plot functions with the Keras framework. A powerful construct called the Keras callback was also discussed in this context.

Next, I discussed the idea of streamlining one of the most common DL tasks that a data scientist can encounter: image classification. The goal was to arrive at a single utility function that presents a very simple API to the user. You just pass on a folder name to this function, and it will return a fully trained conv net model by processing all the images in that folder. Not only did you build this function step by step, but you also demonstrated the utility of such an API by applying it to a completely different dataset.

In the next section, you further utilized this function and integrated it with a special library that can extract and visualize activation maps for the various convolution layers of the DL model. Basically, you demonstrated how to visualize the inner workings of a complex DL model with only a few lines of code. Together, these two sections embodied the true journey towards productive and efficient data science involving deep learning.

Finally, you explored the topic of making hyperparameter search easy and seamless. Although there are many dedicated libraries and frameworks for this task, you saw a simple and intuitive approach using the grid search tool from scikit-learn and some special wrapper classes from Keras. It also demonstrated how two of the most popular ML libraries, Keras and scikit-learn, can work together in a seamless manner.

Making deep learning code and products fast and efficient is a huge topic by itself. There are countless approaches and research directions focusing on this. This chapter only aims to induce some fundamental ideas so that you can explore them further.
