Chapter 11. Transfer Learning

“Learn from the mistakes of others. You can’t live long enough to make them all yourself.”

—Eleanor Roosevelt


Having an extensive collection of data, a battle-tested model structure, and plenty of processing power all at once is a tall order. Wouldn’t it be nice to cut a corner? That nifty trick in Chapter 7 where you used Teachable Machine to transfer the qualities of a trained model to a new one was pretty useful. In fact, this is a common trick in the machine learning world. While Teachable Machine hid the specifics and offered you only a single model, you can understand the mechanics of this trick and use it on all kinds of cool tasks. In this chapter, we will reveal the magic behind this process. We’ll focus on MobileNet for simplicity, but the same techniques apply to all kinds of models.

Transfer learning is the act of taking a trained model and repurposing it for a second related task.

There are a few repeatable benefits to using transfer learning for your machine learning solution. Most projects utilize some amount of transfer learning for these reasons:

  • Reutilizing a battle-tested model structure

  • Getting a solution faster

  • Getting a solution via less data

In this chapter, you’ll learn several strategies for transfer learning. You will focus on MobileNet as a fundamental example that can be reused to identify a myriad of new classes in various ways.

We will:

  • Review how transfer learning works

  • See how to reuse feature vectors

  • Cut into Layers models and reconstruct new models

  • Learn about KNN and deferred classification

When you finish this chapter, you’ll be able to take models that have been trained for a long time with lots of data and apply them to your own needs with smaller datasets.

How Does Transfer Learning Work?

How does a model that has been trained on different data suddenly work well for your new data? It sounds miraculous, but it happens in humans every day.

You’ve spent years identifying animals, and you’ve probably seen hundreds of camels, guinea pigs, and beavers from cartoons, zoos, and commercials. Now I’m going to show you an animal you’ve probably not seen often, or even at all. The animal in Figure 11-1 is called a capybara (Hydrochoerus hydrochaeris).

profile of a capybara
Figure 11-1. The capybara

For some of you, this is the first time (or one of the few times) you’ve seen a photo of a capybara. Now, take a look at the lineup in Figure 11-2. Can you find the capybara?

three mammals quiz
Figure 11-2. Which one is the capybara?

The training set of a single photo was enough for you to make a choice because you’ve been distinguishing between animals your entire life. With a novel color, angle, and photo size, your brain probably detected with absolute certainty that animal C was another capybara. The features learned by your years of experience have helped you make an educated decision. In that same way, powerful models that have significant experience can be taught to learn new things from small amounts of new data.

Transfer Learning Neural Networks

Let’s bring things back to MobileNet for a moment. The MobileNet model was trained to identify features that distinguish a thousand items from each other. That means there are convolutions to detect fur, metal, round things, ears, and all kinds of crucial differential features. All these features are chewed up and simplified before they are flattened into a neural network, where the combination of various features creates a classification.

The MobileNet model can identify different breeds of dogs, and even distinguish a Maltese terrier from a Tibetan terrier. If you were to make a “dog or cat” classifier, it makes sense that a majority of those advanced features would be reusable in your simpler model.

The previously learned convolutional filters would be extremely useful in identifying key features for brand-new classifications, like our capybara example in Figure 11-2. The trick is to take the feature identification portion of the model and apply your own neural network to the convolutional output, as illustrated in Figure 11-3.

changing the NN flowchart
Figure 11-3. CNN transfer learning

So how do you separate and recombine these sections of previously trained models? You’ve got lots of options. Again, we’ll learn a bit more about Graph and Layers models.

Easy MobileNet Transfer Learning

Fortunately, TensorFlow Hub already has a MobileNet model that is disconnected from any neural network. It offers half a model for you to use for transfer learning. Half a model means it hasn’t been tied down into a final softmax layer for what it’s meant to classify. This allows us to let MobileNet derive the features of an image and then provide us with tensors that we can then pass to our own trained network for classification.

TFHub calls these models image feature vector models. You can refine your search to show only these models, or identify them by looking at the problem domain tags, as illustrated in Figure 11-4.

screenshot of proper tags
Figure 11-4. Problem domain tags for image feature vectors

You might notice small variations of MobileNet and wonder what the differences are. Once you learn a few sneaky terms, each of these model descriptions becomes quite readable.

For instance, we’ll use Example 11-1.

Example 11-1. One of the image feature vector models
imagenet/mobilenet_v2_130_224/feature_vector
imagenet

This model was trained on the ImageNet dataset.

mobilenet_v2

The model’s architecture is MobileNet v2.

130

The model’s depth multiplier was 1.30, which results in more features. If you want to speed things up, you can choose a smaller multiplier, such as “050”, which would produce less than half the feature output with a boost in speed. This is a fine-tuning option when you’re ready to trade depth for speed.

224

The model’s expected input size is 224 x 224 images.

feature_vector

We already know this from the tag: this model outputs tensors meant to be interpreted as features of the image by a second model.

Now that we have a trained model that can identify features in an image, we will run our training data through the MobileNet image feature vector model and then train a model on the output from that. In other words, the training images will turn into a feature vector, and we’ll train a model to interpret that feature vector.

The benefit of this strategy is that it’s straightforward to implement. The major drawback is that you’ll have to load two models when you’re ready to use the newly trained model (one to generate features and one to interpret them). Creatively, there might be some cases where it’s quite useful to “featurize” an image and then run that through multiple neural networks. Regardless, let’s see it in action.

TensorFlow Hub Check, Mate!

We’re going to use transfer learning with MobileNet to identify chess pieces like the one shown in Figure 11-5.

image of a chess knight on a table
Figure 11-5. Simple chess pieces classifier

You’ll only have a few images of each chess piece. That’s not normally enough, but with the magic of transfer learning, you’ll get an effective model.

Loading chess images

For this exercise, I’ve compiled a collection of 150 images and loaded them into a CSV file for quick use. This isn’t something I’d recommend in most cases because it’s inefficient for processing and disk space, but it serves as a simple vehicle for some quick in-browser training. The code to load these images is now trivial.

Note

You can access the chess images and the code that converted them into a CSV file in the chapter11/extra/node-make-csvs folder.

The files chess_labels.csv and chess_images.csv can be found in the chess_data.zip file in the code associated with this chapter. Unzip this file and use Danfo.js to load the contents.

Many browsers may have issues with loading all 150 images at once, so I’ve limited the demo to process only 130 images. Working within these kinds of data limitations is a common issue in machine learning.

Note

Once the image has been featurized, it takes up a lot less space. Feel free to experiment with creating features in batches, but that’s outside the scope of this chapter.
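If you do want to experiment, one minimal sketch (reusing the chessTensor variable loaded below and the featureModel loaded later in this chapter) would split the batch into smaller pieces, featurize each, and concatenate the results:

const batches = tf.split(chessTensor, 13); // 13 batches of 10 images each
const featureBatches = batches.map((b) => featureModel.predict(b));
const featureX = tf.concat(featureBatches);
batches.forEach((b) => b.dispose()); // free the intermediate tensors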

The images are already 224 x 224, so you can load them with the following code:

console.log("Loading huge CSV - this will take a while");
const numImages = 130; // between 1 and 150
// Get Y values
const labels = await dfd.read_csv("chess_labels.csv", numImages); 1
const Y = labels.tensor; 2
// Get X values (Chess images)
const chessImages = await dfd.read_csv("chess_images.csv", numImages);
const chessTensor = chessImages.tensor.reshape([
  labels.shape[0], 224, 224, 3, 3
]);
console.log("Finished loading CSVs", chessTensor.shape, Y.shape);
1

The second parameter to read_csv limits the row count to the specified number.

2

The DataFrames are then converted to tensors.

3

The images were flattened to become serialized but are now reshaped into a rank-four batch of RGB images.

After a bit of time, this code prints out the X and Y shapes of 130 ready-to-go images and encodings:

Finished loading CSVs (4) [130, 224, 224, 3] (2) [130, 6]

If your computer is unable to handle the 130 images, you can lower the numImages variable and still play along. However, the CSV load time stays the same, because the entire file must be processed regardless.

Tip

Images like chess pieces are perfect for image augmentation because skewing chess pieces would never cause one piece to be confused with another. If you ever need more images, you can mirror the entire set to effectively double your data. Entire libraries exist to mirror, tilt, and skew images so you can create more data.
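For instance, a minimal mirroring sketch using the chess tensors from this section (tf.image.flipLeftRight expects a [batch, height, width, channels] tensor) would double both the images and the labels:

const mirrored = tf.image.flipLeftRight(chessTensor);
const augmentedX = tf.concat([chessTensor, mirrored]);
const augmentedY = tf.concat([Y, Y]);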

Loading the feature model

You can load the feature model just like you’d load any model from TensorFlow Hub. You can then pass the image tensor through the model for prediction, and it will result in numImages predictions. The code looks like Example 11-2.

Example 11-2. Loading and using the feature vector model
// Load feature model
const tfhubURL =
  "https://oreil.ly/P2t2k";
const featureModel = await tf.loadGraphModel(tfhubURL, {
  fromTFHub: true,
});
// Push data through feature detection
const featureX = featureModel.predict(chessTensor);
console.log(`Features stack ${featureX.shape}`);

The output of the console log is

Features stack 130,1664

Each of the 130 images has become a set of 1,664 floating-point values that are sensitive to features of the image. If you change the model to use a different depth, the number of features will change. The number 1,664 is unique to the 1.30 depth version of MobileNet.

As previously mentioned, the 1,664 Float32 feature set is significantly smaller than the 224*224*3 = 150,528 Float32 input of each image. This will speed up training and be kinder to your computer memory.

Creating your neural network

Now that you have a collection of features, you can create a new and utterly untrained model that fits those 1,664 features to your labels.

Example 11-3. A small model with a 64-unit hidden layer and a final layer of 6
// Create NN
const transferModel = tf.sequential({
  layers: [                              // 1
    tf.layers.dense({
      inputShape: [featureX.shape[1]],   // 2
      units: 64,
      activation: "relu",
    }),
    tf.layers.dense({ units: 6, activation: "softmax" }),
  ],
});
1

This Layers model uses a slightly different syntax than you’re used to. Rather than calling .add, all the layers are presented in an array in the initial configuration. This syntax is nice for a small model like this.

2

The inputShape of the model is set dynamically (1,664 here), in case you’d like to change the model’s depth multiplier by updating the model URL.

Training results

Nothing is new in the training code. The model trains based on the feature output. Because the feature output is so small compared to the original image tensor, the training happens extremely quickly.

transferModel.compile({
  optimizer: "adam",
  loss: "categoricalCrossentropy",
  metrics: ["accuracy"],
});

await transferModel.fit(featureX, Y, {
  validationSplit: 0.2,
  epochs: 20,
  callbacks: { onEpochEnd: console.log },
});

Within a few epochs, the model has outstanding accuracy. Take a look at Figure 11-6.

transfer learning results
Figure 11-6. From 50% to 96% validation accuracy in 20 epochs
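When you’re ready to classify a brand-new image with this approach, inference is a two-step pass. Here’s a hedged sketch using the variable names from this section, assuming imageTensor is a [1, 224, 224, 3] batch:

// Step 1: turn the image into a 1,664-value feature vector
const features = featureModel.predict(imageTensor);
// Step 2: interpret the features as one of the six chess classes
const prediction = transferModel.predict(features);
prediction.print();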

Transfer learning using an existing model on TensorFlow Hub relieves you of architectural headaches and rewards you with high accuracy. But it’s not the only way you can implement transfer learning.

Utilizing Layers Models for Transfer Learning

There are some obvious and not-so-obvious limitations to the previous method. First, the feature model cannot be trained. All your training was on a new model that consumed the features of the Graph model, but the convolutional layers and size were fixed. You have small variations of the convolutional network model available but no way to update or fine-tune it.

The previous model from TensorFlow Hub was a Graph model. Graph models are optimized for speed and, as you know, cannot be modified or trained. Layers models, on the other hand, are primed for modification, so you can rewire them for transfer learning.

Also, in the previous example, you were essentially dealing with two models every time you needed to classify an image. You had to load two JSON models and run your image through the feature model and then through the new model to categorize it. It’s not the end of the world, but with Layers models you can combine everything into a single model.

Let’s solve the same chess problem again, but with a Layers version of MobileNet so we can inspect the difference.

Shaving Layers on MobileNet

For this exercise, you will use a version of MobileNet v1.0 that has been converted to a Layers model. This is the model Teachable Machine uses, and while it’s sufficient for small exploratory projects, you’ll notice it’s not as accurate as MobileNet v2 with a 1.30 depth multiplier. You’re already well versed in converting models with the wizard, as you learned in Chapter 7, so you can create a larger, newer Layers model when needed. Accuracy is an important metric, but it’s far from the only metric you should evaluate when shopping for a transfer model.

MobileNet has a vast collection of layers, and some of these are layers you’ve never seen before. Let’s take a look. Load the MobileNet model associated with this chapter and review the summary of layers with model.summary(). This prints a huge list of layers. Don’t feel overwhelmed. When you read from the bottom to the top, the last two convolutional layers with activations are called conv_preds and conv_pw_13_relu:

...

conv_pw_13 (Conv2D)          [null,7,7,256]            65536
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz [null,7,7,256]            1024
_________________________________________________________________
conv_pw_13_relu (Activation) [null,7,7,256]            0
_________________________________________________________________
global_average_pooling2d_1 ( [null,256]                0
_________________________________________________________________
reshape_1 (Reshape)          [null,1,1,256]            0
_________________________________________________________________
dropout (Dropout)            [null,1,1,256]            0
_________________________________________________________________
conv_preds (Conv2D)          [null,1,1,1000]           257000
_________________________________________________________________
act_softmax (Activation)     [null,1,1,1000]           0
_________________________________________________________________
reshape_2 (Reshape)          [null,1000]               0
=================================================================
Total params: 475544
Trainable params: 470072
Non-trainable params: 5472

The last convolution, conv_preds, maps the extracted features to the model’s 1,000 trained classes. Because that layer is specific to the original classes, we’ll move up to the previous convolution’s activation (conv_pw_13_relu) and cut there.

MobileNet is a complex model, and even though you don’t have to understand all the layers to use it for transfer learning, there’s a bit of art in deciding what to remove. In simpler models, like the one for the upcoming Chapter Challenge, it’s common to keep the entire convolutional workflow and cut at the flatten layer.

You can cut to a layer by knowing its unique name. The code shown in Example 11-4 is available on GitHub.

Example 11-4. Shaving the MobileNet Layers model down to a feature extractor
const featureModel = await tf.loadLayersModel('mobilenet/model.json')
console.log('ORIGINAL MODEL')
featureModel.summary()
const lastLayer = featureModel.getLayer('conv_pw_13_relu')
const shavedModel = tf.model({
  inputs: featureModel.inputs,
  outputs: lastLayer.output,
})
console.log('SHAVED DOWN MODEL')
shavedModel.summary()

The code from Example 11-4 prints out two large models, but the key difference is that the second model suddenly stops at conv_pw_13_relu.

The last layer is now the one we identified, and when you review the summary of the shaved-down model, it reads like a feature extractor. There is one key difference to note: the final layer is a convolution, so the first layer of your constructed transfer model will need to flatten that convolutional output before it can be densely connected to a neural network.

Layers Feature Model

Now you can use the shaved model as a features model. This gets you the same two-model system you had from TFHub. Your second model will need to read the output of conv_pw_13_relu:

// Create NN
const transferModel = tf.sequential({
  layers: [
    tf.layers.flatten({ inputShape: featureX.shape.slice(1) }),
    tf.layers.dense({ units: 64, activation: 'relu' }),
    tf.layers.dense({ units: 6, activation: 'softmax' }),
  ],
})

We are setting the shape as defined by the intermediate features. This could also be directly tied to the shaved model’s output shape (shavedModel.outputs[0].shape.slice(1)).

From here, you’re right back to where you were in the TFHub model. The base model creates features, and the second model interprets those features.
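As a reminder, the intermediate features come from pushing the images through the shaved model. A minimal sketch, reusing chessTensor and Y from earlier in this chapter:

const featureX = shavedModel.predict(chessTensor);
// Compile and fit the new model on the features, just like the TFHub example
transferModel.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy'],
});
await transferModel.fit(featureX, Y, { validationSplit: 0.2, epochs: 20 });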

Training this two-model setup achieves around 80%+ validation accuracy. Keep in mind we’re using a completely different model architecture (this is MobileNet v1) and a lower depth multiplier. Getting at least 80% from this rough model is good.

A Unified Model

Just as with the feature vector model, your training only had access to the new layers and never updated the convolutional layers. Now that you’ve trained the two models, you can unify their layers into a single model. You might be wondering why you combine the models after training instead of before. It’s a common practice to train your new layers with your feature layers locked, or “frozen,” at their original weights.

Once the new layers have been trained, you can generally “unfreeze” more layers and train the new and the old together. This phase is often called fine-tuning the model.
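Because the feature layers in this chapter live in a separate model, they are effectively frozen during training. If you combined the models first, you could get the same effect explicitly with each layer’s trainable property. A hedged sketch of that pattern:

// Freeze the convolutional base while the new layers train
shavedModel.layers.forEach((layer) => (layer.trainable = false));
// ...train the new layers...
// Later, unfreeze the base for fine-tuning and recompile before training again,
// ideally with a low learning rate so the pretrained features survive
shavedModel.layers.forEach((layer) => (layer.trainable = true));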

So how do you unify these two models now? The answer is surprisingly simple. Create a third sequential model and add the two models with model.add. The code looks like this:

// combine the models
const combo = tf.sequential()
combo.add(shavedModel)
combo.add(transferModel)
combo.compile({
  optimizer: 'adam',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy'],
})
combo.summary()

The new combo model can be downloaded or trained further.
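For example, saving the unified model to your machine is a one-liner (the filename here is arbitrary):

await combo.save('downloads://chess-transfer');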

If you had joined the models before training the new layers, you’d likely see your model overfit the data.

No Training Needed

It’s worth noting that there’s a witty way to use two models for transfer learning with zero training. The trick is to use a second model that identifies distances in similarity.

The second model is a K-Nearest Neighbors (KNN)1 model, which groups a data element with the K most similar data elements in a feature space. The idiom “birds of a feather flock together” is the premise for KNN.

In Figure 11-7, X would be identified as a bunny because the three nearest examples in features are also bunnies.

feature distance
Figure 11-7. Identify with neighbors in feature space

KNN is sometimes called instance-based learning or lazy learning because all the necessary processing is deferred until the moment of classification. This deferred model is straightforward to update. You can always add more images and classes dynamically to define edge cases or new categories without retraining. The cost is that the feature graph grows with each example you add, unlike the fixed size of a single trained model. The more data points you add to a KNN solution, the larger the feature set that accompanies the models becomes.

Additionally, since there is no training, similarity is the only metric, which makes this system nonideal for some problems. For instance, if you were trying to train a model to detect whether people are wearing face masks, you’d want the model to focus on a single feature rather than a collection of many features. Two people who are dressed the same might share more similarities overall and therefore be placed in the same category by KNN. For KNN to work on masks, your feature vector model would have to be face-specific, whereas a trained model can learn which patterns actually differentiate the classes.

Easy KNN: Bunnies Versus Sports Cars

KNN, like MobileNet, has a JS wrapper provided by Google. We can hide all the complexity and use the MobileNet and KNN NPM packages to make a quick transfer learning demo.

Not only are we going to avoid running any training, but we’ll also use existing libraries to avoid any deep dive into TensorFlow.js. We’ll be doing this for a flashy demo, but if you decide to build something more robust with these models, you should probably avoid abstracted packages that you don’t control. You already understand all the inner workings of transfer learning.

To do this quick demo, you’ll import the three NPM modules:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/[email protected]/dist/tf.min.js">
</script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/[email protected]">
</script>
<script
src="https://cdn.jsdelivr.net/npm/@tensorflow-models/[email protected]">
</script>

For simplicity, the example code from this chapter has all the images on the page, so you can directly reference them. Now you can load MobileNet with mobileNet = await mobilenet.load(); and the KNN classifier with knnClassifier.create();.
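In code, the setup is just a couple of lines (the variable names here match the helper function that follows):

const mobileNet = await mobilenet.load();
const classifier = knnClassifier.create();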

The KNN classifier needs examples of each class. To simplify this process, I’ve created the following helper function:

// domID is the DOM element ID
// classID is the unique class index
function addExample(domID, classID) {
  const features = mobileNet.infer(    // 1
    document.getElementById(domID),    // 2
    true                               // 3
  );
  classifier.addExample(features, classID);
}
1

The infer method returns values rather than the rich JavaScript object of detections.

2

The image id on the page tells MobileNet which image to resize and process. The tensor logic is hidden by the JavaScript wrapper, but many chapters in this book have explained what is actually happening.

3

When this flag is set to true, the MobileNet model returns the features (sometimes called embeddings) of the image. If it is not set, the tensor of 1,000 raw values (sometimes called logits) is returned.

Now you can add examples of each class with this helper method. You just name the image element’s unique DOM ID and what class it should be associated with. Adding three examples of each is as simple as this:

// Add examples of two classes
addExample('bunny1', 0)
addExample('bunny2', 0)
addExample('bunny3', 0)
addExample('sport1', 1)
addExample('sport2', 1)
addExample('sport3', 1)

Lastly, it’s the same system to predict. Get the features of an image, and ask the classifier to identify which class it believes the input is based on KNN.

// Moment of truth
const testImage = document.getElementById('test')
const testFeature = mobileNet.infer(testImage, true);
const predicted = await classifier.predictClass(testFeature)
if (predicted.classIndex === 0) {                              // 1
  document.getElementById("result").innerText = "A Bunny"      // 2
} else {
  document.getElementById("result").innerText = "A Sports Car"
}
1

The classIndex is the number that was passed in addExample. If a third class is added, that new index becomes a possible output.

2

The web page text is changed from “???” to the result.

The result is that the AI can identify the correct class for a new image by comparing against six examples, as shown in Figure 11-8.

screenshot of AI page
Figure 11-8. With only three images of each class, the KNN model predicts correctly

You can dynamically add more and more classes. KNN is an exciting and expandable way to utilize the experience of advanced models through transfer learning.
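For example, a third class is nothing more than a few new examples with a new index (the image IDs here are hypothetical):

addExample('horse1', 2)
addExample('horse2', 2)
addExample('horse3', 2)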

Chapter Review

Because this chapter has explained the mystery of transfer learning with MobileNet, you now have the ability to apply this power-up to any preexisting model you reasonably comprehend. Perhaps you want to adjust the pet face model to find cartoon or human faces. You don’t have to start from scratch!

Transfer learning adds a new utility to your toolbelt of AI. Now when you find a new model in the wild, you can ask yourself how you could use it directly and how you can use it in transfer learning for something similar.

Chapter Challenge: Warp-Speed Learning

The Hogwarts sorting model from the previous chapter has thousands of black-and-white drawings’ worth of experience baked into its convolutional layers. Unfortunately, those thousands of images were limited to animals and skulls, which have nothing to do with Star Trek. Don’t fret; with only 50 or so new images, you can retrain the model from the previous chapter to identify the three Star Trek symbols shown in Figure 11-9.

Perfect validation accuracy in a few epochs
Figure 11-9. Star Trek symbols

Set phasers to fun and use the methods you learned in this chapter to take the Layers model you trained in Chapter 10 (or download the trained one from the associated book source code), and train a new model that can identify these images from a mere few examples.

The new training image data can be found in CSV form in the associated book source code so you can easily import it with Danfo.js. The files are images.csv and labels.csv.
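If you’d like a head start, a loading sketch analogous to the chess example might look like this; the reshape dimensions are placeholders and must match the input shape the Chapter 10 model expects:

const labels = await dfd.read_csv('labels.csv');
const Y = labels.tensor;
const images = await dfd.read_csv('images.csv');
// Adjust height, width, and channels to the Chapter 10 model's input shape
const X = images.tensor.reshape([labels.shape[0], 28, 28, 1]);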

You can find the answer to this challenge in Appendix B.

Review Questions

Let’s review the lessons you’ve learned from the code you’ve written in this chapter. Take a moment to answer the following questions:

  1. What does KNN stand for?

  2. Whenever you have a small training set, there’s a danger of what?

  3. When you’re looking for the convolutional half of a CNN model on TensorFlow Hub, what tag are you looking for?

  4. Which depth multiplier will have a more extensive feature output, 0.50 or 1.00?

  5. What method can you call on the MobileNet NPM module to gather the feature embeddings of an image?

  6. Should you combine your transfer model parts and then train, or train and then combine your models?

  7. When you cut a model at the convolutional layer, what do you have to do before importing that information to a neural network’s dense layers?

Solutions to these exercises are available in Appendix A.

1 KNN was developed by Evelyn Fix and Joseph Hodges in 1951.
