Preprocessing the data

At this stage, we have the app rendering frames from the camera, but our ViewController is not yet receiving any of them. To change that, we will assign ourselves as the delegate of the VideoCapture instance we set up in the previous section. The existing ViewController class already has an extension that adopts the VideoCaptureDelegate protocol; what's left to do is to assign ourselves as the delegate and implement the details of the callback method. The following is the code for the extension:

extension ViewController : VideoCaptureDelegate {

    func onFrameCaptured(videoCapture: VideoCapture,
                         pixelBuffer: CVPixelBuffer?,
                         timestamp: CMTime) {
    }
}
Depending on your coding style, you can just as easily implement the protocols inside the main class. I tend to make use of extensions to implement the protocols—a personal preference.

First, let's assign ourselves as the delegate so that we start receiving frames; within the viewDidLoad method of the ViewController class, we add the following statement just before we initialize the camera:

self.videoCapture.delegate = self
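
In context, viewDidLoad ends up looking something like the following sketch (the camera initialization is whatever you implemented in the previous section; only the delegate assignment is new here):

override func viewDidLoad() {
    super.viewDidLoad()

    // Assign ourselves as the delegate before the camera starts
    // delivering frames
    self.videoCapture.delegate = self

    // ... camera initialization from the previous section ...
}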

Now that we have assigned ourselves as the delegate, we will receive frames (at the defined frame rate) via the callback:

func onFrameCaptured(videoCapture: VideoCapture,
                     pixelBuffer: CVPixelBuffer?,
                     timestamp: CMTime) {
    // TODO
}

It's within this method that we will prepare and feed the data to the model to classify the dominant object within the frame. Exactly what we need to feed it depends on the model itself, so to get a better idea of what we need to pass it, let's download the trained model we will be using for this example and import it into our project.

Trained models can be obtained from a variety of sources; sometimes you will need to convert them, and sometimes you will need to train the model yourself. Here, though, we can make use of the models Apple has made available; open up your web browser and navigate to https://developer.apple.com/machine-learning/:

You will be taken to a web page where Apple has made available a range of pretrained and converted models. Conveniently, most of the available models are specifically for object classification; given our use case, we're particularly interested in the models trained on a large array of objects. Our options include MobileNet, SqueezeNet, ResNet50, Inception v3, and VGG16. Most of these have been trained on the ImageNet dataset, a dataset referencing over 10 million images, each manually assigned to one of 1,000 classes. References to the original research papers and performance figures can be found via the View original model details link. For this example, we'll use Inception v3, which offers a good balance between size and accuracy.

Here, we are using the Inception v3 model, but the effort required to swap in a different model is minimal: it's a matter of updating the references (the generated classes are prefixed with the model's name, as you will soon see) and ensuring that you conform to the model's expected inputs (something the Vision framework can take care of for you, as you will see in future chapters).

Click on the Download Core ML Model link to download the model and, once it has downloaded, drag the Inceptionv3.mlmodel file onto the Project Navigator panel on the left of Xcode, checking Copy items if needed if you want a local copy of the file, and otherwise leaving everything at its defaults. Select the Inceptionv3.mlmodel file from the Project Navigator panel on the left to bring up its details within the Editor area, as shown in the following screenshot:

It is important to ensure that the model is correctly assigned to the appropriate target; in this example, this means verifying that the ObjectRecognition target is checked, as seen on the Utilities panel to the right. Also worth noting are the expected inputs and outputs of the model: here, the model is expecting a color image of size 299 x 299 as its input, and it returns both the most likely class label as a string and a dictionary of string-to-double pairs holding the probability of every class.
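
If you'd rather confirm these details in code than rely on the Xcode summary, a minimal sketch such as the following will print the same input and output descriptions at runtime (it assumes the compiled model, Inceptionv3.mlmodelc, ends up in the app's main bundle, which Xcode takes care of once the target membership is set):

import CoreML

// Illustrative sanity check: print the model's expected inputs and outputs
if let url = Bundle.main.url(forResource: "Inceptionv3", withExtension: "mlmodelc"),
   let model = try? MLModel(contentsOf: url) {
    print(model.modelDescription.inputDescriptionsByName)
    print(model.modelDescription.outputDescriptionsByName)
}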

When a .mlmodel file is imported, Xcode will generate a wrapper for the model itself and the input and output parameters to interface with the model; this is illustrated here: 

You can easily access this by clicking on the arrow button next to the Inceptionv3 label within the Model Class section; when you do, you will see the following code (separated here into three distinct blocks to make it more legible):

@available(macOS 10.13, iOS 11.0, tvOS 11.0, watchOS 4.0, *)
class Inceptionv3Input : MLFeatureProvider {

    /// Input image to be classified as color (kCVPixelFormatType_32BGRA) image buffer, 299 pixels wide by 299 pixels high
    var image: CVPixelBuffer

    var featureNames: Set<String> {
        get {
            return ["image"]
        }
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        if (featureName == "image") {
            return MLFeatureValue(pixelBuffer: image)
        }
        return nil
    }

    init(image: CVPixelBuffer) {
        self.image = image
    }
}

The first block of the preceding code is the input for our model. This class implements the MLFeatureProvider protocol, a protocol representing a collection of feature values for the model, in this case, the image feature. Here, you can see the expected data structure, CVPixelBuffer, along with the specifics declared (handily) in the comments. Let's continue on with our inspection of the generated classes by looking at the binding for the output:

@available(macOS 10.13, iOS 11.0, tvOS 11.0, watchOS 4.0, *)
class Inceptionv3Output : MLFeatureProvider {

    /// Probability of each category as dictionary of strings to doubles
    let classLabelProbs: [String : Double]

    /// Most likely image category as string value
    let classLabel: String

    var featureNames: Set<String> {
        get {
            return ["classLabelProbs", "classLabel"]
        }
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        if (featureName == "classLabelProbs") {
            return try! MLFeatureValue(dictionary: classLabelProbs as [NSObject : NSNumber])
        }
        if (featureName == "classLabel") {
            return MLFeatureValue(string: classLabel)
        }
        return nil
    }

    init(classLabelProbs: [String : Double], classLabel: String) {
        self.classLabelProbs = classLabelProbs
        self.classLabel = classLabel
    }
}

As previously mentioned, the output exposes a dictionary of probabilities and a string for the dominant class; each is exposed as a property and is also accessible through the featureValue(for featureName: String) getter method by passing in the feature's name. Our final extract from the generated code is the model itself; let's inspect that now:

@available(macOS 10.13, iOS 11.0, tvOS 11.0, watchOS 4.0, *)
class Inceptionv3 {
    var model: MLModel

    /**
        Construct a model with explicit path to mlmodel file
        - parameters:
            - url: the file url of the model
        - throws: an NSError object that describes the problem
    */
    init(contentsOf url: URL) throws {
        self.model = try MLModel(contentsOf: url)
    }

    /// Construct a model that automatically loads the model from the app's bundle
    convenience init() {
        let bundle = Bundle(for: Inceptionv3.self)
        let assetPath = bundle.url(forResource: "Inceptionv3", withExtension: "mlmodelc")
        try! self.init(contentsOf: assetPath!)
    }

    /**
        Make a prediction using the structured interface
        - parameters:
            - input: the input to the prediction as Inceptionv3Input
        - throws: an NSError object that describes the problem
        - returns: the result of the prediction as Inceptionv3Output
    */
    func prediction(input: Inceptionv3Input) throws -> Inceptionv3Output {
        let outFeatures = try model.prediction(from: input)
        let result = Inceptionv3Output(
            classLabelProbs: outFeatures.featureValue(for: "classLabelProbs")!.dictionaryValue as! [String : Double],
            classLabel: outFeatures.featureValue(for: "classLabel")!.stringValue)
        return result
    }

    /**
        Make a prediction using the convenience interface
        - parameters:
            - image: Input image to be classified as color (kCVPixelFormatType_32BGRA) image buffer, 299 pixels wide by 299 pixels high
        - throws: an NSError object that describes the problem
        - returns: the result of the prediction as Inceptionv3Output
    */
    func prediction(image: CVPixelBuffer) throws -> Inceptionv3Output {
        let input_ = Inceptionv3Input(image: image)
        return try self.prediction(input: input_)
    }
}

This class wraps the MLModel instance and provides strongly typed methods for performing inference via prediction(input: Inceptionv3Input) and prediction(image: CVPixelBuffer), each returning the output class we saw previously, Inceptionv3Output. Now that we know what our model is expecting, let's continue on and implement the preprocessing required to turn the captured frames into something we can feed it.
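
Before we do, it's worth seeing how little code inference will eventually require; the following is a minimal sketch of using the generated interface, where scaledPixelBuffer stands in for a 299 x 299 full-color CVPixelBuffer (exactly what we build in the rest of this section) and the error handling is purely illustrative:

// Illustrative only: scaledPixelBuffer is assumed to be a 299 x 299,
// full-color CVPixelBuffer
let model = Inceptionv3()
if let output = try? model.prediction(image: scaledPixelBuffer) {
    // Most likely class
    print("Predicted: \(output.classLabel)")

    // Top three classes by probability
    let top3 = output.classLabelProbs
        .sorted { $0.value > $1.value }
        .prefix(3)
    for (label, probability) in top3 {
        print("\(label): \(probability)")
    }
}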

Core ML 2 introduced the ability to work with batches; if your model is compiled with Xcode 10 or later, you also get batch support, via predictions(from: MLBatchProvider, options: MLPredictionOptions) on the underlying MLModel, allowing you to perform inference on a batch of inputs.
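
As a rough sketch only (it assumes Core ML 2 / iOS 12, an illustrative pixelBuffers array, and goes through the wrapper's underlying model property), batch inference might look like this:

// Rough sketch: classify several frames in a single call
// `pixelBuffers` is an illustrative array of 299 x 299 CVPixelBuffers
let inputs = pixelBuffers.map { Inceptionv3Input(image: $0) }
let batch = MLArrayBatchProvider(array: inputs)
let model = Inceptionv3()
if let results = try? model.model.predictions(from: batch,
                                              options: MLPredictionOptions()) {
    for i in 0..<results.count {
        let label = results.features(at: i)
            .featureValue(for: "classLabel")?.stringValue ?? "unknown"
        print("Frame \(i): \(label)")
    }
}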

At this stage, we know that we are receiving the correct data type (CVPixelBuffer) and image format (kCVPixelFormatType_32BGRA, which we explicitly set when configuring the capture video output instance) from the camera. But we are receiving an image significantly larger than the expected size of 299 x 299. Our next task is to create some utility methods to perform resizing and cropping.

For this, we will be extending CIImage to wrap and process the pixel data we receive, along with making use of CIContext to get back the raw pixels. If you're unfamiliar with the Core Image framework, it suffices to say that it is a framework dedicated to efficiently processing and analyzing images. CIImage can be considered the base data object of the framework and is often used in conjunction with other Core Image classes such as CIFilter, CIContext, CIVector, and CIColor. Here, we are interested in CIImage because it provides convenient methods for manipulating images, and in CIContext because it lets us render a CIImage back out to raw pixel data (a CVPixelBuffer).

Back in Xcode, select the CIImage.swift file from the Project Navigator to open it up in the Editor area. In this file, we have extended the CIImage class with a method responsible for rescaling and another for returning the raw pixels as a CVPixelBuffer, the format required by our Core ML model:

extension CIImage {

    func resize(size: CGSize) -> CIImage {
        fatalError("Not implemented")
    }

    func toPixelBuffer(context: CIContext,
                       gray: Bool = true) -> CVPixelBuffer? {
        fatalError("Not implemented")
    }
}

Let's start by implementing the resize method; this method is passed the desired size, which we'll use to calculate the relative scale, and we'll then use this scale to resize the image uniformly. Add the following code snippet to the resize method, replacing the fatalError("Not implemented") statement:

let scale = min(size.width, size.height) / min(self.extent.size.width, self.extent.size.height)

let resizedImage = self.transformed(
    by: CGAffineTransform(
        scaleX: scale,
        y: scale))
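
For example, if the camera were delivering 1920 x 1080 frames and we requested a size of 299 x 299, the scale would be 299 / 1080 ≈ 0.277, giving a resized image of roughly 532 x 299.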

Unless the image is square, we will have an overflow either vertically or horizontally. To handle this, we will simply center the image and crop it to the desired size; do this by appending the following code to the resize method (beneath the code written in the preceding snippet):

let width = resizedImage.extent.width
let height = resizedImage.extent.height
let xOffset = (width - size.width) / 2.0
let yOffset = (height - size.height) / 2.0
let rect = CGRect(x: xOffset,
                  y: yOffset,
                  width: size.width,
                  height: size.height)

// Crop to the centered rect and translate the result back to the
// origin so that the returned image's extent starts at (0, 0)
return resizedImage
    .clamped(to: rect)
    .cropped(to: rect)
    .transformed(by: CGAffineTransform(
        translationX: -xOffset,
        y: -yOffset))

We now have the functionality to rescale the image; our next piece of functionality is to obtain a CVPixelBuffer from the CIImage. Let's do that by implementing the body of the toPixelBuffer method. Let's first review the method's signature and then briefly talk about the functionality required:

func toPixelBuffer(context: CIContext, gray: Bool = true) -> CVPixelBuffer? {
    fatalError("Not implemented")
}

This method is expecting a CIContext and a flag indicating whether the image should be grayscale (single channel) or full color; the CIContext will be used to render the image to a pixel buffer (our CVPixelBuffer). Let's now flesh out the implementation of toPixelBuffer piece by piece.

The preprocessing required on the image (resizing, grayscaling, and normalization) is dependent on the Core ML model and the data it was trained on. You can get a sense of these parameters by inspecting the Core ML model in Xcode. If you recall, the expected input to our model is (image color 299 x 299); this tells us that the Core ML model is expecting the image to be color (three channels) and 299 x 299 in size. 

We start by creating the pixel buffer we will be rendering our image to; add the following code snippet to the body of the toPixelBuffer method, replacing the fatalError("Not implemented") statement:

let attributes = [
    kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
    kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue
] as CFDictionary

var nullablePixelBuffer: CVPixelBuffer? = nil
let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                 Int(self.extent.size.width),
                                 Int(self.extent.size.height),
                                 gray ? kCVPixelFormatType_OneComponent8 : kCVPixelFormatType_32ARGB,
                                 attributes,
                                 &nullablePixelBuffer)

guard status == kCVReturnSuccess, let pixelBuffer = nullablePixelBuffer else {
    return nil
}

We first create a dictionary holding the attributes that define the compatibility requirements for our pixel buffer; here, we specify that we want our pixel buffer to be compatible with CGImage types (kCVPixelBufferCGImageCompatibilityKey) and with Core Graphics bitmap contexts (kCVPixelBufferCGBitmapContextCompatibilityKey).

We then proceed to create the pixel buffer, passing in the width, the height, the pixel format (either grayscale or full color, depending on the value of gray), our compatibility attributes, and a pointer to the variable that will receive the buffer. Next, we check that the call was successful and unwrap the nullable pixel buffer; if either of these fails, we return nil. Otherwise, we're ready to render our CIImage into the newly created pixel buffer. Append the following code to the toPixelBuffer method:

CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))

context.render(self,
               to: pixelBuffer,
               bounds: CGRect(x: 0,
                              y: 0,
                              width: self.extent.size.width,
                              height: self.extent.size.height),
               colorSpace: gray ?
                   CGColorSpaceCreateDeviceGray() :
                   self.colorSpace)

CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))

return pixelBuffer

Before drawing, we lock the base address of the pixel buffer via CVPixelBufferLockBaseAddress, and we unlock it once we've finished, using the CVPixelBufferUnlockBaseAddress method. We are required to do this whenever we access pixel data from the CPU, which is what we are doing here.

Once locked, we simply use the CIContext to render the scaled image to the buffer, passing in the destination rectangle (in this case, the full size of the pixel buffer) and destination color space, which is full color or grayscale depending on the value of gray as mentioned previously. After unlocking the pixel buffer, as described earlier, we return our newly created pixel buffer.

We have now extended CIImage with two convenient methods: one responsible for rescaling and the other for creating a pixel buffer representation of itself. We will now return to the ViewController class to handle the preprocessing steps required before passing our data into the model. Select the ViewController.swift file from the Project Navigator panel within Xcode to bring up the source code, and within the body of the ViewController class, add the following variable:

let context = CIContext()

As previously discussed, we will be passing this to our CIImage.toPixelBuffer method for rendering the image to the pixel buffer. Now return to the onFrameCaptured method and add the following code to make use of the methods we've just created for preprocessing:

guard let pixelBuffer = pixelBuffer else { return }

// Prepare our image for our model (resize to 299 x 299 and render to a
// full-color pixel buffer)
guard let scaledPixelBuffer = CIImage(cvImageBuffer: pixelBuffer)
    .resize(size: CGSize(width: 299, height: 299))
    .toPixelBuffer(context: context, gray: false) else { return }

We first unwrap pixelBuffer, returning if it is nil; we then create a CIImage instance, passing in the current frame, and chain our extension methods to rescale it to 299 x 299 and render it out to a pixel buffer (setting the gray parameter to false, as the model is expecting full-color images). If successful, we are left with an image ready to be passed to our model for inference, which is the focus of the next section.
