Building custom layers in Swift

In this section, we will focus mainly on implementing the custom layers that our model depends on, and we'll omit much of the application's detail by working from an existing template, a structure you have no doubt become quite familiar with by now.

If you haven't done so already, pull down the latest code from the accompanying repository: https://github.com/packtpublishing/machine-learning-with-core-ml. Once downloaded, navigate to the directory Chapter6/Start/StyleTransfer/ and open the project StyleTransfer.xcodeproj. Once loaded, you will see the project for this chapter:

The application consists of two view controllers. The first, CameraViewController, provides the user with a live stream of the camera and the ability to take a photo. When a photo is taken, the controller presents the other view controller, StyleTransferViewController, passing along the captured photo. StyleTransferViewController then presents the image, along with a horizontal UICollectionView at the bottom containing a set of styles that the user can select by tapping on them.

Each time the user selects a style, the controller updates the ImageProcessor's style property and then calls its processImage method, passing in the image. It is here that we will implement the functionality responsible for passing the image to the model and returning the result via the assigned delegate's onImageProcessorCompleted method, which is then presented to the user.
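
For reference, the relevant surface of the template, as described above, looks something like the following sketch. The exact declarations already exist in the starter project; the protocol name shown here is inferred from the prose rather than taken from the source:

import CoreGraphics

// A sketch of the delegate we will be calling back into; the name ImageProcessorDelegate
// is an assumption based on how the template is described above.
protocol ImageProcessorDelegate: AnyObject {
    // Called once stylizing has finished (status 1) or failed (status -1)
    func onImageProcessorCompleted(status: Int, stylizedImage: CGImage?)
}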

Now, with our project loaded, let's import the model we have just created; locate the downloaded .mlmodel file and drag it into Xcode. Once imported, select it from the left-hand panel to inspect the metadata and remind ourselves what we need to implement:

By inspecting the model, we can see that it is expecting an input RGB image of size 320 x 320, and it will output an image with the same dimensions. We can also see that the model is expecting two custom layers named ResCropBlockLambda and RescaleOutputLambda. Before implementing these classes, let's hook the model up and, just for fun, see what happens when we try to run it without the custom layers implemented. 
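
If you prefer to confirm this in code rather than through Xcode's model inspector, a quick optional sanity check (not part of the chapter's project) is to print the model's description at runtime, for example in a playground:

import CoreML

// Optional sanity check: print the model's input/output descriptions to confirm
// the 320 x 320 image constraints and the expected custom layers.
let mlModel = FastStyleTransferVanGoghStarryNight().model
print(mlModel.modelDescription.inputDescriptionsByName)
print(mlModel.modelDescription.outputDescriptionsByName)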

Select ImageProcessor.swift from the left-hand panel; in this project, we will let the Vision framework do all of the preprocessing for us. Start by adding the following properties within the body of the ImageProcessor class, somewhere such as underneath the style property:

lazy var vanCoghModel : VNCoreMLModel = {
    do {
        let model = try VNCoreMLModel(for: FastStyleTransferVanGoghStarryNight().model)
        return model
    } catch {
        fatalError("Failed to obtain VanCoghModel")
    }
}()

The first property returns an instance of VNCoreMLModel, wrapping our FastStyleTransferVanGoghStarryNight model. Wrapping our model is necessary to make it compatible with the Vision framework's request classes.

Just underneath, add the following snippet, which will be responsible for returning the appropriate VNCoreMLModel, based on the selected style:

var model : VNCoreMLModel {
    get {
        if self.style == .VanCogh {
            return self.vanCoghModel
        }

        // default
        return self.vanCoghModel
    }
}

Finally, we create the method that will be responsible for returning an instance of VNCoreMLRequest, based on the currently selected model (determined by the current style):

func getRequest() -> VNCoreMLRequest {
    let request = VNCoreMLRequest(
        model: self.model,
        completionHandler: { [weak self] request, error in
            self?.processRequest(for: request, error: error)
        })
    request.imageCropAndScaleOption = .centerCrop
    return request
}

VNCoreMLRequest is responsible for performing the necessary preprocessing on the input image before passing it to the assigned Core ML model. We instantiate VNCoreMLRequest, passing in a completion handler that simply hands its results to the processRequest method of the ImageProcessor class when called. We also set imageCropAndScaleOption to .centerCrop so that our image is resized to 320 x 320 while maintaining its aspect ratio (cropping the longer side around the center, if necessary).

With our properties now defined, it's time to jump into the processImage method to initiate the actual work; add the following code, replacing the // TODO comment:

public func processImage(ciImage: CIImage) {
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage)
        do {
            try handler.perform([self.getRequest()])
        } catch {
            print("Failed to perform classification. \(error.localizedDescription)")
        }
    }
}

The preceding method is our entry point for stylizing an image; we start by instantiating a VNImageRequestHandler, passing in the image, and initiate the process by calling its perform method. Once the analysis has finished, the request calls the completion handler we assigned to it, which in turn calls processRequest, passing in a reference to the associated request and its results (or errors, if any). Let's flesh out this method now:

func processRequest(for request: VNRequest, error: Error?) {
    guard let results = request.results else {
        print("ImageProcess", #function, "ERROR:",
              String(describing: error?.localizedDescription))
        self.delegate?.onImageProcessorCompleted(
            status: -1,
            stylizedImage: nil)
        return
    }

    let stylizedPixelBufferObservations =
        results as! [VNPixelBufferObservation]

    guard stylizedPixelBufferObservations.count > 0 else {
        print("ImageProcess", #function, "ERROR:",
              "No Results")
        self.delegate?.onImageProcessorCompleted(
            status: -1,
            stylizedImage: nil)
        return
    }

    guard let cgImage = stylizedPixelBufferObservations[0]
        .pixelBuffer.toCGImage() else {
        print("ImageProcess", #function, "ERROR:",
              "Failed to convert CVPixelBuffer to CGImage")
        self.delegate?.onImageProcessorCompleted(
            status: -1,
            stylizedImage: nil)
        return
    }

    DispatchQueue.main.sync {
        self.delegate?.onImageProcessorCompleted(
            status: 1,
            stylizedImage: cgImage)
    }
}

While VNCoreMLRequest is responsible for the image analysis, VNImageRequestHandler is responsible for executing the request (or requests).

If no errors occurred during the analysis, our request is handed back with its results property set. As we are only expecting one request and one result type, we cast the results to an array of VNPixelBufferObservation, a type of observation suited to image analysis with a Core ML model whose role is image-to-image processing, such as our style transfer model.

We can get a reference to our stylized image via the observation's pixelBuffer property, and then call the extension method toCGImage (found in CVPixelBuffer+Extension.swift) to conveniently obtain the output in a format we can easily use, in this case, to update the image view.
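
If you are curious what such an extension looks like, a minimal version can be written with Core Image; the sketch below is illustrative only, and the project's own CVPixelBuffer+Extension.swift may differ in detail:

import CoreImage
import CoreVideo

// Illustrative sketch; the starter project ships its own version in CVPixelBuffer+Extension.swift.
extension CVPixelBuffer {
    func toCGImage() -> CGImage? {
        // Wrap the pixel buffer in a CIImage and render it out as a CGImage
        let ciImage = CIImage(cvPixelBuffer: self)
        return CIContext().createCGImage(ciImage, from: ciImage.extent)
    }
}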

As previously discussed, let's see what happens when we try to run an image through our model without implementing the custom layers. Build and deploy to a device, take a photo, and then select the Van Gogh style from the styles displayed. In doing so, you will see the app fail at runtime, reporting the error Error creating Core ML custom layer implementation from factory for layer "RescaleOutputLambda" (as we were expecting).

Let's address this now by implementing each of our custom layers, starting with the RescaleOutputLambda class. Create a new Swift file named RescaleOutputLambda.swift and replace the template code with the following:

import Foundation
import CoreML
import Accelerate

@objc(RescaleOutputLambda) class RescaleOutputLambda: NSObject, MLCustomLayer {

    required init(parameters: [String : Any]) throws {
        super.init()
    }

    func setWeightData(_ weights: [Data]) throws {

    }

    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws
        -> [[NSNumber]] {

    }

    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {

    }
}

Here, we have created a class conforming to the MLCustomLayer protocol, which defines the behavior of a custom layer in our neural network model. The protocol consists of four required methods and one optional method, which are as follows:

  • init(parameters): Initializes the custom layer implementation and is passed a dictionary of parameters that includes any additional configuration options for the layer. As you may recall, we created an instance of NeuralNetwork_pb2.CustomLayerParams for each of our custom layers when converting our Keras model. We could have added more entries there, which would have been passed into this dictionary; this provides some flexibility, such as allowing you to adjust your layer based on the set parameters (see the sketch after this list).
  • setWeightData(): Assigns the weights for the connections within the layer (for layers with trainable weights).
  • outputShapes(forInputShapes): Determines how the layer modifies the size of the input data. Our RescaleOutputLambda layer doesn't change the size of its input, so we simply need to return the input shape, but we will make use of this when implementing the next custom layer.
  • evaluate(inputs, outputs): Performs the actual computation; this method is required and gets called when the model is run on the CPU.
  • encode(commandBuffer, inputs, outputs): This method is optional and acts as an alternative to evaluate that runs on the GPU rather than the CPU.
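
To make the first point concrete, here is a hypothetical custom layer (not one used by our model) that reads an extra scale entry from the parameters dictionary; it simply illustrates how values added to CustomLayerParams during conversion become available in init(parameters):

import Foundation
import CoreML

// Hypothetical example only; our model's custom layers pass no extra parameters.
@objc(ScaleLambda) class ScaleLambda: NSObject, MLCustomLayer {

    let scale: Double

    required init(parameters: [String : Any]) throws {
        // Read the (assumed) "scale" entry added to CustomLayerParams, defaulting to 1.0
        self.scale = parameters["scale"] as? Double ?? 1.0
        super.init()
    }

    func setWeightData(_ weights: [Data]) throws { }

    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws -> [[NSNumber]] {
        return inputShapes
    }

    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
        // Multiply every element by the configured scale
        for (i, input) in inputs.enumerated() {
            for j in 0..<input.count {
                outputs[i][j] = NSNumber(value: input[j].doubleValue * scale)
            }
        }
    }
}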

Because we are not passing in any custom parameters or setting any trainable weights, we can skip the constructor and setWeightData methods; let's walk through the remaining methods, starting with outputShapes(forInputShapes).

As previously mentioned, this layer doesn't change the shape of the input, therefore, we can simply return the input shape, as shown in the following code: 

func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws
    -> [[NSNumber]] {
    return inputShapes
}

With our outputShapes(forInputShapes) method now implemented, let's turn our attention to the workhorse of the layer, the evaluate method, which performs the actual computation. The evaluate method receives an array of MLMultiArray objects as inputs, along with another array of MLMultiArray objects where it is expected to store the results. Having evaluate accept arrays for its inputs and outputs allows for greater flexibility in supporting different architectures but, in this example, we are expecting only one input and one output.

As a reminder, this layer scales each element from the range -1.0 to 1.0 into the range 0 to 255 (the range a typical image is expected to use). The simplest approach is to iterate through each element and scale it using the equation we saw in Python: (x + 1) * 127.5. This is exactly what we'll do; add the following code to the body of your evaluate method:

func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
    let rescaleAddition = 1.0
    let rescaleMultiplier = 127.5

    for (i, input) in inputs.enumerated() {
        // expecting [1, 1, Channels, Height, Width]
        let shape = input.shape
        for c in 0..<shape[2].intValue {
            for h in 0..<shape[3].intValue {
                for w in 0..<shape[4].intValue {
                    let index = [
                        NSNumber(value: 0),
                        NSNumber(value: 0),
                        NSNumber(value: c),
                        NSNumber(value: h),
                        NSNumber(value: w)]
                    let outputValue = NSNumber(
                        value: (input[index].doubleValue + rescaleAddition)
                            * rescaleMultiplier)

                    outputs[i][index] = outputValue
                }
            }
        }
    }
}

The bulk of this method is made up of code used to create the index for obtaining the appropriate value from the input and pointing to its output counterpart. Once an index has been created, the Python formula is ported across to Swift: (input[index].doubleValue + rescaleAddition) * rescaleMultiplier. This concludes our first custom layer; let's now implement our second custom layer, ResCropBlockLambda.

Create a new file called ResCropBlockLambda.swift and add the following code, overwriting any existing code:

import Foundation
import CoreML
import Accelerate

@objc(ResCropBlockLambda) class ResCropBlockLambda: NSObject, MLCustomLayer {

    required init(parameters: [String : Any]) throws {
        super.init()
    }

    func setWeightData(_ weights: [Data]) throws {
    }

    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws
        -> [[NSNumber]] {
    }

    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
    }
}

As we did with the previous custom layer, we have stubbed out all the required methods as determined by the MLCustomLayer protocol. Once again, we can ignore the constructor and the setWeightData method, as neither is used in this layer.

If you recall, and as the name suggests, the function of this layer is to crop the width and height of one of the inputs of the residual block. We need to reflect this within the outputShapes(forInputShapes) method so that the network knows the dimensions of this layer's output, which become the input dimensions for subsequent layers. Update the outputShapes(forInputShapes) method with the following code:

func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws
    -> [[NSNumber]] {
    return [[NSNumber(value: inputShapes[0][0].intValue),
             NSNumber(value: inputShapes[0][1].intValue),
             NSNumber(value: inputShapes[0][2].intValue),
             NSNumber(value: inputShapes[0][3].intValue - 4),
             NSNumber(value: inputShapes[0][4].intValue - 4)]]
}

Here, we subtract a constant of 4 from both the width and the height, which amounts to cropping 2 elements from each side of both dimensions; an input of size W x H produces an output of size (W - 4) x (H - 4). Next, we implement the evaluate method, which performs this cropping. Replace the evaluate method with the following code:

func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
    for (i, input) in inputs.enumerated() {

        // expecting [1, 1, Channels, Height, Width]
        let shape = input.shape
        for c in 0..<shape[2].intValue {
            // copy the central region of the input, skipping 2 elements on each side
            for h in 2..<(shape[3].intValue - 2) {
                for w in 2..<(shape[4].intValue - 2) {
                    let inputIndex = [
                        NSNumber(value: 0),
                        NSNumber(value: 0),
                        NSNumber(value: c),
                        NSNumber(value: h),
                        NSNumber(value: w)]

                    let outputIndex = [
                        NSNumber(value: 0),
                        NSNumber(value: 0),
                        NSNumber(value: c),
                        NSNumber(value: h - 2),
                        NSNumber(value: w - 2)]

                    outputs[i][outputIndex] = input[inputIndex]
                }
            }
        }
    }
}

Similar to the evaluate method of our RescaleOutputLambda layer, the bulk of this method has to do with creating the indices for the input and output arrays. The crop itself is performed simply by constraining the ranges of our loops so that only the central region of the input is copied across to the output.

Now, if you build and run the project, you will be able to run an image through the Van Gogh network and get back a stylized version of it, similar to what is shown in the following image:

When running on the simulator, the whole process took approximately 22.4 seconds. In the following two sections, we will spend some time looking at how we can reduce this. 
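
If you would like to reproduce this measurement yourself, one simple option (not part of the chapter's code) is a small timing helper wrapped around the handler.perform call in processImage:

import QuartzCore

// Illustrative timing helper; wrap it around handler.perform to log how long a request takes.
func measure<T>(_ label: String, _ work: () throws -> T) rethrows -> T {
    let start = CACurrentMediaTime()
    defer { print("\(label) took \(CACurrentMediaTime() - start) seconds") }
    return try work()
}

// Usage inside processImage:
// try measure("Stylizing") { try handler.perform([self.getRequest()]) }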
