Accelerating our layers 

Let's return to the RescaleOutputLambda layer and see where we might shave a second or two off the processing time. As a reminder, this layer rescales each element of the output, where the output can be thought of as one large vector. Luckily for us, Apple provides an efficient framework and API for just this. Instead of operating on each element within a loop, we will use the Accelerate framework and its vDSP API to apply each operation to the whole vector in a single call. This process is called vectorization and is made possible by exploiting the CPU's Single Instruction, Multiple Data (SIMD) instruction set. Return to the RescaleOutputLambda class and update the evaluate method with the following code:

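// Requires `import Accelerate` and `import CoreML` at the top of the file.
func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
    // vDSP takes its scalars by pointer, so these must be declared with var
    var rescaleAddition : Float = 1.0
    var rescaleMulitplier : Float = 127.5
    
    for i in inputs.indices {
        
        let input = inputs[i]
        let output = outputs[i]
        
        // Treat the raw buffers as 32-bit floats
        // (this assumes both arrays store Float values)
        let count = input.count
        let inputPointer = UnsafeMutablePointer<Float>(
            OpaquePointer(input.dataPointer)
        )
        let outputPointer = UnsafeMutablePointer<Float>(
            OpaquePointer(output.dataPointer)
        )
        
        // output = input + rescaleAddition
        vDSP_vsadd(inputPointer, 1,
                   &rescaleAddition,
                   outputPointer, 1,
                   vDSP_Length(count))
        
        // output = output * rescaleMulitplier (in place)
        vDSP_vsmul(outputPointer, 1,
                   &rescaleMulitplier,
                   outputPointer, 1,
                   vDSP_Length(count))
    }
}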

In the preceding code, we first obtain a pointer to each of the input and output buffers, wrapping them in UnsafeMutablePointer<Float> as required by the vDSP functions. Then it's simply a matter of applying each of our scaling operations using the equivalent vDSP function; we will walk through each in turn.
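For comparison, the element-wise computation that the two vDSP calls replace can be sketched as a plain loop. This is only an illustration: the function name rescaleWithLoop is hypothetical, and it works on a Swift array rather than on the MLMultiArray buffers used by the layer.

```swift
// Hypothetical helper for illustration only: the scalar, one-element-at-a-time
// version of the rescaling, i.e. (x + 1) * 127.5 for every element x.
func rescaleWithLoop(_ input: [Float]) -> [Float] {
    let rescaleAddition: Float = 1.0
    let rescaleMultiplier: Float = 127.5

    var output = [Float](repeating: 0, count: input.count)
    for i in 0..<input.count {
        output[i] = (input[i] + rescaleAddition) * rescaleMultiplier
    }
    return output
}

let rescaled = rescaleWithLoop([-1.0, 0.0, 1.0])
// rescaled is [0.0, 127.5, 255.0]
```

A loop like this is a handy reference when checking that a vectorized version produces the same values.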

First, we add our constant of 1 to the input and save the results in the output buffer, as shown in the following snippet:

vDSP_vsadd(inputPointer, 1,
           &rescaleAddition,
           outputPointer, 1,
           vDSP_Length(count))

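Here, vDSP_vsadd takes a pointer to our vector (inputPointer) and its stride (1, meaning every element), adds rescaleAddition to each element, and stores the results in the output buffer using the same stride; the final argument is the number of elements to process.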
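To see vDSP_vsadd in isolation, here is a minimal sketch that runs it on a small Swift array instead of the layer's buffers. The names input, addend, and output are chosen for this example only; it assumes an Apple platform where the Accelerate framework is available.

```swift
import Accelerate

// Example values only; in the layer these come from the MLMultiArray buffers.
let input: [Float] = [-1.0, 0.0, 1.0]
var addend: Float = 1.0   // passed by pointer, so it must be a var
var output = [Float](repeating: 0, count: input.count)

// output[n] = input[n] + addend, reading and writing every element (stride 1)
vDSP_vsadd(input, 1,
           &addend,
           &output, 1,
           vDSP_Length(input.count))
// output is [0.0, 1.0, 2.0]
```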

Next, we apply our multiplier to each of the elements of the output (which currently has each of its values set to the input with 1 added to it); the code for this is shown in the following snippet:

vDSP_vsmul(outputPointer, 1,
           &rescaleMulitplier,
           outputPointer, 1,
           vDSP_Length(count))

Similar to vDSP_vsadd, vDSP_vsmul takes the input vector (in this case, our output buffer) and its stride; the scalar we want to multiply each element by; the output vector and the stride for writing the result; and finally, the number of elements we want to operate on.
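Because the same buffer is passed as both input and output, the multiplication happens in place. The following sketch shows that pattern on a small Swift array; the names values and multiplier are hypothetical, and withUnsafeMutableBufferPointer is used here only to obtain a single pointer that can serve as both source and destination.

```swift
import Accelerate

// Example values only: the result of the earlier "+ 1" step on [-1, 0, 1].
var values: [Float] = [0.0, 1.0, 2.0]
var multiplier: Float = 127.5

values.withUnsafeMutableBufferPointer { buffer in
    guard let base = buffer.baseAddress else { return }
    // values[n] = values[n] * multiplier, written back into the same buffer
    vDSP_vsmul(base, 1,
               &multiplier,
               base, 1,
               vDSP_Length(buffer.count))
}
// values is [0.0, 127.5, 255.0]
```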

If you rerun the application, you will see that we have managed to shed a few seconds off the total execution time. Not bad, considering this layer is run only once at the end of our network. Can we do better?
