Input data and preprocessing 

In this section, we will implement the preprocessing functionality required to transform raw user input into the format our model expects. We will build up this functionality in a playground project before migrating it across to our main project in the next section.

If you haven't already done so, pull down the latest code from the accompanying repository at https://github.com/PacktPublishing/Machine-Learning-with-Core-ML. Once downloaded, navigate to the directory Chapter8/Start/ and open the playground project ExploringQuickDrawData.playground. Once loaded, you will see the playground for this chapter, as shown:

The playground includes a few samples of the raw Quick, Draw! dataset, a single simplified extract, as well as the compiled model and supporting classes we created in the previous chapter to represent a sketch (Stroke.swift, Sketch.swift) and render it (SketchView.swift). Our goal for this section will be to better understand the data and the preprocessing required before feeding it to our model; in doing so, we will be extending our existing classes to encapsulate this functionality.

Let's start by reviewing the existing code before moving forward; if you scroll down the opened source file, you will see the methods createFromJSON and drawSketch. The former takes in a JSON object (the format our samples are saved in) and returns a strongly typed object: StrokeSketch. As a reminder, each sample is made up of:

  • key_id: Unique identifier 
  • word: Category label
  • countrycode: Country code where the sample was drawn
  • timestamp: Timestamp when the sample was created 
  • recognized: A flag indicating whether the sketch was recognized 
  • drawing: A multi-dimensional array consisting of arrays of x, y coordinates along with the elapsed time since the point was created 
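To make the structure concrete, here is a hypothetical sample modeled as a Swift dictionary; every value below is invented purely for illustration and does not come from the real dataset:

```swift
import Foundation

// A hypothetical Quick, Draw! sample; all field values are made up.
let sample: [String: Any] = [
    "key_id": "5152802093400064",
    "word": "airplane",
    "countrycode": "US",
    "timestamp": "2017-03-08 21:12:07 UTC",
    "recognized": true,
    // One entry per stroke; each stroke is [[x...], [y...], [t...]]
    "drawing": [
        [[167, 109, 80, 69], [140, 194, 227, 232], [0, 30, 45, 62]],
        [[69, 120, 180], [232, 245, 250], [80, 95, 110]]
    ]
]

// Pull out the x coordinates of the first stroke
let drawing = sample["drawing"] as? [[[Int]]] ?? []
let firstStrokeXs = drawing[0][0]
print(firstStrokeXs) // [167, 109, 80, 69]
```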

The StrokeSketch maps the word to the label property and x, y coordinates to the stroke points. We discard everything else as it is not deemed useful in classification and not used by our model. The drawSketch method is a utility method that handles scaling and centering the sketch before creating an instance of a SketchView to render the scaled and centered sketch.  

The last block of code preloads the JSON files and makes them available through the dictionary loadedJSON, where the key is the associated filename and value is the loaded JSON object. 

Let's start by taking a peek at the data, comparing the raw samples to the simplified samples; add the following code to your playground:

if let rJson = loadedJSON["small_raw_airplane"],
    let sJson = loadedJSON["small_simplified_airplane"]{

    if let rSketch = StrokeSketch.createFromJSON(json: rJson[0] as? [String:Any]),
        let sSketch = StrokeSketch.createFromJSON(json: sJson[0] as? [String:Any]){
        drawSketch(sketch: rSketch)
        drawSketch(sketch: sSketch)
    }

    if let rSketch = StrokeSketch.createFromJSON(json: rJson[1] as? [String:Any]),
        let sSketch = StrokeSketch.createFromJSON(json: sJson[1] as? [String:Any]){
        drawSketch(sketch: rSketch)
        drawSketch(sketch: sSketch)
    }
}

In the previous code snippet, we are simply getting a reference to our loaded JSON files and passing the samples at index 0 and 1 to our createFromJSON method, which will return their StrokeSketch representations. We then pass each of these into our drawSketch method to create the view that renders them. After running, you can preview each of the sketches by clicking on the eye icon in the right-hand panel, on the same line as the call to the drawSketch method. The following image presents both outputs side by side for comparison:

The major differences between the samples from the raw dataset and simplified dataset can be seen in the preceding figure. The raw sample is much larger and smoother. What is not obvious from the previous image is that the simplified sample is positioned to the top left while the raw one consists of points in their original and absolute positions (recalling that our drawSketch method rescales, if required, and centers the sketch). 

As a reminder, the raw samples resemble the input we are expecting to receive from the user, while on the other hand our model was trained on the samples from the simplified dataset. Therefore, we need to perform the same preprocessing steps that have been used to transform the raw data into its simplified counterparts before feeding our model. These steps, described in the repository for the data at https://github.com/googlecreativelab/quickdraw-dataset, are listed as follows, and this is what we will now implement in our playground:

  • Align the drawing to the top-left corner to have minimum values of zero
  • Uniformly scale the drawing to have a maximum value of 255
  • Resample all strokes with a one pixel spacing
  • Simplify all strokes using the Ramer-Douglas-Peucker algorithm with an epsilon value of 2.0
The Ramer–Douglas–Peucker algorithm takes a curve composed of line segments (strokes) and finds a simpler curve with fewer points. You can learn more about the algorithm here: https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm.

The rationale behind these steps should be fairly self-explanatory and is highlighted by the preceding figure showing the two sketches of an airplane. That is, the classification of the airplane should be invariant to its actual position on the screen and to its scale. And simplifying the strokes makes it easier for our model to learn, as it helps ensure that we only capture the salient features.

Start off by creating an extension of your StrokeSketch class and stubbing out the method simplify, as shown:

public func simplify() -> StrokeSketch{
    let copy = self.copy() as! StrokeSketch
}

We will be mutating a clone of the instance, which is why we first create a copy. Next, we want to calculate the scale factor required to scale the sketch to a maximum height and/or width of 255 while respecting its aspect ratio; add the following code to your simplify method, which does just this:

let minPoint = copy.minPoint
let maxPoint = copy.maxPoint
let scale = CGPoint(x: maxPoint.x-minPoint.x, y: maxPoint.y-minPoint.y)

var width : CGFloat = 255.0
var height : CGFloat = 255.0

// adjust for the aspect ratio by scaling the smaller dimension down
if scale.x > scale.y{
    height *= scale.y/scale.x
} else{
    width *= scale.x/scale.y
}
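As a quick sanity check of the aspect-ratio arithmetic, consider a hypothetical sketch whose bounding box is 500 points wide and 250 points tall: the width should stay at 255 while the height shrinks proportionally to 127.5.

```swift
import Foundation

// Made-up bounding-box extents for a sketch that is wider than it is tall
let scale = CGPoint(x: 500.0, y: 250.0)

var width: CGFloat = 255.0
var height: CGFloat = 255.0

if scale.x > scale.y {
    height *= scale.y / scale.x   // 255 * 250/500 = 127.5
} else {
    width *= scale.x / scale.y
}

print(width, height) // 255.0 127.5
```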

For each dimension (width and height), we have calculated the scale required to ensure that our sketch is either scaled up or down to a dimension of 255. We now need to apply this to each of the points associated with each of the strokes held by the StrokeSketch class; as we're iterating through each point, it also makes sense to align our sketch to the top-left corner (x= 0, y = 0) as a required preprocessing step. We can do this simply by subtracting the minimum value of each of the dimensions. Append the following code to your simplify method to do this:

for i in 0..<copy.strokes.count{
    copy.strokes[i].points = copy.strokes[i].points.map({ (pt) -> CGPoint in
        let x : CGFloat = CGFloat(Int(((pt.x - minPoint.x)/scale.x) * width))
        let y : CGFloat = CGFloat(Int(((pt.y - minPoint.y)/scale.y) * height))
        return CGPoint(x:x, y:y)
    })
}

Our final step is to simplify the curve using the Ramer-Douglas-Peucker algorithm; to do this, we will make the Stroke responsible for implementing the details and just delegate the task there. Add the final piece of code to your simplify method within your StrokeSketch extension:

copy.strokes = copy.strokes.map({ (stroke) -> Stroke in
    return stroke.simplify()
})

return copy

The Ramer-Douglas-Peucker algorithm recursively traverses the curve, starting with the line segment between the first and last points and finding the intermediate point furthest from that segment. If this furthest point is closer than a given threshold, all of the intermediate points can be discarded; if it is further than the threshold, that point must be kept. The algorithm then recursively calls itself on the two sub-segments on either side of the kept point (first point to furthest point, and furthest point to last point). After traversing the whole curve, the result is a simplified curve that only consists of the points marked to be kept. The process is summarized in the following figure:

 

Let's start by extending the CGPoint structure to include a method for calculating the distance of a point given a line; add this code to your playground:

public extension CGPoint{

    static func getSquareSegmentDistance(p0:CGPoint,
                                         p1:CGPoint,
                                         p2:CGPoint) -> CGFloat{
        let x0 = p0.x, y0 = p0.y
        var x1 = p1.x, y1 = p1.y
        let x2 = p2.x, y2 = p2.y
        var dx = x2 - x1
        var dy = y2 - y1

        // project p0 onto the segment (p1, p2) unless it has zero length
        if dx != 0.0 || dy != 0.0{
            let numerator = (x0 - x1) * dx + (y0 - y1) * dy
            let denom = dx * dx + dy * dy
            let t = numerator / denom

            if t > 1.0{
                x1 = x2
                y1 = y2
            } else{
                x1 += dx * t
                y1 += dy * t
            }
        }

        dx = x0 - x1
        dy = y0 - y1

        return dx * dx + dy * dy
    }
}
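To convince yourself the geometry is right, it is worth evaluating the calculation against a case you can check by hand. The snippet below repeats the same projection logic as a standalone free function (so it runs without the extension above; the function name is ours): the point (5, 5) sits 5 units above the horizontal segment from (0, 0) to (10, 0), so its squared distance should be 25.

```swift
import Foundation

// Standalone copy of the squared point-to-segment distance logic,
// written as a free function so this example is self-contained.
func squareSegmentDistance(p0: CGPoint, p1: CGPoint, p2: CGPoint) -> CGFloat {
    var x1 = p1.x, y1 = p1.y
    var dx = p2.x - x1
    var dy = p2.y - y1

    if dx != 0.0 || dy != 0.0 {
        // Parametric position of p0's projection along the segment
        let t = ((p0.x - x1) * dx + (p0.y - y1) * dy) / (dx * dx + dy * dy)
        if t > 1.0 {
            x1 = p2.x
            y1 = p2.y
        } else {
            x1 += dx * t
            y1 += dy * t
        }
    }

    dx = p0.x - x1
    dy = p0.y - y1
    return dx * dx + dy * dy
}

// (5, 5) is 5 units from the segment (0,0)-(10,0): squared distance is 25
let d = squareSegmentDistance(p0: CGPoint(x: 5, y: 5),
                              p1: CGPoint(x: 0, y: 0),
                              p2: CGPoint(x: 10, y: 0))
print(d) // 25.0
```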

Here, we have added a static method to the CGPoint structure; it calculates the squared perpendicular distance from a point to a line segment (the value we compare with our threshold to simplify our line, as previously described). Next, we will implement the recursive method, which will be used to build up the simplified curve by testing and discarding any points under our threshold. As mentioned, we will encapsulate this functionality within the Stroke class itself, so we start off by stubbing out the extension:

public extension Stroke{
}

Now, within the extension, add the recursive method:

func simplifyDPStep(points:[CGPoint], first:Int, last:Int,
                    tolerance:CGFloat, simplified: inout [CGPoint]){

    var maxSqDistance = tolerance
    var index = 0

    for i in first + 1..<last{
        let sqDist = CGPoint.getSquareSegmentDistance(
            p0: points[i],
            p1: points[first],
            p2: points[last])

        if sqDist > maxSqDistance{
            maxSqDistance = sqDist
            index = i
        }
    }

    if maxSqDistance > tolerance{
        if index - first > 1{
            simplifyDPStep(points: points,
                           first: first,
                           last: index,
                           tolerance: tolerance,
                           simplified: &simplified)
        }

        simplified.append(points[index])

        if last - index > 1{
            simplifyDPStep(points: points,
                           first: index,
                           last: last,
                           tolerance: tolerance,
                           simplified: &simplified)
        }
    }
}

Most of this should make sense as it's a direct implementation of the algorithm described. We start off by finding the point furthest from the current segment; if its distance exceeds our threshold, we add it to the array of points to keep and recursively call our method on the two sub-segments on either side of it, otherwise the intermediate points are discarded. This continues until we have traversed the whole curve.

The last method we need to implement is the method responsible for initiating this process, which we will also encapsulate within our Stroke extension; so go ahead and add the following method to your extension:

public func simplify(epsilon:CGFloat=3.0) -> Stroke{

    var simplified: [CGPoint] = [self.points.first!]

    self.simplifyDPStep(points: self.points,
                        first: 0, last: self.points.count-1,
                        tolerance: epsilon * epsilon,
                        simplified: &simplified)

    simplified.append(self.points.last!)

    let copy = self.copy() as! Stroke
    copy.points = simplified

    return copy
}

The simplify method simply (excuse the pun) creates an array of points of our simplified curve, adding the first point, before kicking off the recursive method we had just implemented. Then, when the curve has been traversed, it finally adds the last point before returning the Stroke with the simplified points. 
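If you want to see the end-to-end behavior in isolation, the following standalone sketch mirrors the two-method structure above (a recursive step plus a wrapper that seeds the first point and appends the last); the function names here are ours, invented for this example. A right angle drawn with redundant collinear points should collapse to just its two ends and the corner.

```swift
import Foundation

// Squared distance from p to the segment (a, b), as in the extension above
func squaredSegmentDistance(_ p: CGPoint, _ a: CGPoint, _ b: CGPoint) -> CGFloat {
    var x = a.x, y = a.y
    var dx = b.x - x, dy = b.y - y
    if dx != 0.0 || dy != 0.0 {
        let t = ((p.x - x) * dx + (p.y - y) * dy) / (dx * dx + dy * dy)
        if t > 1.0 { x = b.x; y = b.y }
        else { x += dx * t; y += dy * t }
    }
    dx = p.x - x; dy = p.y - y
    return dx * dx + dy * dy
}

// Recursive step: keep the furthest point if it exceeds the tolerance
func simplifyStep(_ points: [CGPoint], _ first: Int, _ last: Int,
                  _ tolerance: CGFloat, _ kept: inout [CGPoint]) {
    var maxSqDistance = tolerance
    var index = 0
    for i in (first + 1)..<last {
        let sqDist = squaredSegmentDistance(points[i], points[first], points[last])
        if sqDist > maxSqDistance { maxSqDistance = sqDist; index = i }
    }
    if maxSqDistance > tolerance {
        if index - first > 1 { simplifyStep(points, first, index, tolerance, &kept) }
        kept.append(points[index])
        if last - index > 1 { simplifyStep(points, index, last, tolerance, &kept) }
    }
}

// Wrapper: seed with the first point, recurse, then append the last point
func rdpSimplify(_ points: [CGPoint], epsilon: CGFloat) -> [CGPoint] {
    guard points.count > 2 else { return points }
    var kept = [points.first!]
    simplifyStep(points, 0, points.count - 1, epsilon * epsilon, &kept)
    kept.append(points.last!)
    return kept
}

// A right angle drawn with redundant collinear points
let noisyCorner = [CGPoint(x: 0, y: 0), CGPoint(x: 50, y: 0),
                   CGPoint(x: 100, y: 0), CGPoint(x: 100, y: 50),
                   CGPoint(x: 100, y: 100)]
let simplified = rdpSimplify(noisyCorner, epsilon: 2.0)
print(simplified.count) // 3
```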

At this point, we have implemented the functionality required to transform raw input into its simplified form, as specified in the Quick, Draw! repository. Let's verify our work by comparing our simplified version of a raw sketch with an existing simplified version of the same sketch. Add the following code to your playground:

if let rJson = loadedJSON["small_raw_airplane"],
    let sJson = loadedJSON["small_simplified_airplane"]{

    if let rSketch = StrokeSketch.createFromJSON(json: rJson[2] as? [String:Any]),
        let sSketch = StrokeSketch.createFromJSON(json: sJson[2] as? [String:Any]){
        drawSketch(sketch: rSketch)
        drawSketch(sketch: sSketch)
        drawSketch(sketch: rSketch.simplify())
    }
}

As we did before, you can click on the eye icon within the right-hand-side panel for each of the drawSketch calls to preview each of the sketches. The first is the sketch from the raw dataset, the second is from the simplified dataset, and the third is the result of applying our simplify implementation to the sample from the raw dataset. If everything goes as planned, then you should see something that resembles the following:

On close inspection, our simplified version looks more aggressive than the sample from the simplified dataset, but we can easily tweak this by adjusting our threshold. However, for all intents and purposes, this will suffice for now. At this point, we have the required functionality to simplify our dataset, transforming it into something that resembles the training dataset. But before feeding our data into the model, we have more preprocessing to do; let's do that now, starting with a quick discussion of what our model is expecting.

Our model is expecting each sample to have three dimensions: the point position (x, y) and a flag indicating whether the point is the last point of its associated stroke. The reason for having this flag is that we are passing in a fixed-length sequence of size 75. That is, each sketch will either be truncated to squeeze into this sequence or padded out with leading zeros to fill it. The flag adds context indicating whether a point ends its stroke (keeping in mind that our sequence represents our sketch, and our sketch is made up of many strokes).

Next, as usual, we normalize the inputs to a range of 0.0 - 1.0 to avoid having our model fluctuate while training due to large weights. The last adjustment is converting our absolute values into deltas, which makes a lot of sense when you think about it. The first reason is that we want our model to be invariant to the actual position of each point; that is, we could draw the same sketch side by side, and ideally we want both to be classified as the same class. In the previous chapter, we achieved this by using a CNN operating on pixel data rather than the positions we are working with here. The second reason for using deltas rather than absolute values is that a delta carries more useful information than an absolute position, that is, direction. After implementing this, we will be ready to test out our model, so let's get going; start by adding the following extension and method that will be responsible for this preprocessing step:

extension StrokeSketch{

    public static func preprocess(_ sketch:StrokeSketch)
        -> MLMultiArray?{
        let arrayLen = NSNumber(value:75 * 3)

        guard let array = try? MLMultiArray(shape: [arrayLen],
                                            dataType: .double)
            else{ return nil }

        let simplifiedSketch = sketch.simplify()

    }
}

Here we have added the static method preprocess to our StrokeSketch class via an extension; within this method, we begin by setting up the buffer that will be passed to our model. The size of this buffer needs to fit a full sequence, which is calculated simply by multiplying the sequence length (75) with the number of dimensions (3). We then call simplify on the StrokeSketch instance to obtain the simplified sketch, ensuring that it closely resembles the data we had trained our model on.  

Next, we will iterate through each point for every stroke, normalizing the point and determining the value of the flag (one indicating the end of the stroke; otherwise it's zero). Append the following code to your preprocess method: 

let minPoint = simplifiedSketch.minPoint
let maxPoint = simplifiedSketch.maxPoint
let scale = CGPoint(x: maxPoint.x-minPoint.x,
                    y: maxPoint.y-minPoint.y)

var data = Array<Double>()

for i in 0..<simplifiedSketch.strokes.count{
    for j in 0..<simplifiedSketch.strokes[i].points.count{
        let point = simplifiedSketch.strokes[i].points[j]
        let x = (point.x-minPoint.x)/scale.x
        let y = (point.y-minPoint.y)/scale.y
        let z = j == simplifiedSketch.strokes[i].points.count-1
            ? 1 : 0

        data.append(Double(x))
        data.append(Double(y))
        data.append(Double(z))
    }
}

We start by obtaining the minimum and maximum values, which we will use when normalizing each point (using the equation (xᵢ − min(x)) / (max(x) − min(x)), where xᵢ is a single point and x represents all points within the sketch). Then we create a temporary place to store the data before iterating through all our points, normalizing each one, and determining the value of the flag as described previously.
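In isolation, that min-max normalization is easy to verify; given x values spanning 10 to 210, a point at 60 should map to 0.25:

```swift
// Min-max normalization of a single coordinate: (x - min(x)) / (max(x) - min(x))
let xs: [Double] = [10, 60, 110, 210]
let minX = xs.min()!, maxX = xs.max()!
let normalized = xs.map { ($0 - minX) / (maxX - minX) }
print(normalized) // [0.0, 0.25, 0.5, 1.0]
```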

We now want to calculate the deltas of each point and finally remove the last point as we are unable to calculate its delta; append the following to your preprocess method: 

let dataStride : Int = 3
for i in stride(from: dataStride, to: data.count, by: dataStride){
    data[i - dataStride] = data[i] - data[i - dataStride]
    data[i - (dataStride-1)] = data[i+1] - data[i - (dataStride-1)]
    data[i - (dataStride-2)] = data[i+2]
}

data.removeLast(3)

The previous code should be self-explanatory; the only notable point worth highlighting is that we are now dealing with a flattened array, and therefore we need to use a stride of 3 when traversing the data.
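A tiny worked example may help; starting from three hypothetical normalized points flattened as [x, y, flag] triples, each stride rewrites a triple in place as the delta to the next point (carrying the next point's flag across), and the now-meaningless last triple is dropped:

```swift
// Flattened [x, y, flag] triples for three hypothetical normalized points
var data: [Double] = [0.0, 0.0, 0.0,
                      0.5, 0.25, 0.0,
                      1.0, 1.0, 1.0]

let dataStride = 3
for i in stride(from: dataStride, to: data.count, by: dataStride) {
    data[i - dataStride] = data[i] - data[i - dataStride]                  // delta x
    data[i - (dataStride - 1)] = data[i + 1] - data[i - (dataStride - 1)]  // delta y
    data[i - (dataStride - 2)] = data[i + 2]                               // carry the next point's flag
}
data.removeLast(3)

print(data) // [0.5, 0.25, 0.0, 0.5, 0.75, 1.0]
```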

One last chunk of code to add! We need to ensure that our sequence is exactly 75 points long, that is, an array of length 225. We do this by either truncating the array if it is too large or padding it out if it is too small. We can easily do this while copying the data from our temporary array, data, across to the buffer that we will be passing to our model, array. Here, we first calculate the starting index and then iterate through the whole sequence, copying the data across once the current index has passed our starting index, or else padding with zeros. Add the following snippet to finish off your preprocess method:

var dataIdx : Int = 0
let startAddingIdx = max(array.count-data.count, 0)

for i in 0..<array.count{
    if i >= startAddingIdx{
        array[i] = NSNumber(value:data[dataIdx])
        dataIdx = dataIdx + 1
    } else{
        array[i] = NSNumber(value:0)
    }
}

return array
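The padding logic can be checked on its own; with a hypothetical target length of 9 (rather than the model's 225, for readability) and six values of real data, three leading zeros are written before the data is copied across:

```swift
// Pad a short flattened sequence with leading zeros up to a fixed length
// (a made-up target of 9 here instead of the model's 225)
let data: [Double] = [0.5, 0.25, 0.0, 0.5, 0.75, 1.0]
let targetLength = 9

var padded = [Double](repeating: 0.0, count: targetLength)
var dataIdx = 0
let startAddingIdx = max(targetLength - data.count, 0)
for i in 0..<targetLength where i >= startAddingIdx {
    padded[i] = data[dataIdx]
    dataIdx += 1
}

print(padded) // [0.0, 0.0, 0.0, 0.5, 0.25, 0.0, 0.5, 0.75, 1.0]
```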

With our preprocess method now complete, we are ready to test out our model. We will start by instantiating our model (contained within the playground) and then feeding in an airplane sample we have used previously, before testing with the other categories. Append the following code to your playground:

let model = quickdraw()

if let json = loadedJSON["small_raw_airplane"]{
    if let sketch = StrokeSketch.createFromJSON(json: json[0] as? [String:Any]){
        if let x = StrokeSketch.preprocess(sketch){
            if let predictions = try? model.prediction(input:quickdrawInput(strokeSeq:x)){
                print("Class label \(predictions.classLabel)")
                print("Class label probability/confidence \(predictions.classLabelProbs["airplane"] ?? 0)")
            }
        }
    }
}

If all goes well, your playground will output the following to the console:

It has predicted the category of airplane and done so fairly confidently (with a probability of approximately 77%). Before we migrate our code into our application, let's test with some other categories; we will start by implementing a method to handle all the leg work and then proceed to pass some samples to perform inference. Add the following method to your playground, which will be responsible for obtaining and preprocessing the sample before passing it to your model for prediction and then returning the results as a formatted string containing the most likely category and probability:

func makePrediction(key:String, index:Int) -> String{
    if let json = loadedJSON[key]{
        if let sketch = StrokeSketch.createFromJSON(
            json: json[index] as? [String:Any]){
            if let x = StrokeSketch.preprocess(sketch){
                if let predictions = try? model.prediction(input:quickdrawInput(strokeSeq:x)){
                    return "\(predictions.classLabel) \(predictions.classLabelProbs[predictions.classLabel] ?? 0)"
                }
            }
        }
    }

    return "None"
}

With most of the work now done, we are just left with the nail-biting task of testing that our preprocessing implementation and model are sufficiently able to predict the samples we pass. Let's test with each category; add the following code to your playground:

print(makePrediction(key: "small_raw_airplane", index: 0))
print(makePrediction(key: "small_raw_alarm_clock", index: 1))
print(makePrediction(key: "small_raw_bee", index: 2))
print(makePrediction(key: "small_raw_sailboat", index: 3))
print(makePrediction(key: "small_raw_train", index: 4))
print(makePrediction(key: "small_raw_truck", index: 5))
print(makePrediction(key: "small_simplified_airplane", index: 0))

The output for each of these can be seen in this screenshot: 

Not bad! We managed to predict all the categories correctly, although the truck was given a probability of only 41%. And interestingly, our simplified airplane sample was given a higher probability (84%) than its counterpart from the raw dataset (77%). 

Out of curiosity, let's peek at the truck sample we asked our model to predict:

All due respect to the artist, but I would be hard-pressed to identify a truck from this sketch, so full credit to our model.

We have now exposed our model to a variety of categories, each of which we were able to predict correctly, which implies that our preprocessing code has been satisfactorily implemented. We are now ready to migrate our code across to our application, but before doing so, let's run one very last experiment. Think about how our model has been trained and how it will be used in the context of the application. The model was trained on sequences of strokes that users made while drawing their sketches. This is precisely how users will be interacting with our application; they will be sketching something with a series (or sequence) of strokes, and each time they finish a stroke, we want to try and predict what it is they are trying to draw. Let's mimic that behavior by building up a sample stroke by stroke, predicting after each subsequent stroke is added, to evaluate how well the model performs in a more realistic setting. Add the following code to your playground:

if let json = loadedJSON["small_raw_bee"]{
    if let sketch = StrokeSketch.createFromJSON(json: json[2] as? [String:Any]){
        let strokeCount = sketch.strokes.count
        print("\(sketch.label ?? "") sketch has \(strokeCount) strokes")

        for i in (0..<strokeCount-1).reversed(){
            let copyOfSketch = sketch.copy() as! StrokeSketch
            copyOfSketch.strokes.removeLast(i)
            if let x = StrokeSketch.preprocess(copyOfSketch){
                if let predictions = try? model.prediction(input:quickdrawInput(strokeSeq:x)){
                    let label = predictions.classLabel
                    let probability = String(format: "%.2f", predictions.classLabelProbs[predictions.classLabel] ?? 0)

                    print("Guessing \(label) with probability of \(probability) using \(copyOfSketch.strokes.count) strokes")
                }
            }
        }
    }
}

Nothing new has been introduced here; we are just loading in a sketch, slowly building it up stroke by stroke as discussed, and passing the partial sketch to our model to perform inference. Here are the results, with their corresponding sketches to give them more context:

All reasonable predictions, possibly uncovering how a lot of people draw a hockey puck, mouth, and bee. Now, satisfied with our implementation, let's move on to the next section, where we will migrate this code and look at how we can obtain and compile a model at runtime.
