Capturing data 

If you haven't done it already, download the latest code from the accompanying repository: https://github.com/packtpublishing/machine-learning-with-core-ml. Once downloaded, navigate to the directory Chapter3/Start/ObjectRecognition/ and open the project ObjectRecognition.xcodeproj. Once loaded, you will see the skeleton project for this chapter, as shown in the following screenshot:

To help you navigate around the project, here is a list of core files/classes and their main functions: 

  • VideoCapture will be responsible for the management and handling of the camera, including capturing video frames 
  • CaptureVideoPreviewView.swift contains the class CapturePreviewView, which will be used to present the captured frames 
  • CIImage.swift provides convenient extensions to the CIImage class, used for preparing each frame for the Core ML model
  • ViewController, as you would expect, is the controller for the application and is responsible for interfacing with the imported Core ML model 

We will be making changes to each of these in the following sections in order to realize the desired functionality. Our first task will be to get access to the camera and start capturing frames; to do this, we will be making use of Apple's iOS frameworks AVFoundation and CoreVideo.

The AVFoundation framework encompasses classes for capturing, processing, synthesizing, controlling, importing, and exporting audiovisual media on iOS and other platforms. In this chapter, we are most interested in the subset of this framework that deals with cameras and media capture, but you can learn more about the AVFoundation framework on Apple's official documentation site at https://developer.apple.com/documentation/avfoundation.

CoreVideo provides a pipeline-based API for manipulating digital videos, capable of accelerating the process using support from both Metal and OpenGL.

We will designate the responsibility of setting up and capturing frames from the camera to the class VideoCapture; let's jump into the code now. Select VideoCapture.swift from the left-hand side panel to open in the editing window. Before making amendments, let's inspect what is already there and what's left to do. 

At the top of the class, we have the protocol VideoCaptureDelegate defined:

public protocol VideoCaptureDelegate: class {
    func onFrameCaptured(
        videoCapture: VideoCapture,
        pixelBuffer: CVPixelBuffer?,
        timestamp: CMTime)
}

VideoCapture will pass the captured frames through to a registered delegate, thus allowing the VideoCapture class to focus solely on the task of capturing frames. What we pass to the delegate is a reference to the VideoCapture instance itself, the image data (the captured frame) as a CVPixelBuffer, and the timestamp as a CMTime. CVPixelBuffer is a CoreVideo data structure specifically for holding pixel data, and it is the data structure our Core ML model expects (as we'll see in a short while). CMTime is just a struct for encapsulating a timestamp, which we'll obtain directly from the video frame.
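
To make the delegation concrete, here is a minimal, hypothetical sketch of how a class might adopt VideoCaptureDelegate; the FrameConsumer class below is not part of the project and is purely illustrative:

import CoreMedia
import CoreVideo

// Hypothetical consumer of captured frames; not part of the chapter's code.
class FrameConsumer: VideoCaptureDelegate {

    func onFrameCaptured(videoCapture: VideoCapture,
                         pixelBuffer: CVPixelBuffer?,
                         timestamp: CMTime) {
        // Ignore frames that arrive without pixel data.
        guard let pixelBuffer = pixelBuffer else { return }

        // In a real implementation, this is where the CVPixelBuffer would be
        // handed off to the Core ML model.
        print("Frame at \(CMTimeGetSeconds(timestamp))s, " +
              "size \(CVPixelBufferGetWidth(pixelBuffer))x\(CVPixelBufferGetHeight(pixelBuffer))")
    }
}

A consumer like this would register itself by assigning itself to the videoCapture.delegate property; because that property is declared weak, something else must keep a strong reference to the consumer.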

Under the protocol, we have the skeleton of our VideoCapture class. We will be walking through it in this section, along with an extension to implement the AVCaptureVideoDataOutputSampleBufferDelegate protocol, which we will use to capture frames:

public class VideoCapture : NSObject{

    public weak var delegate: VideoCaptureDelegate?

    public var fps = 15
    var lastTimestamp = CMTime()

    override init() {
        super.init()
    }

    func initCamera() -> Bool {
        return true
    }

    public func asyncStartCapturing(
        completion: (() -> Void)? = nil) {
    }

    public func asyncStopCapturing(
        completion: (() -> Void)? = nil) {
    }
}

Most of this should be self-explanatory, so I will only highlight the not-so-obvious parts, starting with the variables fps and lastTimestamp. We use these together to throttle how quickly we pass frames back to the delegate; we do this because it's our assumption that we capture frames far quicker than we can process them. To avoid having our camera preview lag or jump, we explicitly limit how quickly we pass frames to the delegate. The fps property (frames per second) sets this frequency, while lastTimestamp is used in conjunction with it to calculate the elapsed time since a frame was last processed; for example, with fps set to 15, we forward at most one frame every 1/15th of a second. 

The only other part of the code I will highlight here is the asyncStartCapturing and asyncStopCapturing methods; these methods, as the names imply, are responsible for starting and stopping the capture session, respectively. Because both call blocking methods that can take some time to complete, we will dispatch the work off the main thread to avoid blocking it and affecting the user's experience.

Finally, we have the extension; it implements the AVCaptureVideoDataOutputSampleBufferDelegate protocol:

extension VideoCapture : AVCaptureVideoDataOutputSampleBufferDelegate{

    public func captureOutput(_ output: AVCaptureOutput,
                              didOutput sampleBuffer: CMSampleBuffer,
                              from connection: AVCaptureConnection) {
    }
}

We will discuss the details shortly, but essentially this is the delegate we assign to the camera output to handle incoming frames; we then proxy each frame through to the VideoCaptureDelegate assigned to this class.

Let's now walk through implementing the methods of this class, starting with initCamera. In this method, we want to set up the pipeline that will grab frames from the physical camera of the device and pass them on to our delegate. We do this by first getting a reference to the physical camera and then wrapping it in an instance of the AVCaptureDeviceInput class, which takes care of managing the connection and communication with the physical camera. Finally, we add a destination for the frames, which is where we use an instance of AVCaptureVideoDataOutput, assigning ourselves as the delegate for receiving these frames. This pipeline is wrapped in an AVCaptureSession, which is responsible for coordinating and managing it. 

Let's now define some instance variables we'll need; inside the class VideoCapture, add the following variables:

let captureSession = AVCaptureSession()
let sessionQueue = DispatchQueue(label: "session queue")

We mentioned the purpose of captureSession previously, but also introduced a DispatchQueue. When adding a delegate to AVCaptureVideoDataOutput (for handling the arrival of new frames), you also pass in a DispatchQueue; this allows you to control which queue the frames are managed on. For our example, we will be handling the processing of the images off the main thread so as to avoid impacting the performance of the user interface. 

With our instance variables now declared, we will turn our attention to the initCamera method, breaking it down into small snippets of code. Add the following within the body of the method:

captureSession.beginConfiguration()       
captureSession.sessionPreset = AVCaptureSession.Preset.medium

We signal to the captureSession that we want to batch multiple configuration changes by calling the method beginConfiguration; these changes won't be applied until we commit them by calling the session's commitConfiguration method. In the second line, we set the desired quality level. Next, add the following snippet:

guard let captureDevice = AVCaptureDevice.default(for: AVMediaType.video) else {
    print("ERROR: no video devices available")
    return false
}

guard let videoInput = try? AVCaptureDeviceInput(device: captureDevice) else {
    print("ERROR: could not create AVCaptureDeviceInput")
    return false
}

if captureSession.canAddInput(videoInput) {
    captureSession.addInput(videoInput)
}

In the preceding snippet, we obtain the physical device; here, we are obtaining the default device capable of recording video, but you could just as easily search for one with specific capabilities, such as the front-facing camera (see the sketch that follows). After successfully obtaining the device, we wrap it in an instance of AVCaptureDeviceInput, which will be responsible for capturing data from the physical camera, and finally add it to the session. 
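
For example, the following sketch shows one way you might request the front-facing wide-angle camera instead of the default device; this variant isn't used in our project and is included purely for illustration:

// Hypothetical alternative to AVCaptureDevice.default(for:): explicitly ask
// for the front-facing, built-in wide-angle camera (available since iOS 10).
guard let frontCamera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                for: AVMediaType.video,
                                                position: .front) else {
    print("ERROR: no front-facing camera available")
    return false
}
// frontCamera would then be wrapped in an AVCaptureDeviceInput,
// exactly as we do with the default device above.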

We now have to add the destination for these frames; again, add the following snippet to the initCamera method where you left off:

let videoOutput = AVCaptureVideoDataOutput()

let settings: [String : Any] = [
    kCVPixelBufferPixelFormatTypeKey as String: NSNumber(value: kCVPixelFormatType_32BGRA)
]

videoOutput.videoSettings = settings
videoOutput.alwaysDiscardsLateVideoFrames = true
videoOutput.setSampleBufferDelegate(self, queue: sessionQueue)

if captureSession.canAddOutput(videoOutput) {
    captureSession.addOutput(videoOutput)
}

videoOutput.connection(with: AVMediaType.video)?.videoOrientation = .portrait

In the previous code snippet, we create, set up, and add our output. We start by instantiating AVCaptureVideoDataOutput and defining what data we want. Here, we are requesting full color (kCVPixelFormatType_32BGRA), but depending on your model, it may be more efficient to request images in grayscale (kCVPixelFormatType_8IndexedGray_WhiteIsZero).

Setting alwaysDiscardsLateVideoFrames to true means any frames that arrive while the dispatch queue is busy will be discarded, which is a desirable feature for our example. We then assign ourselves, along with our dedicated dispatch queue, as the delegate for handling incoming frames using the method videoOutput.setSampleBufferDelegate(self, queue: sessionQueue). Once we have configured our output, we add it to our session as part of our configuration batch. Finally, to prevent our images from being rotated by 90 degrees, we request that the frames are delivered in portrait orientation. 

Add the final statement to commit these configurations; it's only after we do this that these changes will take effect:

captureSession.commitConfiguration()

This now completes our initCamera method; let's swiftly (excuse the pun) move on to the methods responsible for starting and stopping this session. Add the following code to the body of the asyncStartCapturing method:

sessionQueue.async {
    if !self.captureSession.isRunning{
        self.captureSession.startRunning()
    }

    if let completion = completion{
        DispatchQueue.main.async {
            completion()
        }
    }
}

As mentioned previously, the startRunning and stopRunning methods are blocking calls and can take some time to complete; for this reason, we execute them off the main thread to avoid affecting the responsiveness of the user interface. Invoking startRunning will start the flow of data from the subscribed inputs (the camera) to the subscribed outputs (our delegate). 

Errors, if any, are reported through the AVCaptureSessionRuntimeError notification, which you can subscribe to using the default NotificationCenter. Similarly, you can listen for when the session starts and stops with the AVCaptureSessionDidStartRunning and AVCaptureSessionDidStopRunning notifications, respectively; a sketch of how you might subscribe to these follows.
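
The following is a minimal sketch of how such observers could be registered, assuming it is placed somewhere with access to our captureSession (for example, inside VideoCapture); the handlers shown are purely illustrative:

// Keep references to the observer tokens so they can be removed later.
let errorObserver = NotificationCenter.default.addObserver(
    forName: .AVCaptureSessionRuntimeError,
    object: captureSession,
    queue: .main) { notification in
        // The underlying NSError is stored under AVCaptureSessionErrorKey.
        if let error = notification.userInfo?[AVCaptureSessionErrorKey] as? NSError {
            print("Capture session error: \(error.localizedDescription)")
        }
}

let startObserver = NotificationCenter.default.addObserver(
    forName: .AVCaptureSessionDidStartRunning,
    object: captureSession,
    queue: .main) { _ in print("Capture session started") }

let stopObserver = NotificationCenter.default.addObserver(
    forName: .AVCaptureSessionDidStopRunning,
    object: captureSession,
    queue: .main) { _ in print("Capture session stopped") }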

Similarly, add the following code to the method asyncStopCapturing, which will be responsible for stopping the current session:

sessionQueue.async {
    if self.captureSession.isRunning{
        self.captureSession.stopRunning()
    }

    if let completion = completion{
        DispatchQueue.main.async {
            completion()
        }
    }
}

Within the initCamera method, we subscribed ourselves as the delegate to handle arriving frames using the statement videoOutput.setSampleBufferDelegate(self, queue: sessionQueue); let's now turn our attention to handling them. As you may recall, we included an extension of the VideoCapture class that implements the AVCaptureVideoDataOutputSampleBufferDelegate protocol; add the following code inside its captureOutput method:

guard let delegate = self.delegate else{ return }

let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

let elapsedTime = timestamp - lastTimestamp
if elapsedTime >= CMTimeMake(1, Int32(fps)) {

    lastTimestamp = timestamp

    let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)

    delegate.onFrameCaptured(videoCapture: self,
                             pixelBuffer: imageBuffer,
                             timestamp: timestamp)
}

Before walking through this code snippet, it's worth mentioning what parameters this method is passed and how we use them. The first parameter, output, is of type AVCaptureOutput (in our case, the AVCaptureVideoDataOutput we created) and references the output that this frame originated from. The next parameter, sampleBuffer, is of type CMSampleBuffer, and this is what we will use to access the data of the current frame; along with the pixel data, the duration, format, and timestamp associated with each frame can also be obtained from it. The final parameter, connection, is of type AVCaptureConnection and provides a reference to the connection associated with the received frame.

Now, walking through the code, we start by guarding against any occurrence where no delegate is assigned, returning early if so. We then determine whether enough time has elapsed since the last time we processed a frame, remembering that we are throttling how frequently we process frames to ensure a smooth experience. Here, instead of using the system's clock, we obtain the time associated with the latest frame via the statement let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer); this ensures that we are measuring relative to the frames themselves rather than the absolute time of the system. Given that enough time has passed, we proceed to get a reference to the sample's image buffer via the statement CMSampleBufferGetImageBuffer(sampleBuffer), finally passing it over to the assigned delegate. 

This now completes our VideoCapture class; let's move on to hooking it up to our view using the ViewController. But before jumping into the code, let's inspect the interface via the storyboard to better understand where we'll be presenting the video stream. Within Xcode, select Main.storyboard from the Project Navigator panel on the left to open up interface builder; when opened, you will be presented with a layout similar to the following screenshot:

Nothing complicated; we have a label to present our results and a view to render our video frames onto. If you select the VideoPreview view and inspect the class assigned to it, you will see we have a custom class to handle the rendering called, appropriately, CapturePreviewView. Let's jump into the code for this class and make the necessary changes:

import AVFoundation
import UIKit

class CapturePreviewView: UIView {

}

Fortunately, AVFoundation makes available a subclass of CALayer specifically for rendering frames from the camera; all that remains for us to do is to override the view's layerClass property and return the appropriate class. Add the following code to the CapturePreviewView class:

override class var layerClass: AnyClass {
    return AVCaptureVideoPreviewLayer.self
}

This property is consulted early during the creation of the view and is used to determine which CALayer to instantiate and associate with the view. As previously mentioned, AVCaptureVideoPreviewLayer is, as the name suggests, built specifically for handling video frames. To get the frames rendered, we simply assign our AVCaptureSession to the layer's session property. Let's do that now; first, open up the ViewController class in Xcode and add the following variables (previewView and classifiedLabel already exist; the videoCapture constant is the new addition):

@IBOutlet var previewView:CapturePreviewView!
@IBOutlet var classifiedLabel:UILabel!

let videoCapture : VideoCapture = VideoCapture()

The previewView and classifiedLabel properties are existing variables associated with the interface via Interface Builder. Here, we are creating an instance of VideoCapture, which we implemented earlier. Next, we will set up and start the camera using the VideoCapture instance, before assigning the session to our previewView layer. Add the following code within the viewDidLoad method, under the statement super.viewDidLoad():

if self.videoCapture.initCamera(){
    (self.previewView.layer as! AVCaptureVideoPreviewLayer).session = self.videoCapture.captureSession

    (self.previewView.layer as! AVCaptureVideoPreviewLayer).videoGravity = AVLayerVideoGravity.resizeAspectFill

    self.videoCapture.asyncStartCapturing()
} else{
    fatalError("Failed to init VideoCapture")
}

Most of this code should look familiar to you, as a lot of it uses the methods we have just implemented. First, we initialize the camera by calling the initCamera method of the VideoCapture class. Then, if successful, we assign the created AVCaptureSession to the layer's session property. We also hint to the layer how we want it to handle the content, in this case filling the screen whilst respecting its aspect ratio. Finally, we start the camera by calling videoCapture.asyncStartCapturing().

With that now completed, it's a good time to test that everything is working correctly. If you build and deploy on an iOS 11+ device, you should see the video frames being rendered on your phone's screen.
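
One thing to verify if you see a black screen instead of video: the app's Info.plist must contain the NSCameraUsageDescription key before iOS will grant camera access (the starter project may already include it). You can also check, or explicitly request, authorization before starting the session; the following is a minimal sketch of that, not something the chapter's code requires:

// Check the current camera authorization state and, if undetermined,
// prompt the user for access (this is optional; the capture session will
// trigger the system prompt itself the first time it needs the camera).
switch AVCaptureDevice.authorizationStatus(for: .video) {
case .authorized:
    break // Already authorized; safe to start capturing.
case .notDetermined:
    AVCaptureDevice.requestAccess(for: .video) { granted in
        print("Camera access granted: \(granted)")
    }
default:
    print("Camera access denied or restricted")
}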

In the next section, we will walk through how to capture and process them for our model before performing inference (recognition).
