Now that we have completed our deep dive into using TensorFlow with a Windows Presentation Foundation (WPF) application and ML.NET, it is time to dive into using Open Neural Network eXchange (ONNX) with ML.NET. Specifically, in this final chapter, we will review what ONNX is and create a new example application with a pre-trained ONNX model called YOLO. This application will build on the previous chapter and show the bounding boxes of the objects that the model detects. We will close out the chapter with suggestions for improving the example so that it can either become a production-grade application or be integrated into a production application.

In this chapter, we will cover the following topics:

Breaking down ONNX and YOLO
Creating the ONNX object detection application
Exploring additional production application enhancements

Breaking down ONNX and YOLO

As mentioned in Chapter 1, Getting Started with Machine Learning and ML.NET, the ONNX standard is widely regarded within the industry as a truly universal format across machine learning frameworks. In the next two sections, we will review what ONNX provides, in addition to the YOLO model that will drive our example in this chapter.

Introducing ONNX

ONNX was created to enable a less locked-down, more free-flowing process when working with pre-trained models or training models across frameworks. By providing an open format that frameworks can export to, ONNX allows interoperability, and thereby promotes experimentation that would otherwise have been prohibitive because almost every framework uses its own proprietary format.

Currently supported frameworks include TensorFlow, XGBoost, and PyTorch, in addition to ML.NET, of course.

If you want to dive deeper into ONNX, please check out its website: https://onnx.ai/index.html.

The YOLO ONNX model

Building on the work that was performed in Chapter 12, Using TensorFlow with ML.NET, in which we used the pre-trained Inception model, in this chapter we are going to use the pre-trained YOLO model. This model provides very fast and accurate object detection, meaning it can find multiple objects within an image, each with a certain level of confidence. This differs from the last chapter's model, which provided pure image classification, such as water or food.

To help visualize the difference between the two models, take the previous chapter's TensorFlow model that classified water, and compare that to this chapter's object detection of a car, as illustrated in the following screenshot:

Object detection within images (and video) has been increasing in demand due to the significantly increased number of images on the internet and the need for security. Imagine a crowded environment such as a football stadium, particularly near the front gates. Security guards patrol and monitor this area; however, like you, they are only human and can only glance at so many people with a certain level of accuracy. Object detection with machine learning, applied in real time to pick up on weapons or large bags, could then be used to alert the security guards to go after a suspect.

The YOLO model itself comes in two main forms: a tiny model and a full model. For this example, we will be using the smaller of the two (~60 MB), which can classify 20 objects found within an image. The tiny model is composed of nine convolutional layers and six max-pooling layers. The full model can classify thousands of objects and, given the proper hardware (namely, graphics processing units (GPUs)), can run faster than real time.
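The 20 objects are the classes of the Pascal VOC dataset. Assuming the VOC-trained Tiny YOLO v2 model (the variant published in the ONNX model zoo), the labels below are listed in the order in which the model emits its class scores. The `TinyYoloLabels` name is hypothetical and is not part of the chapter's code.

```csharp
// Hypothetical helper; assumes the VOC-trained Tiny YOLO v2 model, whose
// 20 class scores appear in this order within each predicted box.
public static class TinyYoloLabels
{
    public static readonly string[] Labels =
    {
        "aeroplane", "bicycle", "bird", "boat", "bottle",
        "bus", "car", "cat", "chair", "cow",
        "diningtable", "dog", "horse", "motorbike", "person",
        "pottedplant", "sheep", "sofa", "train", "tvmonitor"
    };
}
```

With this ordering, a winning class index of 6 maps to "car", which is the label you would expect on the vacation image shown later in the chapter.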

The following diagram depicts how the YOLO model works (and neural networks, to a degree):

Effectively, the image (or images) is converted to a 3 x 416 x 416 tensor. The 3 component represents the Red-Green-Blue (RGB) channels; in the diagram, the darkest layer is the red channel and the lightest layer is the green channel. The 416 values represent the width and height of the resized image. This input layer is then fed into the hidden layers of the model. For the Tiny YOLO v2 model that we are using in this chapter, there are a total of 15 layers before the output is produced.
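To make the input and output shapes concrete, the following C# sketch works through the arithmetic for the VOC-trained Tiny YOLO v2 model. It is a sketch under stated assumptions: the 416 x 416 input is downsampled by a factor of 32 into a 13 x 13 grid, each grid cell predicts five boxes, and each box carries 25 values (x, y, width, height, and a confidence score, plus one score for each of the 20 classes). That gives a [125, 13, 13] output, assumed here to be stored channel-major. The `TinyYoloOutputLayout` name and its members are hypothetical.

```csharp
// Hypothetical helper: the numbers assume the VOC-trained Tiny YOLO v2 model,
// whose output tensor has shape [125, 13, 13] stored channel-major.
public static class TinyYoloOutputLayout
{
    public const int InputSize = 416;                          // input width and height in pixels
    public const int Stride = 32;                              // total downsampling factor
    public const int GridSize = InputSize / Stride;            // 13 cells per side
    public const int BoxesPerCell = 5;                         // one box per anchor
    public const int ClassCount = 20;                          // Pascal VOC classes
    public const int ValuesPerBox = 5 + ClassCount;            // x, y, w, h, confidence + class scores = 25
    public const int ChannelCount = BoxesPerCell * ValuesPerBox;         // 125
    public const int OutputLength = ChannelCount * GridSize * GridSize;  // 21,125 floats

    // Flat index of one channel value for the grid cell at (row, col),
    // assuming all 13 x 13 values of a channel are stored contiguously.
    public static int GetOffset(int channel, int row, int col) =>
        channel * GridSize * GridSize + row * GridSize + col;

    // First channel that belongs to box number 'box' (0-4) within a cell.
    public static int GetBoxChannel(int box) => box * ValuesPerBox;
}
```

For example, `GetOffset(GetBoxChannel(2) + 4, 6, 6)` returns 9210, the position of the confidence value for the third box in the center cell.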

To dive deeper into the YOLO model, please read this paper: https://arxiv.org/pdf/1612.08242.pdf.

Creating the ONNX object detection application

As mentioned earlier, the application we will be creating is an object detection application using a pre-trained ONNX model. Using the application we developed in Chapter 12, Using TensorFlow with ML.NET as a starting point, we will add support for bounding boxes overlaid on top of the image when the model detects objects it recognizes. Image object detection is useful to the general public because of the many applications it enables. Imagine that you are working on a project for the police or intelligence community, where they have images or videos and want to detect weapons. Utilizing the YOLO model with ML.NET, as we are going to show, would make that process very straightforward.

As with previous chapters, the completed project code, pre-trained model, and project files can be downloaded here: https://github.com/PacktPublishing/Hands-On-Machine-Learning-With-ML.NET/tree/master/chapter13.

Exploring the project architecture

Building on the project architecture and code we created in previous chapters, the architecture we will be reviewing is enhanced to be more structured and usable by an end user.

As in some of the previous chapters, the following two additional NuGet packages are required if you want to utilize an ONNX model and perform object detection:

Microsoft.ML.ImageAnalytics
Microsoft.ML.OnnxTransformer

These NuGet packages are already referenced in the included sample code. Version 1.3.1 of these packages is used in both the included example on GitHub and throughout this chapter's deep dive.

In the following screenshot, you will find the Visual Studio Solution Explorer view of the project. There are several new additions to the solution to facilitate the production use case we are targeting. We will review each of the new files shown in the following screenshot in detail later in this chapter:

Due to an ML.NET limitation at the time of writing, ONNX support is only provided for scoring with a pre-existing model; you cannot train an ONNX model with ML.NET. The pre-trained model included in this example can be found in the assets/model folder.

Diving into the code

For this application, as noted in the previous section, we are building on top of the work completed in Chapter 12, Using TensorFlow with ML.NET. While the user interface (UI) has not changed much, the underlying code to run an ONNX model has. For each file changed, as in previous chapters, we will review the changes made and the reasoning behind them.

Classes that were changed or added are as follows:

DimensionsBase
BoundingBoxDimensions
YoloBoundingBox
MainWindow.xaml
ImageClassificationPredictor
MainWindowViewModel

There is one additional file, which contains the YoloOutputParser class. This class is derived from the MIT-licensed interface code for the Tiny YOLO ONNX model. Due to the length of this class, we will not review it here; however, the code is easy to read, and if you wish to step through a prediction, the flow will be easy to follow.
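Although we will not review YoloOutputParser, the following hypothetical C# sketch shows the standard YOLO v2 decoding steps that a parser of this kind performs for a single box; it is not the book's class. It assumes a channel-major [125, 13, 13] output with five boxes per cell, 20 classes, and 32-pixel cells on a 416 x 416 input. The x and y offsets and the confidence pass through a sigmoid, the width and height scale an anchor box through an exponential, and the class scores pass through a softmax. Anchor sizes are model-specific, so they are left as parameters rather than guessed.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of YOLO v2 decoding for one box (not YoloOutputParser).
// Assumes a channel-major [125, 13, 13] output, 5 boxes per cell, 20 classes,
// and 32-pixel cells on a 416 x 416 input.
public static class YoloDecodeSketch
{
    private const int Grid = 13, ValuesPerBox = 25, ClassCount = 20;
    private const float CellSize = 32f;

    public static float Sigmoid(float v) => 1f / (1f + MathF.Exp(-v));

    public static float[] Softmax(float[] values)
    {
        // Subtract the maximum before exponentiating for numerical stability.
        float max = values.Max();
        float[] exps = values.Select(v => MathF.Exp(v - max)).ToArray();
        float sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    private static float At(float[] output, int channel, int row, int col) =>
        output[channel * Grid * Grid + row * Grid + col];

    // Decodes box 'box' of the cell at (row, col) into a rectangle in 416 x 416
    // input space, plus its best class index and combined score.
    // anchorW/anchorH are the box's anchor size in cell units (model-specific).
    public static (float X, float Y, float W, float H, int ClassIndex, float Score)
        DecodeBox(float[] output, int row, int col, int box, float anchorW, float anchorH)
    {
        int c = box * ValuesPerBox;
        float cx = (col + Sigmoid(At(output, c, row, col))) * CellSize;
        float cy = (row + Sigmoid(At(output, c + 1, row, col))) * CellSize;
        float w = MathF.Exp(At(output, c + 2, row, col)) * anchorW * CellSize;
        float h = MathF.Exp(At(output, c + 3, row, col)) * anchorH * CellSize;
        float objectness = Sigmoid(At(output, c + 4, row, col));

        var classScores = new float[ClassCount];
        for (int i = 0; i < ClassCount; i++)
            classScores[i] = At(output, c + 5 + i, row, col);
        float[] probs = Softmax(classScores);

        int best = Array.IndexOf(probs, probs.Max());
        // Convert the box center to its top-left corner.
        return (cx - w / 2, cy - h / 2, w, h, best, objectness * probs[best]);
    }
}
```

A parser would call a routine like this for every cell and box (13 x 13 x 5 = 845 candidates), keep the candidates whose score clears a confidence threshold, and map the class index to a label.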

The DimensionsBase class

The DimensionsBase class contains the coordinates along with the Height and Width properties, as illustrated in the following code block:

public class DimensionsBase
{
    public float X { get; set; }

    public float Y { get; set; }
    
    public float Height { get; set; }
    
    public float Width { get; set; }
}

This base class is used by both the YoloOutputParser and BoundingBoxDimensions classes to reduce code duplication.

The YoloBoundingBox class

The YoloBoundingBox class is the container for the data used to populate our bounding boxes when generating them for the overlay, as illustrated in the following code block:

using System.Drawing; // RectangleF and Color

public class YoloBoundingBox
{
    public BoundingBoxDimensions Dimensions { get; set; }

    public string Label { get; set; }

    public float Confidence { get; set; }

    public RectangleF Rect => new RectangleF(Dimensions.X, Dimensions.Y, 
                                    Dimensions.Width, Dimensions.Height);

    public Color BoxColor { get; set; }
}

Also defined in the same class file is our BoundingBoxDimensions class, as shown in the following code block:

public class BoundingBoxDimensions : DimensionsBase { }

Again, this inheritance is used to reduce code duplication.

The MainWindow.xaml file

The Extensible Application Markup Language (XAML) view of our application has been simplified to just the button and the image controls, as illustrated in the following code block:

<Grid>
    <Grid.RowDefinitions>
        <RowDefinition Height="Auto" />
        <RowDefinition Height="*" />
    </Grid.RowDefinitions>

    <Button Grid.Row="0" Margin="0,10,0,0" Width="200" Height="35" 
     Content="Select Image File" HorizontalAlignment="Center" 
     Click="btnSelectFile_Click" />

    <Image Grid.Row="1" Margin="10,10,10,10" 
     Source="{Binding SelectedImageSource}" />
</Grid>

In addition, because of the bounding boxes and the sizes of the images you may select, the window now defaults to Maximized, as can be seen in the following code block:

<Window x:Class="chapter13.wpf.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc=
            "http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:chapter13.wpf"
        mc:Ignorable="d"
        ResizeMode="NoResize"
        WindowStyle="SingleBorderWindow"
        WindowState="Maximized"
        WindowStartupLocation="CenterScreen"
        Background="#1e1e1e"
        Title="Chapter 13" Height="450" Width="800">

With the XAML changes behind us, let us now dive into the revised ImageClassificationPredictor class.

The ImageClassificationPredictor class

The ImageClassificationPredictor class, much like the one in Chapter 12, Using TensorFlow with ML.NET, contains the methods to run our image prediction. In this chapter, we need to add several supporting types to run an ONNX model, as follows:

First, we define the ImageNetSettings struct, which defines the height and width of our network's input. The YOLO model requires input images of 416 x 416 pixels, as illustrated in the following code block:

public struct ImageNetSettings
{
    public const int imageHeight = 416;
    public const int imageWidth = 416;
}

Next, we define the TinyYoloModelSettings struct, which holds the names of the ONNX model's input (image) and output (grid) columns, as follows:

public struct TinyYoloModelSettings
{
    public const string ModelInput = "image";

    public const string ModelOutput = "grid";
}

Unlike the previous chapter, where the TensorFlow model was imported and then exported as an ML.NET model on the first run, ONNX, as of this writing, does not support that path. So, we must load the ONNX model in the Initialize method every time, as illustrated in the following code block:

public (bool Success, string Exception) Initialize()
{
    try
    {
        if (File.Exists(ML_NET_MODEL))
        {
            var data = MlContext.Data.LoadFromEnumerable(
                                  new List<ImageDataInputItem>());

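            // Build the scoring pipeline: load the image from disk, resize it
            // to 416 x 416, extract its pixel values, and score the ONNX model.
            var pipeline = MlContext.Transforms.LoadImages(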
                outputColumnName: "image", imageFolder: "", 
                inputColumnName: nameof(
                    ImageDataInputItem.ImagePath))
                .Append(MlContext.Transforms.ResizeImages(
                    outputColumnName: "image", 
                    imageWidth: ImageNetSettings.imageWidth, 
                    imageHeight: ImageNetSettings.imageHeight, 
                    inputColumnName: "image"))
                .Append(MlContext.Transforms.ExtractPixels(
                    outputColumnName: "image"))
                .Append(MlContext.Transforms.ApplyOnnxModel(
                    modelFile: ML_NET_MODEL, 
                    outputColumnNames: new[] { 
                               TinyYoloModelSettings.ModelOutput},
                    inputColumnNames: new[] {
                               TinyYoloModelSettings.ModelInput}));

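            // Fitting on an empty data view only binds the pipeline to the
            // input schema; the pre-trained ONNX model is loaded, not trained.
            _model = pipeline.Fit(data);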

            return (true, string.Empty);
        }

        return (false, string.Empty);
    }
    catch (Exception ex)
    {
        return (false, ex.ToString());
    }
}
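The ExtractPixels step turns each resized image into the flat array of floats that the ONNX model receives. To our understanding, ML.NET by default writes the color channels in planar (channel-major) order rather than interleaving them, which matches the 3 x 416 x 416 shape described earlier in this chapter. The following hypothetical C# sketch does not use ML.NET; it only illustrates that layout by converting interleaved RGB bytes into planar floats. Check the ExtractPixels parameters in your ML.NET version before relying on this ordering.

```csharp
using System;

// Hypothetical illustration of planar (channel-major) pixel layout.
// Input: interleaved RGB bytes (R, G, B, R, G, B, ...) for a width x height image.
// Output: all red values, then all green values, then all blue values.
public static class PixelLayoutSketch
{
    public static float[] ToPlanar(byte[] rgb, int width, int height)
    {
        int plane = width * height;
        if (rgb.Length != plane * 3)
            throw new ArgumentException("Expected 3 bytes per pixel.", nameof(rgb));

        var planar = new float[plane * 3];
        for (int i = 0; i < plane; i++)
        {
            planar[i] = rgb[i * 3];                  // red plane
            planar[plane + i] = rgb[i * 3 + 1];      // green plane
            planar[2 * plane + i] = rgb[i * 3 + 2];  // blue plane
        }
        return planar; // values stay in the 0-255 range; no offset or scaling
    }
}
```

For a 416 x 416 image, the result holds 3 x 416 x 416 = 519,168 values.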

Next, we extensively modify the Predict method to call the YoloOutputParser class, call the DrawBoundingBox method to overlay the bounding boxes, and then return the bytes of the updated image, as follows:

public byte[] Predict(string fileName)
{
    var imageDataView = MlContext.Data.LoadFromEnumerable(
        new List<ImageDataInputItem>{new ImageDataInputItem{
            ImagePath = fileName}});

    var scoredData = _model.Transform(imageDataView);

    var probabilities = scoredData.GetColumn<float[]>(
        TinyYoloModelSettings.ModelOutput);

    var parser = new YoloOutputParser();

    var boundingBoxes =
        probabilities
            .Select(probability => 
                parser.ParseOutputs(probability))
            .Select(boxes => 
                parser.FilterBoundingBoxes(boxes, 5, .5F));

    return DrawBoundingBox(fileName, 
        boundingBoxes.FirstOrDefault());
}
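The FilterBoundingBoxes(boxes, 5, .5F) call removes duplicate detections of the same object. We do not show the parser's implementation, but a filter of this shape is typically greedy non-maximum suppression based on intersection over union (IoU). The following C# sketch implements that technique under the assumption that 5 is the maximum number of boxes to keep and .5F is the IoU threshold; the `Detection` and `NmsSketch` types are hypothetical stand-ins, not the YoloOutputParser API.

```csharp
using System.Collections.Generic;
using System.Drawing; // RectangleF
using System.Linq;

// Hypothetical stand-in for a detected box.
public class Detection
{
    public RectangleF Rect { get; set; }
    public string Label { get; set; } = "";
    public float Confidence { get; set; }
}

public static class NmsSketch
{
    // Intersection over union: overlap area divided by combined area.
    public static float IntersectionOverUnion(RectangleF a, RectangleF b)
    {
        float areaA = a.Width * a.Height;
        float areaB = b.Width * b.Height;
        if (areaA <= 0 || areaB <= 0) return 0;

        RectangleF overlap = RectangleF.Intersect(a, b); // Empty if disjoint
        float intersection = overlap.Width * overlap.Height;
        return intersection / (areaA + areaB - intersection);
    }

    // Greedy non-maximum suppression: visit boxes by descending confidence,
    // skip any box overlapping an already-kept box by more than 'threshold',
    // and stop once 'limit' boxes have been kept.
    public static List<Detection> Filter(IEnumerable<Detection> boxes, int limit, float threshold)
    {
        var kept = new List<Detection>();
        foreach (var box in boxes.OrderByDescending(b => b.Confidence))
        {
            if (kept.Count >= limit) break;
            if (kept.All(k => IntersectionOverUnion(k.Rect, box.Rect) <= threshold))
                kept.Add(box);
        }
        return kept;
    }
}
```

Two 10 x 10 boxes offset by half their width have an IoU of 1/3, so both survive a 0.5 threshold; a stricter threshold such as 0.3 keeps only the more confident of the two.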

For brevity, the DrawBoundingBox method is not shown here. At a high level, the original image is loaded into memory, and the model's bounding boxes are then drawn on top of the image, along with the label and confidence. This updated image is then converted to a byte array and returned.
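One detail that DrawBoundingBox has to handle is that the model reports boxes relative to its 416 x 416 input, while the boxes are drawn on the original, usually larger, image. The following hypothetical C# sketch, assuming the parser's coordinates are in 416 x 416 input space, scales a box back to the original image size and clamps it to the image bounds. It treats the resize as a plain stretch; if the resize step preserves the aspect ratio by cropping or padding, the mapping needs an additional offset.

```csharp
using System;
using System.Drawing; // RectangleF

public static class BoxScalingSketch
{
    // Assumed side length of the model's square input, in pixels.
    private const float ModelSize = 416f;

    // Maps a box from model input space to original image pixels,
    // clamping each edge so the box stays inside the image.
    public static RectangleF ToImageSpace(RectangleF box, int imageWidth, int imageHeight)
    {
        float sx = imageWidth / ModelSize;
        float sy = imageHeight / ModelSize;

        float left = Math.Clamp(box.Left * sx, 0f, (float)imageWidth);
        float top = Math.Clamp(box.Top * sy, 0f, (float)imageHeight);
        float right = Math.Clamp(box.Right * sx, 0f, (float)imageWidth);
        float bottom = Math.Clamp(box.Bottom * sy, 0f, (float)imageHeight);

        return RectangleF.FromLTRB(left, top, right, bottom);
    }
}
```

For an 832 x 416 image, a box at (208, 104) with a size of 104 x 208 in model space becomes a box at (416, 104) with a size of 208 x 208, because only the horizontal axis is stretched.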

The MainWindowViewModel class

Inside the MainWindowViewModel class, a couple of changes are needed for this example, as follows:

First, the LoadImageBytes method now simply takes the parsed image bytes and converts them into a BitmapImage object, like this:

private void LoadImageBytes(byte[] parsedImageBytes)
{
    var image = new BitmapImage();

    using (var mem = new MemoryStream(parsedImageBytes))
    {
        mem.Position = 0;

        image.BeginInit();
        
        image.CreateOptions = 
            BitmapCreateOptions.PreservePixelFormat;
        image.CacheOption = BitmapCacheOption.OnLoad;
        image.UriSource = null;
        image.StreamSource = mem;
        
        image.EndInit();
    }

    image.Freeze();

    SelectedImageSource = image;
}

Lastly, we modify the Classify method to call the LoadImageBytes method upon successfully running the model, as follows:

public void Classify(string imagePath)
{
    var result = _prediction.Predict(imagePath);

    LoadImageBytes(result);
}

With the changes in place for the Classify method, that concludes the code changes required for this chapter's example. Now, let us run the application!

Running the application

To run the application, the process is identical to the sample application in Chapter 12, Using TensorFlow with ML.NET. To run the application from within Visual Studio, simply click the play icon found in the toolbar, as illustrated in the following screenshot:

After launching the application, just as in Chapter 12, Using TensorFlow with ML.NET, select an image file, and the model will run. For example, I selected an image I took on vacation in Germany (note the car's bounding boxes), shown in the following screenshot:

Feel free to try selecting images you have on your hard drive to see the confidence level of the detection and how well the bounding boxes are formed around the objects.

Exploring additional production application enhancements

Now that we have completed our deep dive, there are a few additional ways to enhance the application further. Some ideas are discussed in the upcoming sections.

Logging

As noted previously, the importance of logging in desktop applications cannot be stressed enough. Logging with NLog (https://nlog-project.org/) or a similar open source project is highly recommended as your application's complexity increases. This will allow you to log at varying levels to a file, the console, or a third-party logging solution such as Loggly. For instance, if you deploy this application to a customer, logging at least at the Debug, Warning, and Error levels will be helpful when debugging issues remotely.

Image scaling

As you might have noticed, with very large images (those exceeding your screen resolution), the bounding box labels and the resized image preview are not as easy to read as they are for, say, a 640 x 480 image. One area of improvement here would be to provide hover-over capabilities, to resize the image to the dimensions of the window, or to increase the font size dynamically.
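As a starting point for these scaling ideas, the following hypothetical C# sketch computes the uniform scale that fits an image inside the window, and a label font size that compensates for that scale so labels stay readable once the image is shrunk. The names, the default 12-point target, and the 200-point cap are illustrative assumptions, not values from the sample.

```csharp
using System;

// Hypothetical helpers for fitting an image to the window and sizing labels.
public static class DisplayScalingSketch
{
    // Uniform scale that fits the whole image inside the view without distortion.
    public static double FitScale(double imageWidth, double imageHeight,
                                  double viewWidth, double viewHeight) =>
        Math.Min(viewWidth / imageWidth, viewHeight / imageHeight);

    // If the image is displayed at 'scale', draw labels 1/scale times larger so
    // they appear at roughly 'targetPoints' on screen, capped at 'maxPoints'.
    public static float LabelFontSize(double scale, float targetPoints = 12f,
                                      float maxPoints = 200f) =>
        (float)Math.Min(maxPoints, targetPoints / scale);
}
```

For a 4000 x 3000 photo in a 1000 x 1000 window, the fit scale is 0.25, so labels would be drawn at 48 points on the full-size image to appear at about 12 points on screen.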

Utilizing the full YOLO model

Another area of improvement for this sample would be to use the full YOLO model within the application. As previously noted, the Tiny YOLO model used in the example application provides only 20 labels. In a production application, or one you wish to build on, using the larger, more complex model would be a good choice.

You can download the full YOLO model here: https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/yolov3.

Summary

Over the course of this chapter, we have explored what goes into the ONNX format and what it offers to the community. We also created a brand-new object detection engine using the pre-trained Tiny YOLO model in ML.NET.

And with that, this concludes your deep dive into ML.NET. Between the first page of this book and this one, you have hopefully grown to understand the power that ML.NET offers through a very straightforward, feature-rich abstraction. With ML.NET constantly evolving (much like .NET), its feature sets and deployment targets will no doubt continue to expand, ranging from embedded Internet of Things (IoT) devices to mobile devices. I hope this book was beneficial for your deep dive into ML.NET and machine learning. In addition, I hope that as you approach problems in the future, you will first consider whether the problem would benefit from using ML.NET to solve it more efficiently and, potentially, better overall. Given that the world's data continues to grow at exponential rates, the need for approaches beyond brute force and traditional methods will only continue to grow, so the skills garnered from this book should help you for years to come.
