Delving into ONNX format

The ONNX format is a file format used to share neural network architectures and parameters between different frameworks. It is based on Google's Protobuf format and library. This format exists so that we can test and run the same neural network model in different environments and on different devices. Usually, researchers develop a model in the framework they know best and then run it in a different environment for production purposes, or share it with other researchers and developers. The format is supported by all the leading frameworks, such as PyTorch, TensorFlow, and MXNet. However, the C++ APIs of these frameworks lack support for it; at the time of writing, they only provide Python interfaces for dealing with the ONNX format. Some time ago, Facebook developed the Caffe2 neural network framework in order to run models on different platforms with the best possible performance. This framework also had a C++ API, and it was able to load and run models saved in the ONNX format. Caffe2 has since been merged with PyTorch, and there is a plan to remove the Caffe2 API and replace it with a new combined API in PyTorch. At the time of writing, however, the Caffe2 C++ API is still available as part of the PyTorch 1.2 (libtorch) library.

Usually, we, as developers, don't need to know how the ONNX format works internally because we are only interested in the files where a model is saved. Internally, an ONNX file is a Protobuf-formatted file. The following listing shows the first part of an ONNX file that describes the ResNet neural network architecture for image classification:

 ir_version: 3
 graph {
   node {
     input: "data"
     input: "resnetv24_batchnorm0_gamma"
     input: "resnetv24_batchnorm0_beta"
     input: "resnetv24_batchnorm0_running_mean"
     input: "resnetv24_batchnorm0_running_var"
     output: "resnetv24_batchnorm0_fwd"
     name: "resnetv24_batchnorm0_fwd"
     op_type: "BatchNormalization"
     attribute {
       name: "epsilon"
       f: 1e-05
       type: FLOAT
     }
     attribute {
       name: "momentum"
       f: 0.9
       type: FLOAT
     }
     attribute {
       name: "spatial"
       i: 1
       type: INT
     }
   }
   node {
     input: "resnetv24_batchnorm0_fwd"
     input: "resnetv24_conv0_weight"
     output: "resnetv24_conv0_fwd"
     name: "resnetv24_conv0_fwd"
     op_type: "Conv"
     attribute {
       name: "dilations"
       ints: 1
       ints: 1
       type: INTS
     }
     attribute {
       name: "group"
       i: 1
       type: INT
     }
     attribute {
       name: "kernel_shape"
       ints: 7
       ints: 7
       type: INTS
     }
     attribute {
       name: "pads"
       ints: 3
       ints: 3
       ints: 3
       ints: 3
       type: INTS
     }
     attribute {
       name: "strides"
       ints: 2
       ints: 2
       type: INTS
     }
   }
   ...
 }

Usually, ONNX files come in binary format to reduce file size and increase loading speed.

Now, let's learn how to use the Caffe2 C++ API to load and run ONNX models. Unfortunately, Caffe2 is the only available C++ library API that can run models saved in the ONNX format directly; this is because Caffe2 can automatically convert them into its internal representation, while other libraries perform such conversions in their Python modules. The ONNX community provides pre-trained models for the most popular neural network architectures in the publicly available Model Zoo (https://github.com/onnx/models). It contains a lot of ready-to-use models that can be applied to different ML tasks. For example, we can take the ResNet-50 model for image classification tasks (https://github.com/onnx/models/tree/master/vision/classification/resnet). For this model, we also have to download the corresponding synset file with image class descriptions so that we can return classification results in a human-readable form. The link to the file is https://github.com/onnx/models/blob/master/vision/classification/synset.txt.

To be able to use the Caffe2 C++ API, we have to use the following headers:

 #include <caffe2/core/init.h>
#include <caffe2/onnx/backend.h>
#include <caffe2/utils/proto_utils.h>

In addition, we need to link our program against the libtorch.so library.

First, we need to initialize the Caffe2 library:

 caffe2::GlobalInit(&argc, &argv); 

Then, we need to load the Protobuf model representation. This can be done with an instance of the onnx_torch::ModelProto class. To load the model into an object of this class, we use the ParseFromIstream method, which takes a pointer to a std::istream object as its input parameter. The following code shows how to use an object of the onnx_torch::ModelProto class:

 onnx_torch::ModelProto model_proto;
 {
   std::ifstream file(argv[1], std::ios_base::binary);
   if (!file) {
     std::cerr << "File " << argv[1] << " can't be opened\n";
     return 1;
   }
   if (!model_proto.ParseFromIstream(&file)) {
     std::cerr << "Failed to parse onnx model\n";
     return 1;
   }
 }

The caffe2::onnx::Caffe2Backend class should be used to convert the Protobuf ONNX model into Caffe2's internal representation. This class contains the Prepare method, which takes a Protobuf-formatted string containing the model's description, a string with the name of the computational device, and some additional settings (which are typically empty). The following code shows how to use the SerializeToString method of the onnx_torch::ModelProto class to make the model's string representation before we prepare the model:

 std::string model_str;
 if (model_proto.SerializeToString(&model_str)) {
   caffe2::onnx::Caffe2Backend onnx_backend;
   std::vector<caffe2::onnx::Caffe2Ops> ops;
   auto model = onnx_backend.Prepare(model_str, "CPU", ops);
   if (model != nullptr) {
     ...
   }
 }

Now that we've prepared the model for evaluation, we have to prepare the input and output data containers. In our case, the input is a tensor of size 1 x 3 x 224 x 224, which represents an RGB image for classification. However, the Caffe2 ONNX model takes a vector of caffe2::TensorCPU objects as input, so we need to move our image into the inputs vector; Caffe2 tensor objects are not copyable, but they are movable. The outputs vector should contain one default-initialized tensor for each of the model's outputs; our model has a single output.

The following snippet shows how to prepare the input and output data for the model:

 caffe2::TensorCPU image = ReadImageTensor(argv[2], 224, 224);

std::vector<caffe2::TensorCPU> inputs;
inputs.push_back(std::move(image));

std::vector<caffe2::TensorCPU> outputs(1);

The model is an object of the Caffe2BackendRep class, whose Run method performs the evaluation. We can use it in the following way:

model->Run(inputs, &outputs);

The output of this model is image scores (probabilities) for each of the 1,000 classes of the ImageNet dataset, which was used to train the model. The following code shows how to decode the model's output:

 std::map<size_t, std::string> classes = ReadClasses(argv[3]);
 for (auto& output : outputs) {
   const auto& probabilities = output.data<float>();
   std::vector<std::pair<float, int>> pairs;  // prob : class index
   for (auto i = 0; i < output.size(); i++) {
     if (probabilities[i] > 0.01f) {
       pairs.push_back(
           std::make_pair(probabilities[i], i + 1));  // 0 - background
     }
   }
   std::sort(pairs.begin(), pairs.end());
   std::reverse(pairs.begin(), pairs.end());
   pairs.resize(std::min(5UL, pairs.size()));
   for (auto& p : pairs) {
     std::cout << "Class " << p.second << " Label "
               << classes[static_cast<size_t>(p.second)] << " Prob "
               << p.first << std::endl;
   }
 }

Here, we iterated over each output tensor in the outputs vector. In our case, there is only one item, but if we were to use several input images in the inputs vector, we would have several results. Then, we placed the score values and class indices into a vector of pairs, sorted this vector by score in descending order, and printed the five classes with the highest scores.

To access the elements of the caffe2::TensorCPU object, we used the data<float>() method, which returns a pointer to the tensor's const row-ordered floating-point values. In this example, the output tensor has dimensions of 1 x 1,000, so we accessed its values as if it were a linear array.

To correctly finish the program, we have to shut down the Google protobuf library, which we used to load the required ONNX files:

 google::protobuf::ShutdownProtobufLibrary();

In this section, we looked at an example of how to deal with the ONNX format using the PyTorch and Caffe2 libraries, but we still need to learn how to load input images into the Caffe2 tensor objects that we use as the model's input.
