© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
A. Gladstone, C++ Software Interoperability for Windows Programmers, https://doi.org/10.1007/978-1-4842-7966-3_8

8. Module Development with Boost.Python and PyBind

Adam Gladstone
Madrid, Spain

Introduction

In the previous chapter, we saw how to create a basic Python extension module. We added code to expose functionality from the underlying C++ library of statistical functions. We saw how to perform the conversion between PyObject pointers and native C++ types. While not especially difficult, we saw that it is potentially error prone. In this chapter, we consider two frameworks – Boost.Python and PyBind – that overcome these difficulties, making the development of Python extension modules easier. We build two quite similar wrapper components, the first based on Boost.Python and the second on PyBind. The intention here is to compare the two frameworks. Following this, we look at a typical Python client and develop a script to measure the relative performance of the extension modules. We end the chapter with a simple Flask app that demonstrates using our PyBind module as part of a (limited) statistics service.

Boost.Python

The Boost Python Library is a framework for connecting Python to C++. It allows us to expose C++ classes, functions, and objects to Python in a non-intrusive way using types provided by the framework. We can continue to write “regular” C++ code in the wrapper layer using the types provided. The Boost Python Library is extensive. It provides support for automatic conversion of Python types to Boost types, function overloading, and exception translation, among other things. Using Boost.Python allows us to manipulate Python objects easily in C++, simplifying the syntax when compared to a lower-level approach such as the one we saw in the previous chapter.

Prerequisites

In addition to an installation of Boost (we use Boost 1.76 for this project), we require a built version of the libraries. Specifically, we need the Boost.Python library. Unlike most of Boost, Boost.Python is not header-only, so we need to build it. Moreover, we need to ensure that the Boost.Python library we build is consistent with the version of Python we are targeting. We have been using Python 3.8, so we expect the following Boost libraries to be present:
  • \boost_1_76_0\stage\lib\libboost_python38-vc142-mt-gd-x32-1_76.lib
  • \boost_1_76_0\stage\lib\libboost_python38-vc142-mt-x32-1_76.lib
  • \boost_1_76_0\stage\lib\libboost_python38-vc142-mt-gd-x64-1_76.lib
  • \boost_1_76_0\stage\lib\libboost_python38-vc142-mt-x64-1_76.lib

The Boost installation and build process for these libraries is described in more detail in Appendix A.

Project Settings

The StatsPythonBoost project is a standard Windows DLL project. As before, the project references the StatsLib static library. The project settings are summarized in Table 8-1.
Table 8-1. Project settings for StatsPythonBoost

Tab                              Property                        Value
General                          C++ Language Standard           ISO C++17 Standard (/std:c++17)
C/C++ > General                  Additional Include Directories  <Users\user>\Anaconda3\include
                                                                 $(BOOST_ROOT)
                                                                 $(SolutionDir)Common\include
Linker > General                 Additional Library Directories  <Users\user>\Anaconda3\libs
                                                                 $(BOOST_ROOT)\stage\lib
Build Events > Post-Build Event  Command Line                    (see below)

We can see from Table 8-1 that the project settings are similar to those of the previous project. In this case, we have not renamed the target output; we leave this to the post-build script (see below). In the Additional Include Directories, we reference the location of Python.h and the StatsLib project include directory. In addition, we reference the Boost headers with the $(BOOST_ROOT) macro. Similarly, in the Additional Library Directories, we add a reference to both the Python libs and the Boost libs.

As in the previous project, we take a shortcut. Rather than installing the library in the Python environment, we simply copy the output to our Python project location (StatsPython). From there we can import the library in a Python script or interactively. In the post-build event, we copy the dll to the script directory, delete the previous version, and rename the dll with a .pyd extension, as follows:
copy /Y "$(OutDir)$(TargetName)$(TargetExt)" "$(SolutionDir)StatsPython\$(TargetName)$(TargetExt)"
del "$(SolutionDir)StatsPython\$(TargetName).pyd"
ren "$(SolutionDir)StatsPython\$(TargetName)$(TargetExt)" "$(TargetName).pyd"
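As an aside, the same copy-and-rename step could be scripted in Python rather than in the post-build event. The following is a minimal sketch and not part of the project: the function name deploy_extension and the directory names used in the demo are assumptions made purely for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def deploy_extension(dll_path: Path, script_dir: Path) -> Path:
    """Copy a built DLL into script_dir, giving the copy a .pyd extension."""
    target = script_dir / (dll_path.stem + ".pyd")
    # Remove any previous version first, mirroring the 'del' step.
    if target.exists():
        target.unlink()
    # Copy under the new name, mirroring 'copy' followed by 'ren'.
    shutil.copyfile(dll_path, target)
    return target


if __name__ == "__main__":
    # Demonstrate with a throwaway directory and a dummy file standing in for the DLL.
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "out"
        script_dir = Path(tmp) / "StatsPython"
        out_dir.mkdir()
        script_dir.mkdir()
        dll = out_dir / "StatsPythonBoost.dll"
        dll.write_bytes(b"not a real dll")
        print(deploy_extension(dll, script_dir).name)
```

Doing the rename as part of the copy means the script directory never holds a stray .dll, at the cost of maintaining a script outside the Visual Studio project.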

With these settings in place, everything should build without warnings or errors.

Code Organization

The Visual Studio Community Edition 2019–generated project for a Windows dll generates a handful of files that we ignore. We ignore the dllmain.cpp file (which contains the entry point for a standard Windows dll). We also ignore the files framework.h and pch.cpp (except insofar as it includes pch.h, the precompiled header).

In the pch.h file, we have
#define BOOST_PYTHON_STATIC_LIB
#include <boost/python.hpp>

The macro indicates that in this dll module, we are statically linking to Boost Python:

\boost_1_76_0\stage\lib\libboost_python38-vc142-mt-...-...-1_76.lib

The “...” depend on the build configuration (debug or release) and the processor architecture, though in our case we target only x64. The second line of pch.h brings in all the Boost Python headers. The rest of the code is organized as before into three main areas: the functions (Functions.h/Functions.cpp), the conversion layer (Conversion.h/Conversion.cpp), and the module definition. In addition, for this project, we have a wrapper class (StatisticalTests.h/StatisticalTests.cpp) that wraps up the t-test functionality. We will deal with each of these areas in turn.

Functions

Inside the API namespace we declare two functions: DescriptiveStatistics and LinearRegression. Both functions take the corresponding boost::python arguments. Boost.Python comes with a set of derived object types corresponding to those of Python’s:

    Python type    Boost type
  • list           boost::python::list
  • dict           boost::python::dict
  • tuple          boost::python::tuple
  • str            boost::python::str

This makes converting to STL types quite straightforward, as we shall see. The code inside the functions follows a simple pattern. We first convert the parameters to types usable by the StatsLib. Then we call the underlying C++ function, collect the results, and translate these back into a form Python understands. The Boost.Python library makes each of these steps easy and flexible. Listing 8-1 shows the implementation of the DescriptiveStatistics function.
Listing 8-1. The DescriptiveStatistics wrapper function

The DescriptiveStatistics function in Listing 8-1 should look familiar. It follows the same structure as the raw Python example in the previous chapter. The major difference in the function declaration is that instead of PyObject pointers, we can use types defined in the Boost.Python library. In this case, both parameters are passed in as const references to a boost::python::list. The second parameter is defaulted, as we want to be able to call DescriptiveStatistics with or without the keys. The input arguments are converted to a std::vector<double> and a std::vector<std::string>, respectively. These are then used in the call to the underlying statistical library function. The results package is returned as before (a std::unordered_map<std::string, double> type) and converted to a boost::python::dict.

Listing 8-2 shows the code for the LinearRegression function.
Listing 8-2. The LinearRegression wrapper function

As can be seen from Listing 8-2, the LinearRegression function follows the same structure as previously. The function takes in two lists, converts them into the corresponding datasets, calls the underlying function, and converts the results package into a Python dictionary.

StatisticalTests

Inside the API namespace, we create a separate namespace StatisticalTests for the three statistical hypothesis test functions. As in the “raw” case, here we have initially chosen to wrap up the usage of the TTest class inside a function. Listing 8-3 shows the summary data t-test function.
Listing 8-3. Wrapping up the TTest class in a function

As shown in Listing 8-3, the approach of providing a procedural wrapper for a class is straightforward: we get the input data and create an instance of the TTest class (depending on the function call and the arguments). We then call Perform to do the calculation and Results to retrieve the results. These are then translated back to the Python caller. The SummaryDataTTest function in this example takes four parameters corresponding to the constructor arguments of the summary data t-test. The arguments are typed as const references to a boost::python::object. This provides a wrapper around PyObject. The function then makes use of boost::python::extract<T>(val) to get a double value out of the argument. In general, the syntax is cleaner and more direct than using PyArg_ParseTuple. The remainder of the function calls Perform and retrieves the Results. As in the previous case of DescriptiveStatistics and LinearRegression, these are converted to a boost::python::dict and returned to the caller.

The Conversion Layer

As we have seen earlier, for the built-in types (bool, int, double, and so on) we can use one of the templated extract functions:
boost::python::extract<T>(val).
For conversion to the STL types, we have three inline functions. The first is a template function, to_std_vector. This converts from a boost::python::object representing a list to a std::vector<T>. Listing 8-4 shows the code.
Listing 8-4. Converting a boost::python::object list to a std::vector

Listing 8-4 starts by constructing an empty std::vector. Then, we iterate over the input list extracting the individual values and inserting them into the vector. We use this basic approach to illustrate accessing list elements in a standard manner. We could have used the boost::python::stl_input_iterator<T> to construct the results vector<T> directly from iterators. We use this function to convert a list of doubles to a vector of doubles and also to convert a list of string keys to a vector of strings.

The second function is to_dict. This is a specialized function used for converting the results set into a Python dictionary. Listing 8-5 shows the code.
Listing 8-5. Converting the results package to a Python dictionary

In this case, we input a const reference to a std::unordered_map<std::string, double> and return the contents into a boost::python::dict by simply iterating over the results. The final function is to_list. This is similar to the previous to_dict function. In this case, we create a Python list and populate it from a vector of doubles.

The Module Definition

Our Boost.Python module is defined in module.cpp. The module definition comprises both the functions and the classes that we want to expose to Python. We will deal with each in turn. The listing is quite long so has been broken up into two sections. First, Listing 8-6a shows the code that exposes the functions.
Listing 8-6a. The functions: StatsPythonBoost module definition

In Listing 8-6a, this part of the module definition should look somewhat familiar. It is not very different from the “raw” approach we saw in the previous chapter. We use the boost::python::def function to declare the functions we are wrapping. The first parameter is the function name we want to call from Python. The second parameter is the function address. The final parameter is the docstring. As pointed out earlier for the DescriptiveStatistics function, we want to be able to call it from Python with and without keys, and have it behave as the following interactive session demonstrates:
>>> import StatsPythonBoost as Stats
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> results = Stats.DescriptiveStatistics(data)
>>> print(results)
{'Mean': 4.5, 'Count': 10.0, 'Kurtosis': -1.2000000000000002, 'Skew.P': 0.0, ... }
>>> keys = ['Mean', 'StdDev.P']
>>> results = Stats.DescriptiveStatistics(data, keys)
>>> print(results)
{'Mean': 4.5, 'StdDev.P': 2.8722813232690143}

In order to do this, we need two separate overloaded functions. This is the same approach that we used in the C++/CLI wrapper in Chapter 3. In this case, however, we do not need to write the overloads explicitly. We make use of the macro BOOST_PYTHON_FUNCTION_OVERLOADS to generate the overloads for us. The arguments are the generator name, the function we want to overload, the minimum number of parameters (1 in this case), and the maximum number of parameters (2 in this case). Having defined this, we then pass the f_overloads structure, along with the docstring, to the def function.
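Seen from Python, the two generated overloads behave like a single function whose second parameter is optional. The following pure-Python stand-in sketches that calling convention only. It is not the StatsLib implementation: the name descriptive_statistics is hypothetical, and it supports just three of the keys that appear in the session above.

```python
import statistics
from typing import Dict, List, Optional


def descriptive_statistics(data: List[float],
                           keys: Optional[List[str]] = None) -> Dict[str, float]:
    """Pure-Python stand-in: returns all statistics, or only those named in keys."""
    available = {
        "Mean": lambda xs: statistics.fmean(xs),
        "Count": lambda xs: float(len(xs)),
        "StdDev.P": lambda xs: statistics.pstdev(xs),
    }
    # With no keys (or an empty list), fall back to every available statistic.
    selected = keys if keys else list(available)
    return {key: available[key](data) for key in selected}


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(descriptive_statistics(data))                        # one-argument form
    print(descriptive_statistics(data, ["Mean", "StdDev.P"]))  # two-argument form
```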

The second part of the module definition, shown in Listing 8-6b, declares the classes that can be used directly in Python.
Listing 8-6b. The classes: StatsPythonBoost module definition

Listing 8-6b shows the TTest and DataManager classes that we wrap in this module. With these classes defined, we can write the following from a Python script, for example:
# Perform t-test from summary data
t: Stats.TTest = Stats.TTest(5, 9.261460, 0.2278881e-01, 195)
t.Perform()
print(t.Results())

The C++ wrapper class for the t-test is defined in StatisticalTests.h. The class template argument references our wrapper class. In this case, we have named it StudentTTest to distinguish it from the underlying Stats::TTest class. This class holds an instance of the underlying Stats::TTest class. The constructors determine the type of t-test to be performed and convert between boost::python types and the underlying C++ types, using the same conversions that we have seen.

From the module definition in Listing 8-6b, we can see that the first parameter is the name of the class, "TTest". This is the name for the type we will call from Python. Alongside this, we define an init function (the constructor) which takes four arguments. We then define two additional init functions, one each for the remaining constructors with their corresponding arguments. Finally, we define the two functions Perform and Results. All the functions provide a docstring. That is all we need to do to expose a native C++ type to Python.

The DataManager class is exposed in a similar way. The C++ wrapper class is defined in DataManager.h in the namespace API::Data. This allows us to keep the wrapper class separate from the StatsLib C++ class of the same name. As before, the purpose of the wrapper class is to handle the type conversions and manage the lifetime of the underlying DataManager class in the StatsLib. Listing 8-7 shows a typical example function.
Listing 8-7. The DataManager::ListDataSets function

From Listing 8-7 we can see that the function ListDataSets returns a Python list using the Boost.Python type. The list comprises Stats::DataSetInfo items that are typed as
using DataSetInfo = std::pair<std::string, std::size_t>;

The items contain the dataset name and the number of observations in the data. The function first obtains the currently loaded datasets from the m_manager member that this class wraps. Inside the for-loop, we use the function boost::python::make_tuple to create a Python tuple element with the dataset information. This is then appended to the results list and returned to the caller. The remaining functions are similarly straightforward.

Exception Handling

As in the previous chapter, exceptions should be handled and processed from the wrapper functions. In particular, we are concerned with bad arguments, so we should check types and report exceptions appropriately. We could use the same approach that we used in the previous chapter (manually translating C++ exceptions to Python exceptions). However, we can also take advantage of Boost.Python. In the module definition, the Boost.Python framework wraps our functions in the call to .def(...) so they are not called directly via Python. Instead, Python calls function_call(...) (\boost_1_76_0\libs\python\src\object\function.cpp). This function wraps the actual function call in an exception handler. The exception handler handles the exception in the way that we did previously (\boost_1_76_0\libs\python\src\errors.cpp), though it catches and translates more exception types. This means Python does not halt and the exception is handled gracefully. We can test this out using the following Python code, which passes in a string inside a list instead of the expected numeric item:
try:
    x = [1, 3, 5, 'f', 7]
    summary: dict = Stats.DescriptiveStatistics(x)
    print(summary)
except Exception as inst:
    report_exception(inst)
The error that is reported is
<class 'TypeError'>
No registered converter was able to produce a C++ rvalue of type double from this Python object of type str
This error is provided by Boost. On the other hand, if we pass in an empty dataset, we get the following:
try:
    x = []
    summary: dict = Stats.DescriptiveStatistics(x)
    print(summary)
except Exception as inst:
    report_exception(inst)
The error that is reported is
<class 'ValueError'>
The data is empty.

This is the error that is thrown from the underlying StatsLib. Basically, the same error handling that we wrote in the previous chapter is now provided for free.
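The report_exception helper used in these snippets is not shown in the text. A minimal sketch of what such a helper might look like follows, assuming it simply prints the exception type and then the message, as the output above suggests. The function checked_mean is a hypothetical pure-Python stand-in for an extension function, included only so that the example runs without the compiled module.

```python
from typing import Sequence


def report_exception(inst: Exception) -> str:
    """Print and return the exception type followed by its message."""
    report = f"{type(inst)}\n{inst}"
    print(report)
    return report


def checked_mean(data: Sequence[object]) -> float:
    """Stand-in for an extension function (not StatsLib code)."""
    if not data:
        raise ValueError("The data is empty.")
    for item in data:
        # Reject anything that is not a number, as the converters would.
        if not isinstance(item, (int, float)):
            raise TypeError(f"expected a number, got {type(item).__name__}")
    return sum(data) / len(data)


if __name__ == "__main__":
    for sample in ([1, 3, 5, "f", 7], []):
        try:
            print(checked_mean(sample))
        except Exception as inst:
            report_exception(inst)
```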

PyBind

In this section, we develop our third and final Python extension module. This time we use PyBind. Boost.Python has been around for a long time and the Boost library that it is a part of offers a wide range of functionality. This makes it a relatively heavyweight solution if all we want to do is create Python extension modules. PyBind is a lighter-weight alternative. It is a header-only library that provides an extensive range of functions to facilitate writing C++ extension modules for Python. PyBind is available from here: https://github.com/pybind/pybind11.

Prerequisites

The only prerequisite for this section is to install PyBind into your Python environment. You can either run pip install pybind11 from a command prompt, or download the wheel (https://pypi.org/project/pybind11/#files) and run pip install "pybind11-2.7.0-py2.py3-none-any.whl".

Project Settings

The StatsPythonPyBind project is set up in a similar way to the previous one. It is a standard Windows DLL project. The project settings are summarized in Table 8-2.
Table 8-2. Project settings for StatsPythonPyBind

Tab                              Property                        Value
General                          C++ Language Standard           ISO C++17 Standard (/std:c++17)
C/C++ > General                  Additional Include Directories  <Users\user>\Anaconda3\include
                                                                 <Users>\AppData\Roaming\Python\Python37\site-packages\pybind11\include
                                                                 $(SolutionDir)Common\include
Linker > General                 Additional Library Directories  <Users\user>\Anaconda3\libs
                                                                 $(BOOST_ROOT)\stage\lib
Build Events > Post-Build Event  Command Line                    (see below)

We create a module as before, copied to the script directory and renamed .pyd. We use the following script:
del "$(SolutionDir)StatsPython\$(TargetName).pyd"
copy /Y "$(OutDir)$(TargetName)$(TargetExt)" "$(SolutionDir)StatsPython\$(TargetName)$(TargetExt)"

Additionally, we have removed the pch file and set the project to not use precompiled headers. Finally, we have added a reference to the StatsLib project in the project References. At this point, everything should build without warnings or errors.

Code Organization: module.cpp

In this project, there is only a single file, module.cpp. This file contains all the code. As we have seen in the previous section on Boost.Python and in the previous chapter as well, we have generally separated the conversion layer from the wrapped functions and classes. And we have separated these from the module definition. This was a convenient way to organize the code in the wrapper layer and allowed us to separate concerns (like converting types or calling functions) appropriately. However, PyBind simplifies both these aspects.

At the top of the file module.cpp we include the PyBind headers:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

This is followed by our StatsLib includes.

Previously, we have had to declare wrapper/proxy functions that take Python types as arguments (either PyObject or boost::python::object) and convert these to the underlying native C++ types. With PyBind, we don’t need to do this. We now have a single macro PYBIND11_MODULE that defines the module. The listing is quite long so we have divided it into three sections. The first section deals with the functions that we expose and the next two sections with the classes that we expose. The functions that we expose are shown in Listing 8-8a.
Listing 8-8a. The function definitions in the StatsPythonPyBind module

The PYBIND11_MODULE macro defines the module name StatsPythonPyBind that is used by Python in the import statement. Inside the module definition, we can see the declarations of the DescriptiveStatistics and LinearRegression functions. The .def(...) function is used to define an exported function. Just as before, the first parameter is the name used to call the function from Python, and the final parameter is a docstring.

However, unlike previously, we do not require a separate wrapper function. We can simply provide the underlying function address. This is the second parameter. The translation of both the parameters and the return type is handled by the PyBind framework. In the case of the Stats::GetDescriptiveStatistics function, which has a second default argument, we can provide further information about the argument structure. Specifically, PyBind allows us to specify the arguments and the default values if required, so we add the arguments after the function address, py::arg("data") and py::arg("keys") defaulted with the required value. Following this, the three functions SummaryDataTTest, OneSampleTTest, and TwoSampleTTest are now completely unnecessary. We have provided wrappers for illustration only. The code for the two-sample t-test wrapper is as follows:
std::unordered_map<std::string, double> TwoSampleTTest(const std::vector<double>& x1, const std::vector<double>& x2)
{
    Stats::TTest test(x1, x2);
    test.Perform();
    return test.Results();
}

What is important here is not how the function wraps the TTest class, but rather the fact that the wrapper function uses native C++ and STL types both for the function parameters and the return value. Using Boost.Python, we would have had to convert from/to boost::python::object. But here we no longer need to convert from Python types to C++ types. Of course, we can, if we wish, explicitly wrap functions. This is a design choice.

The second part of the module definition deals with the class definitions. The TTest class is shown in Listing 8-8b.
Listing 8-8b. The description of the TTest class exported to Python

Listing 8-8b shows how the TTest class from the underlying C++ StatsLib is exposed to Python. As in the case of Boost.Python, we describe the type “TTest” that we want to use. But, in this case, the template argument to the py::class_ object is the underlying Stats::TTest class. The class that is referenced is not a wrapper class, as was the case with Boost.Python. After the template arguments and the parameters passed to the constructor of py::class_, we use the .def function to describe the structure of the class. In this case, we declare the three TTest constructors with their respective arguments passed as template parameters to the py::init<> function. Again, it is worth highlighting that we do not need to do any conversions; we simply pass in native C++ types and STL types (rather than boost::python::object types). Finally, we declare the functions Perform and Results, and an anonymous function to return a string representation of the object to Python.

The definition of the DataManager class is equally straightforward. Listing 8-8c shows the class definition.
Listing 8-8c. The DataManager class definition

As we can see from Listing 8-8c, all we need to do in the .def function is to provide a mapping from the function names used by Python to the underlying C++ functions. Apart from the functions that are available in the DataManager class, we also have access to functions that form part of the definition of the Python class. For example, the DataManager extends the __repr__ function with a custom to_string function that outputs internal information regarding the dataset.

As we can see in this project, both the wrapper and the “conversion” layer are minimal. PyBind provides a wide range of facilities, allowing us to easily connect C++ code to Python. In this chapter, we have only just scratched the surface. There are a large number of features and we have only covered a fraction of them. Moreover, we are aware that we have really only written code for the most “vanilla” situations (taking advantage of the fact that PyBind allows us to do this easily).

However, while using PyBind makes exposing C++ classes and functions straightforward, we need to be aware that there is a lot going on under the hood. In particular, we need to be aware of the return value policies that can be passed to the module_::def() and the class_::def() functions. These annotations allow us to tune the memory management for functions that return a non-trivial type. In this project, we have only used the default policy return_value_policy::automatic. A full discussion of this topic is beyond the scope of this chapter. But, as the documentation points out, return value policies are tricky, and it’s important to get them right.1

If we take a step back for a moment, we can see that in terms of the module definition, both Boost.Python and PyBind provide us with a meta-language for defining Python entities. It might seem a complicated way to go. Arguably, writing equivalent classes in native Python is somewhat easier than using a meta-language to describe C++ classes. However, the approach we have adopted here, describing native C++ classes, clearly addresses a different issue, that is, it provides a (relatively) easy way to export classes out of C++ and have them managed in an expected way in a Python environment.

Apart from defining the functions and classes, we have also been careful to add documentation strings. This is useful, and we can see this information if we print out the help on the class. This is shown in Listing 8-9 for the StatsPythonPyBind module.
Listing 8-9. Output from the Python help function for the TTest class

Listing 8-9 shows the output from the StatsPythonPyBind module using the built-in help() function. We can see that it provides a description of the class methods and the class initialization along with the docstrings that we provided. It also provides detailed information both about the argument types used and the return types. We can see quite clearly how the declarative C++ class description has been translated into a Python entity. The output from StatsPythonBoost is similar, though not identical, and worthwhile comparing. As an alternative to the help function, we can use the inspect module to introspect on our Python extension. The inspect module provides additional useful functions to help get information about objects. This can be useful if you need to display a detailed traceback. As expected, we can retrieve all the information from our module except, of course, the source code. What both these approaches serve to illustrate is that, with a limited amount of C++ code, we have developed a proper Python object.
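To give a flavor of the inspect module, the sketch below collects the public methods of a class together with their docstrings. It is run against a hypothetical pure-Python stand-in for TTest so that it works without the compiled module; the constructor parameter names and the docstrings are assumptions, and only the method names Perform and Results come from the text.

```python
import inspect
from typing import Dict, Optional


class TTest:
    """Stand-in class with the same method names as the exported TTest."""

    def __init__(self, mu0: float, x_bar: float, sx: float, n: float) -> None:
        self._args = (mu0, x_bar, sx, n)

    def Perform(self) -> bool:
        """Perform the t-test."""
        return True

    def Results(self) -> dict:
        """Retrieve the results."""
        return {}


def describe(cls: type) -> Dict[str, Optional[str]]:
    """Map each public method name of cls to its docstring."""
    return {
        name: inspect.getdoc(member)
        for name, member in inspect.getmembers(cls, inspect.isroutine)
        if not name.startswith("_")
    }


if __name__ == "__main__":
    for name, doc in describe(TTest).items():
        print(f"{name}: {doc}")
```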

Exception Handling

As expected, the PyBind framework provides support for exception handling. C++ exceptions, std::exception and its subclasses, are translated into the corresponding Python exceptions and can be handled in a script or by the Python runtime. Using the first of the two examples that we used previously, the exception report from Python is as follows:
<class 'TypeError'>
DescriptiveStatistics(): incompatible function arguments. The following argument types are supported:
    1. (arg0: List[float]) -> Dict[str, float]

Invoked with: [1, 3, 5, 'f', 7]

The exception handling provides sufficient information to determine the cause of the issue and processing can proceed appropriately. It is worth pointing out that PyBind’s exception handling capabilities go beyond simple translation of C++ exceptions. PyBind provides support for several specific Python exceptions. It also supports registering custom exception handlers. The details are covered in the PyBind documentation.

The Python “Client”

Now that we have built a working PyBind module, it would be good to try out some of the functionality. We could of course have created a full-featured Python application. But we prefer to keep things simple and focused. As on previous occasions, we are concerned not just with exercising the underlying functionality, but also with interoperating with other Python components. Unlike in previous chapters, we have not written dedicated unit tests using one of the (several) Python testing frameworks. Instead, we use a simple Python script StatsPython.py that extends the basic script which we used in the previous chapter. We use the alias Stats as a simple expedient:
import StatsPythonPyBind as Stats
#import StatsPythonBoost as Stats

This allows us to easily switch between the Boost.Python extension module and the PyBind extension module. This is not proposed as a general approach; it just facilitates testing the functions and classes here.
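If we wanted something slightly more robust than commenting lines in and out, one option is to try each module name in turn. The sketch below is an illustration rather than part of StatsPython.py: the helper name load_first_available is hypothetical, and the standard statistics module is listed last only so that the example runs where neither extension module is installed.

```python
import importlib
from types import ModuleType
from typing import Sequence


def load_first_available(names: Sequence[str]) -> ModuleType:
    """Import and return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; move on to the next candidate.
            continue
    raise ImportError(f"None of the modules could be imported: {list(names)}")


# 'statistics' is a fallback for this sketch only; it does not offer the same API.
Stats = load_first_available(["StatsPythonPyBind", "StatsPythonBoost", "statistics"])
print(Stats.__name__)
```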

The script itself defines functions that exercise the underlying StatsLib functionality. It also allows us to do a simple side-by-side test of the TTest class, for example. Listing 8-10 shows the function run_statistical_tests2.
Listing 8-10. A simple function to compare the results from two t-tests

In Listing 8-10, the function takes as inputs two Pandas data frame objects (simple datasets loaded from csv files) and converts them to lists, the type our Python interface to StatsLib expects. The first call uses the procedural interface. The second identical call constructs an instance of the TTest class that we declared and calls the functions Perform and Results. Both approaches produce the same results, unsurprisingly.

Performance

One of the reasons for trying to connect C++ and Python is the potential for performance gains from C++ code. To this end, we have written a small script, PerformanceTest.py, to test the performance of the mean and (sample) standard deviation functions. We compare Python against PyBind when computing Mean and StdDev for 500k items.

From the Python side we have two approaches. Firstly, we define the functions mean, variance, and stddev. The implementations of these only use basic Python functionality. We also define the same functions, this time using the Python statistics library. This allows us to have two different baselines.
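The two Python baselines can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of PerformanceTest.py: the function bodies and the timing helper best_time are our own, and the demo uses fewer items than the 500k mentioned above so that it finishes quickly.

```python
import math
import random
import statistics
import timeit
from typing import Callable, Sequence


def mean(x: Sequence[float]) -> float:
    """Arithmetic mean using only built-in functions."""
    return sum(x) / len(x)


def variance(x: Sequence[float]) -> float:
    """Sample variance (divides by n - 1)."""
    m = mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)


def stddev(x: Sequence[float]) -> float:
    """Sample standard deviation."""
    return math.sqrt(variance(x))


def best_time(func: Callable[[Sequence[float]], float],
              data: Sequence[float], repeat: int = 3) -> float:
    """Best wall-clock time in seconds over `repeat` single calls."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))


if __name__ == "__main__":
    random.seed(0)
    # Smaller than the chapter's 500k items, to keep the demo quick.
    data = [random.random() for _ in range(10_000)]
    candidates = [
        ("basic mean", mean),
        ("basic stddev", stddev),
        ("statistics.mean", statistics.mean),
        ("statistics.stdev", statistics.stdev),
    ]
    for label, func in candidates:
        print(f"{label:>18}: {best_time(func, data):.6f} s")
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes; an extension-module function taking a list could be timed by adding it to the candidates list.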

From the C++ side, we make a minor adjustment to the PyBind module definition so that we can expose the functions Mean and StandardDeviation from the StatsLib. In the case of the Mean function, this is quite straightforward to do. The functions exist in the Stats::DescriptiveStatistics namespace and are defined in the static library. Using the PyBind wrapper, StatsPythonPyBind, all we need to do is to add the description shown in Listing 8-11 to the module definition.
Listing 8-11. Enhancing the module definition with additional C++ functions

In Listing 8-11, we add the function "Mean", supply the address of the C++ implementation, and add the documentation string.

The StandardDeviation function is slightly more involved. The underlying C++ function takes two parameters, a std::vector<double> and an enumeration for the VarianceType. If we just pass the function address to the module definition, we will get a runtime error from Python as the function expects two arguments. To address this, we need to extend the code. At this point we have a choice. We can either write a small wrapper function that provides a hardcoded VarianceType argument or we can expose the VarianceType enumeration. We’ll look at both approaches.

First off, we look at writing a small wrapper function. Listing 8-12 shows this approach.
Listing 8-12

Wrapper for the underlying StandardDeviation function

Wrapping a function with a hardcoded argument is not exactly ideal, but it is simple. In the module definition, we add the declaration shown in Listing 8-13.
Listing 8-13

Definition of the SampleStandardDeviation wrapper function

In Listing 8-13, we use the name “StdDevS” to reflect the fact that we are requesting the sample standard deviation. Now we can use this function in our performance test.

An alternative to writing the wrapper function is to expose the VarianceType enumeration to Python. If we do this, then we could call the function as follows:
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> Stats.StdDev(data, Stats.VarianceType.Sample)
3.0276503540974917
>>> Stats.StdDev(data, Stats.VarianceType.Population)
2.8722813232690143
To accomplish this in the code, we need to make two small changes. First, we describe the enumeration in the module. This is shown in Listing 8-14.
Listing 8-14

Defining the enumeration for VarianceType

In Listing 8-14, we use the PyBind py::enum_ class to define the enumeration VarianceType and give it a name. Notice that we have “attached” the enum to the module scope (the m argument passed to py::enum_), because in this case it is not nested inside a class. We then add appropriate strings for the corresponding values. A more detailed description of py::enum_ is given in the PyBind documentation. We also need to make a small modification to the way the function is defined in the module to reflect the fact that it is expecting two parameters. This is shown in Listing 8-15.
Listing 8-15

Defining additional arguments for the StdDev function

In Listing 8-15, we have added two py::arg structures to the function definition. This is similar to the way in which we handled the second optional argument for the GetDescriptiveStatistics function. The code compiles without warnings or errors. We can test that it works as expected using the Python interactive shell, as follows:
>>> Stats.VarianceType.__members__
{'Sample': <VarianceType.Sample: 0>, 'Population': <VarianceType.Population: 1>}
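
The two values quoted earlier for the data 0 to 9 can be cross-checked without the extension module. The sketch below is a pure-Python stand-in, not the module's implementation: the VarianceType enum mirrors the members shown above, and std_dev is a hypothetical function that selects the divisor (n - 1 for the sample, n for the population) from the enum value.

```python
import math
from enum import Enum


class VarianceType(Enum):
    # Member names and values mirror the __members__ output of the module.
    Sample = 0
    Population = 1


def std_dev(data, variance_type):
    """Pure-Python stand-in for Stats.StdDev (hypothetical, for checking only)."""
    n = len(data)
    m = sum(data) / n
    sum_sq = sum((value - m) ** 2 for value in data)
    # The sample variance divides by n - 1, the population variance by n.
    divisor = n - 1 if variance_type is VarianceType.Sample else n
    return math.sqrt(sum_sq / divisor)


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sample_sd = std_dev(data, VarianceType.Sample)
population_sd = std_dev(data, VarianceType.Population)
print(sample_sd, population_sd)
```

The printed values should agree with the interactive session shown earlier (approximately 3.0277 for the sample and 2.8723 for the population).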

With these modifications in place, we can return to the performance test. The PerformanceTest.py script is straightforward. We import the required libraries, including StatsPythonPyBind. We define two versions of both mean and stddev in Python. One version doesn’t use the statistics library and the second version does. This just facilitates the comparison between Python functions and our library functions. We add a simple test function that uses random data and returns the mean and stddev with timing information.
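
A minimal sketch of such a timing harness is shown here. It is not the PerformanceTest.py script itself: the helper names timed and run_benchmarks are hypothetical, and for brevity the Python side uses the statistics module rather than hand-written functions. The StatsPythonPyBind functions Mean and StdDevS are timed only if the extension module can be imported.

```python
import random
import statistics
import time


def timed(label, fn, data):
    """Call fn(data), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = fn(data)
    elapsed = time.perf_counter() - start
    print(f"[{label}] took {elapsed:.3f} seconds")
    return result, elapsed


def run_benchmarks(count, seed=0):
    """Time the mean and sample standard deviation of `count` random values."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(count)]
    print(f"Running benchmarks with COUNT = {count}")
    results = {
        "mean": timed("mean(x)", statistics.mean, data),
        "stdev": timed("stdev(x)", statistics.stdev, data),
    }
    try:
        # Only available once the extension module has been built.
        import StatsPythonPyBind as Stats
    except ImportError:
        return results
    results["Mean"] = timed("Mean(x)", Stats.Mean, data)
    results["StdDevS"] = timed("StdDevS(x)", Stats.StdDevS, data)
    return results
```

Calling run_benchmarks(500000) would correspond to the item count used in this section. A single timed call is a coarse measurement, so repeating each call several times and taking the minimum would give more stable figures.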

These are the results we obtain (running the pure Python functions, rather than the Python statistics library which we might reasonably expect to perform faster):
Running benchmarks with COUNT = 500000
[mean(x)] (Python implementation) took 0.005 seconds
[stdev(x)] (Python implementation) took 3.003 seconds
[Mean(x)] (C++ implementation) took 0.182 seconds
[StdDevS(x)] (C++ implementation) took 0.183 seconds

The Python function mean(x) is between one and two orders of magnitude faster than the native C++ function (0.005 vs. 0.182 seconds). Changing the C++ code to use a for-loop instead of std::accumulate made no significant difference. It might be interesting to investigate whether the latency on the C++ side is due to the conversion layer or simply to unnecessary copying of vectors. Nevertheless, the native C++ standard deviation function, StdDevS (0.183 vs. 3.003 seconds), is substantially faster than either of the Python variants.

The Statistics Service

In the StatsPython project, there is a script StatsService.py that launches a small Flask app. The Flask app is a simple demonstration of a web service. It is extremely limited and only allows the user to compute a summary data t-test. The main page is shown in Figure 8-1.
Figure 8-1

The Stats Service main page

The main page consists of a simple form that allows the user to input the parameters of a summary data t-test. After pressing the Submit button, we compute the required values and return them, as shown in Figure 8-2.
Figure 8-2

Results from a summary data t-test

To run the service, open the StatsPython project in VSCode, for example. From the terminal, type
> py .\StatsService.py

This starts the Flask service on port 5000. In your browser address bar, go to http://localhost:5000/. This points to the Summary Data T-Test page which is the main page for this app. Fill in the required details and press submit. The results are returned as expected, using the underlying TTest class from the StatsPythonPyBind module.
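
To make the form's calculation concrete, here is a sketch of the test statistic for a summary data t-test in plain Python. The service itself delegates to the TTest class. The function name summary_data_t_test and the result keys "t" and "df" are hypothetical, and the p-value is omitted because the standard library has no Student's t distribution.

```python
import math


def summary_data_t_test(mu0, mean, sd, n):
    """Return the t statistic and degrees of freedom for a one-sample test.

    mu0:  hypothesized population mean
    mean: sample mean
    sd:   sample standard deviation
    n:    sample size
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if sd <= 0:
        raise ValueError("sd must be positive")
    standard_error = sd / math.sqrt(n)
    t = (mean - mu0) / standard_error
    # Result keys are hypothetical; the real module may label them differently.
    return {"t": t, "df": n - 1}


result = summary_data_t_test(mu0=5.0, mean=6.0, sd=2.0, n=16)
print(result)
```

With these inputs the standard error is 2 / 4 = 0.5, so the statistic is (6 - 5) / 0.5 = 2 with 15 degrees of freedom.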

Apart from the small amount of code required to get this up and running, what is worth emphasizing is what we have achieved in terms of a multi-language development infrastructure. We now have an infrastructure that allows us to develop and adapt native C++ code, build this into a library, incorporate the library into a Python module, and have this functionality available for use in a Python web service. This flexibility is valuable when developing software systems.

Summary

In this chapter, we have built Python modules using the frameworks provided by Boost.Python and PyBind. Both modules exposed the functionality of the underlying library of statistical functions in a similar way. We have seen that both frameworks do a lot of work on our behalf in terms of both type conversion and error handling. Furthermore, both frameworks allow us to expose native C++ classes to Python. We concluded this chapter by measuring the performance of the underlying C++ function calls against the Python equivalents. The potential for performance enhancements is an obvious reason for connecting C++ to Python. However, an equally compelling reason (if not more so) is that it gives us access to a wide variety of Python libraries covering everything from data analysis and machine learning (NumPy and Pandas, for example) to web services (Django and Flask, for example) and more. As we have seen, being able to expose functionality written in C++ to Python with minimal effort gives you a useful additional architectural choice when developing loosely coupled software systems.

Additional Resources

The links that follow provide more in-depth coverage of the topics dealt with in this chapter.

Exercises

The exercises in this section deal with exposing the same functionality as previously, but this time via the Boost.Python module and the PyBind module.

The following exercises use the StatsPythonBoost project:

1) In StatsPythonBoost, add procedural wrappers for the z-test functions. These should be almost identical to the t-test functions. No additional conversion functions are required.
  • In StatisticalTests.h, add these declarations for the three functions:
    boost::python::dict SummaryDataZTest(const boost::python::object& mu0, const boost::python::object& mean, const boost::python::object& sd, const boost::python::object& n);
    boost::python::dict OneSampleZTest(const boost::python::object& mu0, const boost::python::list& x1);
    boost::python::dict TwoSampleZTest(const boost::python::list& x1, const boost::python::list& x2);
  • In StatisticalTests.cpp, add the implementations of these functions. Follow the code for the t-test wrapper functions.

  • In module.cpp, add the three new functions to the module BOOST_PYTHON_MODULE(StatsPythonBoost) {}

  • After rebuilding StatsPythonBoost, open the StatsPython project in VSCode. Open the StatsPython.py script. Add functions to test the z-test functions using the data we have used previously. For example, we can add the following function:
    def one_sample_ztest() -> None:
        """ Perform a one-sample z-test """
        try:
            data: list = [3, 7, 11, 0, 7, 0, 4, 5, 6, 2]
            results = Stats.OneSampleZTest(3.0, data)
            print_results(results, "One-sample z-test.")
        except Exception as inst:
            report_exception(inst)
2) In the StatsPythonBoost project, add a MovingAverage function.
  • In Functions.h add the following declaration:
    boost::python::list MovingAverage(const boost::python::list& dates, const boost::python::list& observations, const boost::python::object& window);
  • In Functions.cpp:
    • Add #include "TimeSeries.h" to the top of the file.

    • Add the implementation: the function takes three non-optional arguments: a list of dates, a list of observations, and a window size.

    • Convert the inputs using the existing conversion functions and pass these to the constructor of the TimeSeries class.

    • Return the results using the Conversion::to_list function.

  • In module.cpp, add the new function:
    def("MovingAverage", API::MovingAverage, "Compute a simple moving average of size = window.");
  • Build StatsPythonBoost. It should build without warnings and errors. You should be able to test the MovingAverage function interactively, adapting the script we used previously.

  • Open the StatsPython project in VSCode. Open the StatsPython.py script. Add a function to test the moving average, including exception handling. Run the script, and debug if required.

3) In the StatsPythonBoost project, add a TimeSeries class that wraps the native C++ TimeSeries class and computes a simple moving average.

The steps required are as follows:
  • Add a TimeSeries.h and a TimeSeries.cpp file to the project. These will contain the wrapper class definition and implementation, respectively.

  • In TimeSeries.h, add the class declaration. For example:
    namespace API
    {
        namespace TS
        {
            // TimeSeries wrapper class
            class TimeSeries final
            {
            public:
                // Constructor, destructor, assignment operator and MovingAverage function
            private:
                Stats::TimeSeries m_ts;
            };
        }
    }
  • In TimeSeries.cpp, add the class implementation. The constructor converts the boost::python::list arguments to appropriate std::vector types. The MovingAverage function extracts the window size argument and forwards the call to the m_ts member. The results are returned using the Conversion::to_list() function.

  • In module.cpp, add the include file, and add the class declaration to BOOST_PYTHON_MODULE(StatsPythonBoost) as follows:
    // Declare the TimeSeries class
    class_<API::TS::TimeSeries>("TimeSeries",
        init<const list&, const list&>("Construct a time series from a vector of dates and observations."))
        .def("MovingAverage", &API::TS::TimeSeries::MovingAverage,
            "Compute a simple moving average of size = window.")
        ;
  • After rebuilding StatsPythonBoost, open the StatsPython project in VSCode. Open the StatsPython.py script. Add a function to test the moving average, including exception handling. Run the script, debug if required.

The following exercises use the StatsPythonPyBind project:

4) Add procedural wrappers for the z-test functions. These should be almost identical to the t-test functions. No additional conversion functions are required.
  • In module.cpp, add declarations/definitions for the three functions.

  • In the module definition, add entries for these three functions. Follow the code for the t-test wrapper functions.

  • After rebuilding the StatsPythonPyBind project, open the StatsPython project in VSCode. Open the StatsPython.py script. Add functions to test the z-test functions using the data we have used previously.

5) Add a new class ZTest to PYBIND11_MODULE. Follow the definition of the TTest class, for example:
py::class_<Stats::ZTest>(m, "ZTest")
    .def(py::init<double, double, double, double>(), "...")
    .def(py::init<double, const std::vector<double>& >(), "...")
    .def(py::init<const std::vector<double>&, const std::vector<double>& >(), "...")
    .def("Perform", &Stats::ZTest::Perform, "...")
    .def("Results", &Stats::ZTest::Results, "...")
    .def("__repr__", [](const Stats::ZTest& a) {
            return "<example.ZTest>";
        }
    );
Note that in this case, no separate wrapper is required. We can simply reference the underlying native C++ class.
  • After rebuilding the StatsPythonPyBind project, open the StatsPython project in VSCode. Open the StatsPython.py script. Add functions to test the z-test functions using the data we have used previously. We can extend the function we used previously to test the one-sample z-test to test both the procedural wrapper and the class as follows:

def one_sample_ztest() -> None:
    """ Perform a one-sample z-test """
    try:
        data: list = [3, 7, 11, 0, 7, 0, 4, 5, 6, 2]
        results = Stats.OneSampleZTest(3.0, data)
        print_results(results, "One-sample z-test.")
        z: Stats.ZTest = Stats.ZTest(3.0, data)
        z.Perform()
        print_results(z.Results(), "One-sample z-test.(class)")
    except Exception as inst:
        report_exception(inst)

The results output from both calls should be identical.

6) In the StatsPythonPyBind project, add a MovingAverage function.
  • In module.cpp, add #include "TimeSeries.h".

  • In module.cpp, add a declaration/definition of the wrapper function.

std::vector<double> MovingAverage(const std::vector<long>& dates, const std::vector<double>& observations, int window)
{
    Stats::TimeSeries ts(dates, observations);
    const auto results = ts.MovingAverage(window);
    return results;
}
  • In module.cpp, add the definition of the MovingAverage function to the list of functions exposed by the PYBIND11_MODULE.

  • After rebuilding the StatsPythonPyBind project, open the StatsPython project in VSCode. Open the StatsPython.py script. Add a function to test the moving average, including exception handling. Run the script, debug if required.

7) Expose the native C++ TimeSeries class and the simple moving average function.
  • In module.cpp, add the include file, and add the class declaration. The class definition will be similar to the class definition we added to the StatsPythonBoost project previously.

py::class_<Stats::TimeSeries>(m, "TimeSeries")
    .def(py::init<const std::vector<long>&, const std::vector<double>&>(),
        "Construct a time series from a vector of dates and observations.")
    .def("MovingAverage", &Stats::TimeSeries::MovingAverage, "Compute a simple moving average of size = window.")
    .def("__repr__",
        [](const Stats::TimeSeries& a) {
            return "<TimeSeries> containing: " + to_string(a);
        }
);
To add the __repr__ method properly, we would need to adapt the underlying class definition to allow access to the internals or write an additional to_string() function. This is left as a further exercise.
  • After rebuilding the StatsPythonPyBind project, open the StatsPython project in VSCode. Open the StatsPython.py script. Add a function to test the moving average, including exception handling. Run the script, debug if required.

It is worth emphasizing that exposing the ZTest class and the TimeSeries class using PyBind has been quite straightforward, compared to the amount of work required to expose wrappers either via CPython or the Boost.Python wrapper.
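
When testing the MovingAverage exercises above, it can help to have an independent reference to compare against. The following pure-Python sketch computes a simple moving average over the observations alone. It assumes the result contains one value per complete window, so the output has len(observations) - window + 1 entries. The library's TimeSeries::MovingAverage may align or pad its output differently, so treat this only as a starting point. The function name is hypothetical.

```python
def simple_moving_average(observations, window):
    """Return the mean of each complete run of `window` consecutive values."""
    if window < 1 or window > len(observations):
        raise ValueError("window must be between 1 and len(observations)")
    # Keep a running sum so that each step costs O(1) instead of O(window).
    running = sum(observations[:window])
    averages = [running / window]
    for i in range(window, len(observations)):
        running += observations[i] - observations[i - window]
        averages.append(running / window)
    return averages


print(simple_moving_average([1, 2, 3, 4, 5], 3))  # prints [2.0, 3.0, 4.0]
```

For floating-point input the running sum can accumulate small rounding differences, so compare against the module's output with a tolerance rather than exact equality.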
