A. Gladstone, C++ Software Interoperability for Windows Programmers, https://doi.org/10.1007/978-1-4842-7966-3_8

8. Module Development with Boost.Python and PyBind

Adam Gladstone (Madrid, Spain)

Introduction

In the previous chapter, we saw how to create a basic Python extension module. We added code to expose functionality from the underlying C++ library of statistical functions. We saw how to perform the conversion between PyObject pointers and native C++ types. While not especially difficult, we saw that it is potentially error prone. In this chapter, we consider two frameworks – Boost.Python and PyBind – that overcome these difficulties, making the development of Python extension modules easier. We build two quite similar wrapper components, the first based on Boost.Python and the second on PyBind. The intention here is to compare the two frameworks. Following this, we look at a typical Python client and develop a script to measure the relative performance of the extension modules. We end the chapter with a simple Flask app that demonstrates using our PyBind module as part of a (limited) statistics service.

Boost.Python

The Boost Python Library is a framework for connecting Python to C++. It allows us to expose C++ classes, functions, and objects to Python in a non-intrusive way using types provided by the framework. We can continue to write “regular” C++ code in the wrapper layer using the types provided. The Boost Python Library is extensive. It provides support for automatic conversion of Python types to Boost types, function overloading, and exception translation, among other things. Using Boost.Python allows us to manipulate Python objects easily in C++, simplifying the syntax when compared to a lower-level approach such as the one we saw in the previous chapter.

Prerequisites

In addition to an installation of Boost (we use Boost 1.76 for this project), we require built versions of the Boost libraries, specifically the Boost.Python library. Unlike most of Boost, Boost.Python is not a header-only library, so we need to build it. Moreover, when we build the libraries, we need to ensure that the Boost.Python library matches the version of Python we are targeting. We have been using Python 3.8, so we expect the following Boost libraries to be present:

\boost_1_76_0\stage\lib\libboost_python38-vc142-mt-gd-x32-1_76.lib
\boost_1_76_0\stage\lib\libboost_python38-vc142-mt-x32-1_76.lib
\boost_1_76_0\stage\lib\libboost_python38-vc142-mt-gd-x64-1_76.lib
\boost_1_76_0\stage\lib\libboost_python38-vc142-mt-x64-1_76.lib

The Boost installation and build process for these libraries is described in more detail in Appendix A.
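To see what each part of those file names encodes, the following sketch assembles a name from its components. BoostPythonLibName is a hypothetical helper, not part of Boost. It assumes the default "versioned" layout and the vc142 toolset used above. The meaning of each component follows Boost's documented naming convention on Windows.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): builds the file name of a Boost.Python
// library as produced by a Boost 1.76 build with the "versioned" layout and the
// vc142 (Visual Studio 2019) toolset.
//   lib              static library (import libraries and DLLs have no "lib" prefix)
//   boost_pythonXY   library name plus the Python major/minor version it targets
//   vc142            compiler toolset
//   mt               multithreaded runtime
//   gd               debug runtime (g) and debug build (d); omitted for release
//   x32 / x64        architecture (x86) and address model
//   1_76             Boost version
std::string BoostPythonLibName(int pyMajor, int pyMinor, bool debug, bool is64Bit) {
    std::string name = "libboost_python" + std::to_string(pyMajor) + std::to_string(pyMinor);
    name += "-vc142-mt";
    if (debug) {
        name += "-gd";
    }
    name += is64Bit ? "-x64" : "-x32";
    name += "-1_76.lib";
    return name;
}
```

For example, BoostPythonLibName(3, 8, false, true) yields the release x64 name from the list above. That is the static library a release x64 build of this project links against.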

Project Settings

The StatsPythonBoost project is a standard Windows DLL project. As before, the project references the StatsLib static library. The project settings are summarized in Table 8-1.

Table 8-1

Project settings for StatsPythonBoost

Tab	Property	Value
General	C++ Language Standard	ISO C++17 Standard (/std:c++17)
C/C++ > General	Additional Include Directories	<Users\user>\Anaconda3\include $(BOOST_ROOT) $(SolutionDir)Common\include
Linker > General	Additional Library Directories	<Users\user>\Anaconda3\libs $(BOOST_ROOT)\stage\lib
Build Events > Post-Build Event	Command Line	(see in the following)

We can see from Table 8-1 that the project settings are similar to those of the previous project. In this case, we have not renamed the target output; we leave this to the post-build script (described next). In the Additional Include Directories, we reference the location of Python.h and the StatsLib project include directory. We also reference the Boost headers using the $(BOOST_ROOT) macro. Similarly, in the Additional Library Directories, we add a reference to both the Python libs and the Boost libs.

As in the previous project, we take a shortcut. Rather than installing the library in the Python environment, we simply copy the output to our Python project location (StatsPython). From there we can import the library in a Python script or interactively. In the post-build event, we copy the dll to the script directory, delete the previous version, and rename the dll with a .pyd extension, as follows:

copy /Y "$(OutDir)$(TargetName)$(TargetExt)" "$(SolutionDir)StatsPython\$(TargetName)$(TargetExt)"
del "$(SolutionDir)StatsPython\$(TargetName).pyd"
ren "$(SolutionDir)StatsPython\$(TargetName)$(TargetExt)" "$(TargetName).pyd"

With these settings in place, everything should build without warnings or errors.

Code Organization

When Visual Studio Community Edition 2019 creates a project for a Windows DLL, it generates a handful of files that we ignore. These are dllmain.cpp (which contains the entry point for a standard Windows DLL), framework.h, and pch.cpp (except insofar as it includes pch.h, the precompiled header).

In the pch.h file, we have

#define BOOST_PYTHON_STATIC_LIB

#include <boost/python.hpp>

The macro indicates that this DLL links statically to the Boost.Python library:

\boost_1_76_0\stage\lib\libboost_python38-vc142-mt-...-...-1_76.lib

The "..." parts depend on the build configuration (debug or release) and the processor architecture, though in our case we target only x64. The #include line brings in all the Boost.Python headers. The rest of the code is organized as before into three main areas: the functions (Functions.h/Functions.cpp), the conversion layer (Conversion.h/Conversion.cpp), and the module definition (module.cpp). In addition, this project has a wrapper class, StatisticalTests.h/StatisticalTests.cpp, that wraps up the t-test functionality. We deal with each of these areas in turn.

Functions

Inside the API namespace, we declare two functions: DescriptiveStatistics and LinearRegression. Both functions take their arguments as Boost.Python types. Boost.Python comes with a set of types, derived from boost::python::object, that correspond to Python's built-in types:

Python type	Boost type
list	boost::python::list
dict	boost::python::dict
tuple	boost::python::tuple
str	boost::python::str

This makes converting to STL types straightforward, as we shall see. The code inside each function follows a simple pattern. We first convert the parameters to types that StatsLib can use. Then we call the underlying C++ function, collect the results, and translate them back into a form that Python understands. Listing 8-1 shows the implementation of the DescriptiveStatistics function.

Listing 8-1

The DescriptiveStatistics wrapper function

The DescriptiveStatistics function in Listing 8-1 should look familiar. It follows the same structure as the raw Python example in the previous chapter. The major difference in the function declaration is that instead of PyObject pointers, we can use types defined in the Boost.Python library. In this case, both parameters are passed in as const references to a boost::python::list. The second parameter is defaulted, as we want to be able to call DescriptiveStatistics with or without the keys. The input arguments are converted to a std::vector<double> and a std::vector<std::string>, respectively. These are then used in the call to the underlying statistical library function. The results package is returned as before (a std::unordered_map<std::string, double> type) and converted to a boost::python::dict.
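The Boost-specific parts of Listing 8-1 are the parameter and return types; the flow underneath is plain C++. As a rough, Boost-free illustration, the following sketch stands in for the library call that the wrapper makes. It takes the converted std::vector<double> and an optional std::vector<std::string> of keys, and returns a std::unordered_map<std::string, double> results package. Several details are assumptions made for illustration: the name ComputeDescriptiveStatistics, the three statistics it computes, and skipping unknown keys. StatsLib computes many more statistics.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using ResultsPackage = std::unordered_map<std::string, double>;

// Hypothetical stand-in for the StatsLib call made by the DescriptiveStatistics
// wrapper. It receives the already-converted inputs: the data as
// std::vector<double> and an optional list of keys as std::vector<std::string>.
// Only three statistics are computed here; unknown keys are simply skipped.
ResultsPackage ComputeDescriptiveStatistics(const std::vector<double>& data,
                                            const std::vector<std::string>& keys = {}) {
    if (data.empty()) {
        throw std::invalid_argument("The data is empty.");
    }
    const double n = static_cast<double>(data.size());
    double sum = 0.0;
    for (double x : data) sum += x;
    const double mean = sum / n;
    double squares = 0.0;
    for (double x : data) squares += (x - mean) * (x - mean);

    const ResultsPackage all{
        {"Mean", mean},
        {"Count", n},
        {"StdDev.P", std::sqrt(squares / n)},  // population standard deviation
    };
    if (keys.empty()) {
        return all;  // no keys: return every statistic
    }
    ResultsPackage selected;
    for (const std::string& key : keys) {
        if (auto it = all.find(key); it != all.end()) {
            selected.insert(*it);
        }
    }
    return selected;
}
```

With the data 0 to 9, this returns a Mean of 4.5 and a StdDev.P of about 2.8723. These values match the interactive session in the module definition section.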

Listing 8-2 shows the code for the LinearRegression function.

Listing 8-2

The LinearRegression wrapper function

As can be seen from Listing 8-2, the LinearRegression function follows the same structure as previously. The function takes in two lists, converts them into the corresponding datasets, calls the underlying function, and converts the results package into a Python dictionary.
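For comparison, here is a minimal Boost-free sketch of the kind of computation behind such a call. It assumes simple ordinary least squares for y = b0 + b1 * x. The function name FitLinearRegression and the result keys "b0" and "b1" are hypothetical; StatsLib's actual results package may use different keys and contain more entries.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the regression call behind the LinearRegression
// wrapper: ordinary least squares for y = b0 + b1 * x. The key names "b0" and
// "b1" are assumptions; the real results package may differ.
std::unordered_map<std::string, double> FitLinearRegression(const std::vector<double>& xs,
                                                           const std::vector<double>& ys) {
    if (xs.size() != ys.size() || xs.size() < 2) {
        throw std::invalid_argument("x and y must have the same length (at least 2).");
    }
    const double n = static_cast<double>(xs.size());
    double xbar = 0.0, ybar = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        xbar += xs[i];
        ybar += ys[i];
    }
    xbar /= n;
    ybar /= n;
    double sxy = 0.0, sxx = 0.0;  // centred cross-product and sum of squares
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sxy += (xs[i] - xbar) * (ys[i] - ybar);
        sxx += (xs[i] - xbar) * (xs[i] - xbar);
    }
    if (sxx == 0.0) {
        throw std::invalid_argument("x values must not all be equal.");
    }
    const double b1 = sxy / sxx;
    return {{"b0", ybar - b1 * xbar}, {"b1", b1}};
}
```

For points lying exactly on y = 2x + 1, the sketch recovers b0 = 1 and b1 = 2. The wrapper's only job would be to build the two vectors from the Python lists and turn the returned map into a dict.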

StatisticalTests

Inside the API namespace, we create a separate namespace StatisticalTests for the three statistical hypothesis test functions. As in the “raw” case, here we have initially chosen to wrap up the usage of the TTest class inside a function. Listing 8-3 shows the summary data t-test function.

Listing 8-3

Wrapping up the TTest class in a function

As shown in Listing 8-3, the approach of providing a procedural wrapper for a class is straightforward: we get the input data and create an instance of the TTest class (depending on the function call and the arguments). We then call Perform to do the calculation and Results to retrieve the results. These are then translated back to the Python caller. The SummaryDataTTest function in this example takes four parameters corresponding to the constructor arguments of the summary data t-test. The arguments are typed as const references to a boost::python::object. This provides a wrapper around PyObject. The function then makes use of boost::python::extract<T>(val) to get a double value out of the argument. In general, the syntax is cleaner and more direct than using PyArg_ParseTuple. The remainder of the function calls Perform and retrieves the Results. As in the previous case of DescriptiveStatistics and LinearRegression, these are converted to a boost::python::dict and returned to the caller.
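The procedural-wrapper pattern itself does not depend on Boost. The sketch below shows it with a hypothetical SummaryTTest class that has the same Perform/Results shape as Stats::TTest but handles only the summary-data case. It computes t = (mean - mu0) / (sd / sqrt(n)) with n - 1 degrees of freedom. The class, the result keys and the omission of a p-value are assumptions. In the Boost version, the arguments would arrive as boost::python::object and be extracted with boost::python::extract<double>.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <unordered_map>

using ResultsMap = std::unordered_map<std::string, double>;

// Hypothetical class with the same Perform/Results shape as Stats::TTest,
// restricted to the summary-data case.
class SummaryTTest {
public:
    SummaryTTest(double mu0, double mean, double sd, double n)
        : m_mu0(mu0), m_mean(mean), m_sd(sd), m_n(n) {
        if (n < 2 || sd <= 0) {
            throw std::invalid_argument("n must be at least 2 and sd must be positive.");
        }
    }
    // t = (mean - mu0) / (sd / sqrt(n)), with n - 1 degrees of freedom.
    void Perform() {
        const double standardError = m_sd / std::sqrt(m_n);
        m_results = {{"t", (m_mean - m_mu0) / standardError}, {"df", m_n - 1}};
    }
    ResultsMap Results() const { return m_results; }

private:
    double m_mu0, m_mean, m_sd, m_n;
    ResultsMap m_results;
};

// Procedural wrapper: construct, Perform, return Results. The caller never
// sees the class, which is the shape of the SummaryDataTTest function.
ResultsMap SummaryDataTTest(double mu0, double mean, double sd, double n) {
    SummaryTTest test(mu0, mean, sd, n);
    test.Perform();
    return test.Results();
}
```

For instance, a sample mean of 2 with sd 4 and n = 16, tested against mu0 = 0, gives a standard error of 1, so t = 2 with 15 degrees of freedom.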

The Conversion Layer

As we have seen earlier, for the built-in types (bool, int, double, and so on) we can use the boost::python::extract class template:

boost::python::extract<T>(val)

For conversion to the STL types, we have three inline functions. The first is a function template, to_std_vector, which converts from a boost::python::object representing a list to a std::vector<T>. Listing 8-4 shows the code.

Listing 8-4

Converting a boost::python::object list to a std::vector

Listing 8-4 starts by constructing an empty std::vector. Then, we iterate over the input list extracting the individual values and inserting them into the vector. We use this basic approach to illustrate accessing list elements in a standard manner. We could have used the boost::python::stl_input_iterator<T> to construct the results vector<T> directly from iterators. We use this function to convert a list of doubles to a vector of doubles and also to convert a list of string keys to a vector of strings.
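We can sketch the logic of Listing 8-4 even without a Python runtime. In the following, a std::variant<double, std::string> element stands in for a dynamically typed list item, and std::get_if plays the role of boost::python::extract<T>. The element-by-element extraction fails when an element has the wrong type. The PyValue and PyList aliases are assumptions made for this illustration; the real function receives a boost::python::object.

```cpp
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Stand-in for a Python list: each element is dynamically typed.
// (Assumption for illustration; the real code receives a boost::python::object.)
using PyValue = std::variant<double, std::string>;
using PyList = std::vector<PyValue>;

// Same shape as Listing 8-4: start with an empty vector, visit each element,
// "extract" it as T and append it. std::get_if returns nullptr when the
// element does not hold a T, which is where extract<T> would fail.
template <typename T>
std::vector<T> to_std_vector(const PyList& list) {
    std::vector<T> result;
    result.reserve(list.size());
    for (const PyValue& item : list) {
        const T* value = std::get_if<T>(&item);
        if (value == nullptr) {
            throw std::invalid_argument("list element has the wrong type");
        }
        result.push_back(*value);
    }
    return result;
}
```

As in the project, the one template serves both conversions: to_std_vector<double> for the data and to_std_vector<std::string> for the keys.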

The second function is to_dict. This is a specialized function used for converting the results set into a Python dictionary. Listing 8-5 shows the code.

Listing 8-5

Converting the results package to a Python dictionary

In this case, we take a const reference to a std::unordered_map<std::string, double> and copy its contents into a boost::python::dict by iterating over the results. The final function is to_list, which is similar to to_dict: we create a Python list and populate it from a vector of doubles.
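One detail worth noting about to_dict is ordering. Iterating a std::unordered_map visits the entries in an unspecified order, and a Python dict (3.7 and later) preserves insertion order, so the key order that Python sees is effectively arbitrary. If a stable order mattered, one option would be to sort the entries before inserting them, as in the following helper. to_sorted_items is hypothetical and not part of the project.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the project): copies the results package
// into a vector of (key, value) pairs sorted by key. Inserting these into a
// Python dict in this order would give a stable, alphabetical key order
// instead of the unspecified iteration order of std::unordered_map.
std::vector<std::pair<std::string, double>>
to_sorted_items(const std::unordered_map<std::string, double>& results) {
    std::vector<std::pair<std::string, double>> items(results.begin(), results.end());
    std::sort(items.begin(), items.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return items;
}
```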

The Module Definition

Our Boost.Python module is defined in module.cpp. The module definition comprises both the functions and the classes that we want to expose to Python. We will deal with each in turn. The listing is quite long so has been broken up into two sections. First, Listing 8-6a shows the code that exposes the functions.

Listing 8-6a

The functions: StatsPythonBoost module definition

In Listing 8-6a, this part of the module definition should look somewhat familiar. It is not very different from the “raw” approach we saw in the previous chapter. We use the boost::python::def function to declare the functions we are wrapping. The first parameter is the function name we want to call from Python. The second parameter is the function address. The final parameter is the docstring. As pointed out earlier for the DescriptiveStatistics function, we want to be able to call it from Python with and without keys, and have it behave as the following interactive session demonstrates:

>>> import StatsPythonBoost as Stats
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> results = Stats.DescriptiveStatistics(data)
>>> print(results)
{'Mean': 4.5, 'Count': 10.0, 'Kurtosis': -1.2000000000000002, 'Skew.P': 0.0, ... }
>>> keys = ['Mean', 'StdDev.P']
>>> results = Stats.DescriptiveStatistics(data, keys)
>>> print(results)
{'Mean': 4.5, 'StdDev.P': 2.8722813232690143}

To do this, we need two separate overloaded functions, the same approach that we used in the C++/CLI wrapper in Chapter 3. In this case, however, we do not need to write the overloads explicitly. Instead, we use the macro BOOST_PYTHON_FUNCTION_OVERLOADS to generate them for us. Its arguments are the generator name, the function we want to overload, the minimum number of parameters (1 in this case), and the maximum number of parameters (2 in this case). Having defined this, we then pass the f_overloads structure, along with the docstring, to the def function.
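The overloads have to be generated because of a C++ rule: default arguments belong to a function's declaration, not to its type. Once we take the address of a function, as def does, the default is lost. The sketch below illustrates this with a hypothetical Describe function. A hand-written struct of forwarding functions, one per arity, is conceptually what BOOST_PYTHON_FUNCTION_OVERLOADS generates for us; the names and structure that the real macro produces differ.

```cpp
// Hypothetical function with a default argument, standing in for
// DescriptiveStatistics(data, keys = ...).
int Describe(int a, int b = 2) { return a * 10 + b; }

using OneArg = int (*)(int);
using TwoArgs = int (*)(int, int);

// The default for b is not part of Describe's type, so a pointer to it always
// needs two arguments:
//   OneArg p = &Describe;   // would not compile
// A hand-written equivalent of a generated overload set: one forwarding
// function per supported arity, each with a fixed signature.
struct DescribeOverloads {
    static int Call1(int a) { return Describe(a); }            // default b applied here
    static int Call2(int a, int b) { return Describe(a, b); }  // b supplied by the caller
};
```

Each forwarder has an ordinary, fixed signature, so each one can be registered under the same Python name and selected by the number of arguments the caller passes.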

The second part of the module definition, shown in Listing 8-6b, declares the classes that can be used directly in Python.

Listing 8-6b

The classes: StatsPythonBoost module definition

Listing 8-6b shows the TTest and DataManager classes that we wrap in this module. With these classes defined, we can write the following from a Python script, for example:

# Perform t-test from summary data
t: Stats.TTest = Stats.TTest(5, 9.261460, 0.2278881e-01, 195)
t.Perform()
print(t.Results())

The C++ wrapper class for the t-test is defined in StatisticalTests.h. The template argument to boost::python::class_ refers to our wrapper class. We have named it StudentTTest to distinguish it from the underlying Stats::TTest class, an instance of which it holds. The constructors determine the type of t-test to be performed and convert between boost::python types and the underlying C++ types, using the same conversions that we have already seen.

From the module definition in Listing 8-6b, we can see that the first parameter is the name of the class, "TTest". This is the name for the type we will call from Python. Alongside this, we define an init function (the constructor) which takes four arguments. We then define two additional init functions, one each for the remaining constructors with their corresponding arguments. Finally, we define the two functions Perform and Results. All the functions provide a docstring. That is all we need to do to expose a native C++ type to Python.
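The idea of letting each constructor decide what kind of test to set up can be sketched without Boost. In the following hypothetical StudentTTestSketch class, the summary-data constructor stores its inputs directly. The one-sample constructor derives the same summary values (mean, sample standard deviation, count) from raw observations. In the Boost module, each such constructor would be exposed through its own init<...> entry. The class and its result keys are assumptions, and the two-sample case is omitted.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical wrapper in the spirit of StudentTTest: the constructor that is
// called determines which kind of test is set up. Both cases sketched here
// reduce their inputs to (mu0, mean, sd, n).
class StudentTTestSketch {
public:
    // Summary data: hypothesised mean, sample mean, sample sd, sample size.
    StudentTTestSketch(double mu0, double mean, double sd, double n)
        : m_mu0(mu0), m_mean(mean), m_sd(sd), m_n(n) {
        Validate();
    }

    // One-sample test: derive the same summary values from raw observations.
    StudentTTestSketch(double mu0, const std::vector<double>& sample)
        : m_mu0(mu0), m_n(static_cast<double>(sample.size())) {
        if (sample.size() < 2) {
            throw std::invalid_argument("Need at least two observations.");
        }
        double sum = 0.0;
        for (double x : sample) sum += x;
        m_mean = sum / m_n;
        double squares = 0.0;
        for (double x : sample) squares += (x - m_mean) * (x - m_mean);
        m_sd = std::sqrt(squares / (m_n - 1));  // sample standard deviation
        Validate();
    }

    void Perform() {
        const double standardError = m_sd / std::sqrt(m_n);
        m_results = {{"t", (m_mean - m_mu0) / standardError}, {"df", m_n - 1}};
    }
    std::unordered_map<std::string, double> Results() const { return m_results; }

private:
    void Validate() const {
        if (m_n < 2 || m_sd <= 0) {
            throw std::invalid_argument("Need n >= 2 and a positive standard deviation.");
        }
    }
    double m_mu0 = 0.0, m_mean = 0.0, m_sd = 0.0, m_n = 0.0;
    std::unordered_map<std::string, double> m_results;
};
```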

The DataManager class is exposed in a similar way. The C++ wrapper class is defined in DataManager.h in the namespace API::Data. This allows us to keep the wrapper class separate from the StatsLib C++ class of the same name. As before, the purpose of the wrapper class is to handle the type conversions and manage the lifetime of the underlying DataManager class in the StatsLib. Listing 8-7 shows a typical example function.

Listing 8-7

The DataManager::ListDataSets function

From Listing 8-7 we can see that the function ListDataSets returns a Python list using the Boost.Python type. The list comprises Stats::DataSetInfo items that are typed as

using DataSetInfo = std::pair<std::string, std::size_t>;

The items contain the dataset name and the number of observations in the data. The function first obtains the currently loaded datasets from the m_manager member that this class wraps. Inside the for-loop, we use the function boost::python::make_tuple to create a Python tuple element with the dataset information. This is then appended to the results list and returned to the caller. The remaining functions are similarly straightforward.
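The loop in Listing 8-7 can be mirrored in standard C++ to show the data flow. In this sketch, a hypothetical DataManagerSketch stands in for the StatsLib data manager. std::make_tuple stands in for boost::python::make_tuple, and a std::vector of tuples stands in for boost::python::list. Only the DataSetInfo alias is taken from the text; the rest is assumed for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using DataSetInfo = std::pair<std::string, std::size_t>;  // as in the text

// Hypothetical, minimal stand-in for the StatsLib data manager.
class DataManagerSketch {
public:
    void Add(const std::string& name, std::vector<double> observations) {
        m_data[name] = std::move(observations);
    }
    // One (name, number of observations) entry per loaded dataset.
    std::vector<DataSetInfo> ListDataSets() const {
        std::vector<DataSetInfo> info;
        for (const auto& [name, values] : m_data) {
            info.emplace_back(name, values.size());
        }
        return info;
    }

private:
    std::map<std::string, std::vector<double>> m_data;
};

// Mirrors the wrapper's loop: turn each DataSetInfo into a tuple and append it
// to the result list (std::make_tuple for boost::python::make_tuple,
// std::vector for boost::python::list).
std::vector<std::tuple<std::string, std::size_t>> ListDataSetsWrapper(const DataManagerSketch& manager) {
    std::vector<std::tuple<std::string, std::size_t>> result;
    for (const DataSetInfo& info : manager.ListDataSets()) {
        result.push_back(std::make_tuple(info.first, info.second));
    }
    return result;
}
```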

Exception Handling

As in the previous chapter, exceptions should be caught and handled in the wrapper functions. In particular, we are concerned with bad arguments, so we should check types and report errors appropriately. We could use the same approach as in the previous chapter (manually translating C++ exceptions into Python exceptions). However, we can also take advantage of Boost.Python. In the module definition, the Boost.Python framework wraps the functions we register with .def(...), so Python does not call them directly. Instead, Python calls function_call(...) (\boost_1_76_0\libs\python\src\object\function.cpp). This function wraps the actual function call in an exception handler. The handler translates the exception in the same way that we did previously (\boost_1_76_0\libs\python\src\errors.cpp), though it catches and translates more exception types. As a result, a C++ exception does not bring down the Python interpreter; it is raised as a Python exception that the caller can handle. We can test this using the following Python code, which passes a string inside the list instead of the expected numeric item:

try:
    x = [1, 3, 5, 'f', 7]
    summary: dict = Stats.DescriptiveStatistics(x)
    print(summary)
except Exception as inst:
    report_exception(inst)

The error that is reported is

No registered converter was able to produce a C++ rvalue of type double from this Python object of type str

This error is provided by Boost. On the other hand, if we pass in an empty dataset, we get the following:

try:
    x = []
    summary: dict = Stats.DescriptiveStatistics(x)
    print(summary)
except Exception as inst:
    report_exception(inst)

The error that is reported is

<class 'ValueError'> The data is empty.

This is the error thrown from the underlying StatsLib. In effect, the error handling that we wrote by hand in the previous chapter is now provided for free.
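The translation mechanism can be modelled without Python. The sketch below runs a call inside a try block and maps C++ exception types to the name of the Python exception that would be raised. The mapping roughly follows the one in Boost.Python's errors.cpp:

- std::invalid_argument becomes ValueError.
- std::out_of_range becomes IndexError.
- std::bad_alloc becomes MemoryError.
- Any other exception becomes RuntimeError.

CallAndTranslate and its (name, message) return value are hypothetical stand-ins for setting the Python error indicator. The mapping is consistent with the ValueError above, assuming StatsLib reports empty data with std::invalid_argument.

```cpp
#include <functional>
#include <new>
#include <stdexcept>
#include <string>
#include <utility>

// Python-free model of the handler: run the wrapped call and map the C++
// exception type to the Python exception that would be raised. The returned
// (exception name, message) pair stands in for PyErr_SetString; an empty
// name means the call succeeded. More specific types are caught first.
std::pair<std::string, std::string> CallAndTranslate(const std::function<void()>& call) {
    try {
        call();
        return {"", ""};
    } catch (const std::bad_alloc&) {
        return {"MemoryError", ""};
    } catch (const std::out_of_range& e) {
        return {"IndexError", e.what()};
    } catch (const std::invalid_argument& e) {
        return {"ValueError", e.what()};
    } catch (const std::exception& e) {
        return {"RuntimeError", e.what()};
    } catch (...) {
        return {"RuntimeError", "unidentifiable C++ exception"};
    }
}
```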

PyBind

In this section, we develop our third and final Python extension module. This time we use PyBind. Boost.Python has been around for a long time and the Boost library that it is a part of offers a wide range of functionality. This makes it a relatively heavyweight solution if all we want to do is create Python extension modules. PyBind is a lighter-weight alternative. It is a header-only library that provides an extensive range of functions to facilitate writing C++ extension modules for Python. PyBind is available from here: https://github.com/pybind/pybind11.