© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
A. Gladstone, C++ Software Interoperability for Windows Programmers, https://doi.org/10.1007/978-1-4842-7966-3_5

5. Building an R Package

Adam Gladstone
Madrid, Spain

Introduction

In this chapter and the next, we connect our simple C++ library of statistical functions to R. We do this by creating an R package using Rcpp. We then use this wrapper component to expose the functionality we want. This chapter focuses on both the project setup and the mechanics of building packages with RStudio. The following chapter focuses on the details of using Rcpp as a framework for connecting C++ and R.

The project setup in this case is slightly more involved than previously. In general terms, we require the standard environment for building a CRAN package. More specifically, the development environment needs a suitable compiler. Because different C++ compilers (GCC, MSVC, and so on) produce object code that follows different binary conventions, it is not possible to mix the object code generated by different compilers. The result, from our narrow practitioner perspective, is that we need to build a version of the C++ library of statistical functions with a different compiler/linker. Specifically, we have to use the GNU Compiler Collection (GCC) and its C++ compiler, g++. This produces an ABI (Application Binary Interface)-compatible component that can be hosted in R and that interoperates with the R environment.
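Since the whole point of the rebuild is to use a particular compiler, it can be handy to confirm which compiler actually produced a given piece of code. The following is a minimal sketch, not part of the StatsLib sources: the function names compiler_family and compiler_version are hypothetical, while the predefined macros they test (_MSC_VER, __clang__, __GNUC__, __VERSION__) are set by the respective compilers.

```cpp
#include <string>

// Hypothetical helper: report which compiler family built this code.
// Clang is tested before GCC because Clang also defines __GNUC__.
std::string compiler_family() {
#if defined(_MSC_VER)
    return "MSVC";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

// Hypothetical helper: compiler version text where the compiler provides it
// (GCC and Clang define __VERSION__; other compilers may not).
std::string compiler_version() {
#if defined(__VERSION__)
    return __VERSION__;
#else
    return "unknown";
#endif
}
```

Compiled with the g++ that ships with Rtools or CodeBlocks, compiler_family() would be expected to report GCC; the same source built in Visual Studio would report MSVC.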

A brief outline of the steps involved is as follows:
  1. Install the required gcc tools.
  2. Set up and build a new static library (using CodeBlocks) from the same sources as before. The library output is libStatsLib.a.
  3. Create the Rcpp project in RStudio (StatsR).
  4. Configure the Rcpp project to use the new static library.

This will give us a working wrapper shell. Later on, we look at how the functionality is added. Unlike in previous chapters, we focus more on the toolchain (CodeBlocks, Rtools, and RStudio) in this chapter. We leave writing the Rcpp layer, and building and testing the functionality until the next chapter.

Prerequisites

Rtools

Rtools is a suite of tools for building R packages on Windows and includes the gcc compiler. The installer for Rtools is available from https://cran.r-project.org/bin/windows/Rtools/. You should install the 64-bit version of Rtools: rtools40-x86_64.exe. Note that Rtools 4.0 requires version 4.0.0 of R or above. After completing the installation, ensure that the RTOOLS40_HOME environment variable is set to the rtools directory, and add the rtools directory to the PATH environment variable. It is also possible to install Rtools from inside RStudio, using the install.Rtools() function from the installr package, which installs the latest version of Rtools. The following link gives instructions on how to do this: https://rdrr.io/cran/installr/man/install.Rtools.html. To check that Rtools has been installed correctly, open a PowerShell prompt and type gcc --version to display the program version information.

Installing CodeBlocks

Strictly speaking, installing CodeBlocks is not a prerequisite; it is a convenience. The installer is available from www.codeblocks.org/downloads/binaries. Our goal is to build an ABI-compliant static library with the gcc toolchain, and there are several ways to achieve this. If you are comfortable building libraries manually using makefiles, or you prefer to use CMake to configure a build environment using the gcc toolchain, you do not need to use CodeBlocks. Appendix B contains basic instructions on configuring a Visual Studio CMake project, StatsLibCM, to build the library output we require. On the other hand, if you prefer Visual Studio Code as your C++ development environment, this can also be configured to work with GCC using MinGW. For further information, see the Additional Resources section at the end of the chapter.

Because CodeBlocks is already configured for cross-platform C++ development using gcc, we will continue to use it here. In addition, CodeBlocks provides a wide variety of useful project types and several build targets (static-link library, dynamic-link library, for example). Moreover, the debugging support (including setting breakpoints and watch variables) is easier than using gdb from a console session.

CodeBlocks

Toolchain Setup

Open CodeBlocks and go to Settings ➤ Compiler.... Under Selected compiler, select GNU GCC Compiler and enable the compiler flag that has g++ target the C++17 standard. Figure 5-1 shows the Global compiler settings.
Figure 5-1

Compiler settings in CodeBlocks

In addition to the General settings (shown in Figure 5-1), there are a number of useful options for controlling aspects of the compilation. Specifically, there are options for debugging, profiling, warnings, optimization, and CPU architecture. For this project we do not make use of any of these options, but it is useful to know they exist.

Next, in the Toolchain executables tab, click the Auto Detect button. This should fill in the path to the compiler’s installation directory, for example, D:\Program Files\mingw-w64\x86_64-8.1.0-posix-seh-rt_v6-rev0\mingw64. If this isn’t the case, click the “...” button and manually select the MinGW directory (under which the gcc tools are located). Note that CodeBlocks itself installs the MinGW toolset, so, in addition to Rtools, you may have a second installation of MinGW. I have two versions of the MinGW packages: one with gcc 8.1.0 from CodeBlocks and one with gcc 8.3.0 from Rtools. This does not cause a problem since the outputs from both are ABI compatible. The MinGW installation from CodeBlocks puts its directory into the PATH environment variable, so this is the one we build with. You can, however, change this to use the path to Rtools if you prefer.

Fill in the rest of the tools as shown in Figure 5-2.
Figure 5-2

Toolchain executables

Finally, under the Search Directories tab, add the path to the boost_1_76_0 directory. Figure 5-3 shows the setting we use.
Figure 5-3

Setting the search path to use Boost

When you have finished configuring this, press OK to save any changes.

Project Setup

The StatsLibCB directory contains the CodeBlocks project file (StatsLibCB.cbp). The project uses the Static Library template. The static library is based on the same C++ sources as previously, located in the Common directory. Open the project in CodeBlocks. Right-click on the project node and select Project properties as shown in Figure 5-4.
Figure 5-4

Project settings

The overall project settings are straightforward. This page (Figure 5-4) gives options relating to the object file generation, pch file, platforms, and execution directory. We have not made any changes here. Select the Build targets page and check that the (Debug and Release) build targets are shown as in Figure 5-5.
Figure 5-5

Build targets for libStatsLib

Looking at Figure 5-5, we can see that the type of the project is set to Static library. The Output filename is libStatsLib.a (for both Debug and Release). At the bottom of Figure 5-5, we can see the Build target files that we have added. Click OK to save the settings. The project environment should look like Figure 5-6.
Figure 5-6

The project node with the source and include files

Depending on how the StatsLibCB workspace node is being displayed (right-click on the node for various options), your view of the project files may be slightly different. At this stage, the project is ready to build. From the Build menu, select Build (Ctrl+F9). Build both the debug and release versions of the library. The Build log tool window displays the command line passed to the compiler and linker. The project should build without warnings or errors, and the library (libStatsLib.a) should be located in the output directory corresponding to the selected build target.

R/RStudio Packages

Background

For this section, you will need to have RStudio up and running. RStudio is the IDE of choice for hosting the R environment and developing applications using the R language. We could have used the more basic RGui; however, RStudio provides better facilities, specifically when it comes to building R packages. So, having built the ABI-compliant statistics library successfully, we are now ready to create an R package that uses it.

On Windows, R packages are dynamic-link libraries. They can be loaded dynamically using the dyn.load() function using the full filename (including the dll extension) or, more typically, using the library() command for installed (registered) packages. Among other things, RStudio provides a convenient IDE for managing the installation and loading of packages.

To communicate with packages, the R language and environment provides a low-level C-style API (application programming interface). Once a package has been loaded, users can call functions in the package, pass parameters, and get results back. What this means is that once we have built the StatsR.dll as a package, we could load it and execute the following command, for example:
> .Call("_StatsR_get_descriptive_statistics", c(0,1,2,3,4,5,6,7,8,9), c("StdErr"))
   StdErr
0.9574271

This calls the get_descriptive_statistics function with two parameters passed as collections: the data and a single key, “StdErr”. The results are returned as expected. The function name we use in making the call, _StatsR_get_descriptive_statistics, is the C-style exported function name. We could obtain this by inspecting StatsR.dll with a tool like Depends.exe.

However, this API is quite low-level and not ideal for extended development. Our intention here is to expose a (limited) number of functions from the underlying C++ statistics library. Using the C-style API approach, we would need to declare every function as extern "C" with a return type of SEXP. A SEXP is a pointer to a SEXPREC (Simple EXPression RECord), an opaque pointer type used by R. Furthermore, the parameters would also have to be typed as SEXP, that is, as pointers to S-expression objects. Using the C-style API does allow us to exchange data and objects between C++ and R, but it is not a practical proposition for more complex C++ development.

The Rcpp framework solves this issue. The Rcpp layer sits above the .Call() API and shields the C++ developer from needing to use the low-level SEXP types. Rcpp provides an interface that automatically translates standard C++ into calls to the low-level API. From the point of view of development, Rcpp allows us to use standard C++ for the wrapper component.
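To make "standard C++" a little more concrete, here is a sketch of the kind of plain function that could sit behind the “StdErr” key in the earlier .Call() example. It is an illustration only: standard_error is a hypothetical helper, not the statistics library's actual code, and it assumes the standard error is the sample standard deviation (n - 1 denominator) divided by the square root of n. Note that nothing in it refers to SEXP or any other R type.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: standard error of the mean, s / sqrt(n),
// where s is the sample standard deviation (n - 1 denominator).
double standard_error(const std::vector<double>& data) {
    const std::size_t n = data.size();
    if (n < 2) {
        throw std::invalid_argument("need at least two observations");
    }

    // Arithmetic mean of the data.
    double sum = 0.0;
    for (double x : data) sum += x;
    const double mean = sum / static_cast<double>(n);

    // Sum of squared deviations from the mean.
    double ss = 0.0;
    for (double x : data) ss += (x - mean) * (x - mean);

    const double variance = ss / static_cast<double>(n - 1);
    return std::sqrt(variance / static_cast<double>(n));
}
```

For the data 0, 1, ..., 9 used in the R session above, this computes approximately 0.9574271, which agrees with the value returned through .Call().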

Building a Package with Rcpp

Installing Rcpp

The Rcpp package can be installed by running the R command: install.packages("Rcpp"). Alternatively, from the RStudio menu, we can use the Tools ➤ Install Packages... command. Once completed, we are ready to build an Rcpp package. From inside RStudio, open the StatsR project: File ➤ Open Project ... . The StatsR.Rproj file is located in the StatsR directory under the SoftwareInteroperability directory.

The Project Files

The RStudio IDE provides the facility to create an Rcpp project directly. StatsR was created using File ➤ New Project, and in the New Project Wizard, selecting New Directory, then “R Package using Rcpp” and a directory name. With this, the boilerplate files are generated. We could have generated the required package files from scratch or we could have used the command Rcpp.package.skeleton to generate the project files. In our case, the Rcpp project template generates the files in several subdirectories under the StatsR project directory. The files are listed as follows with a brief description of each:
  • StatsR.Rproj

    This is the RStudio project file.

  • DESCRIPTION

    This file contains descriptive information about this package (Package name, Type, Version, Date, Author, and so on). It also contains metadata about the package dependencies. See the Additional Resources section for links to more detailed information about package metadata.

  • NAMESPACE

    This file contains three directives. Firstly, useDynLib(...) ensures that the dynamic library that is part of this package is loaded and registered. Next, the importFrom(...) directive imports named objects from other packages (other than base R, which is always available). In this case, we import the evalCpp function from the Rcpp package. The final directive, exportPattern(...), declares which identifiers should be globally visible from the namespace of this package. The default is to export all identifiers that start with a letter, which is expressed as a regular expression.

  • man/StatsR-package.Rd

    This is an R documentation (Rd) template file that is used for describing the package. You can edit this in RStudio. Pressing the Preview button displays the formatted contents in the Help window.

  • R/RcppExports.R

    This file contains the R language function calls generated by Rcpp.

  • src/RcppExports.cpp

    This file contains the C++ functions generated by Rcpp.

  • src/Makevars.win

    This file contains the configuration options for the compiler/linker.

  • src/StatsR.cpp

    This is the main file we will be working with in this chapter and contains boilerplate code.
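The effect of the exportPattern(...) directive in the NAMESPACE file can be illustrated with a small sketch. It assumes the generated pattern is "^[[:alpha:]]+" (check your own NAMESPACE file, as this is an assumption here), and it uses std::regex rather than R's own regular expression engine; matches_export_pattern is a hypothetical name introduced for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does an identifier match the assumed default
// export pattern, that is, does it start with a letter?
bool matches_export_pattern(const std::string& name) {
    // Assumed pattern; the real one lives in the package's NAMESPACE file.
    static const std::regex pattern("^[[:alpha:]]+");
    return std::regex_search(name, pattern);
}
```

Under this assumption, a name such as library_version would be exported, while names beginning with an underscore or a dot (for example, _StatsR_library_version) would not be.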

Editing the Makefile

In terms of packaging, up to now we have been working inwards from both sides, as it were. On one side, we have rebuilt our statistical functions library as libStatsLib.a. On the other side, we have created a StatsR project using Rcpp. Now, we need to link the C++ statistical functions library into the Rcpp project. To do this, we need to update Makevars.win, which can be found in the src directory. Makevars.win is the Windows makefile for this project. It overrides settings in the default build configuration file, Makeconf. For reference, Makeconf can be located by running the command file.path(R.home("etc"), "Makeconf"). Makeconf contains all the settings for compiling and linking using the gcc toolchain, so it should be treated with some caution. For this project, the configuration is much simpler. We only use a single flag:
  • PKG_LIBS: This flag is used to link in additional libraries (such as libStatsLib.a).

Two other flags of interest, depending on the build target, are
  • PKG_CXXFLAGS: This flag can be used to set additional debug or release options. For debugging, we build with debug information for gdb (-ggdb), the zero-optimization level (-O0), and the warning level (-Wall). For release builds, we remove these settings.

  • PKG_CPPFLAGS: These relate to preprocessor flags and can be used to set additional include directories with -I.

The Additional Resources section provides links to more detailed descriptions of the flags and their usage. Returning to Makevars.win, we have added the following lines at the bottom of the makefile:
## Directory where the static library is output
PKG_LIBS=<your path>/SoftwareInteroperability/StatsLibCB/bin/Release/libStatsLib.a

This will tell the linker to link with the release version of the libStatsLib library. Save your changes.

Boilerplate Code

Still in the src directory, open the file StatsR.cpp. There is some useful generated boilerplate code here that we will use to check the build process. Listing 5-1 shows the code.
Listing 5-1

Boilerplate C++ function in the StatsR package

In this file, we define a single C++ function called library_version that returns a hard-coded string. There are a couple of features that are worth highlighting in this small example.

Firstly, at the top of the file, we include Rcpp.h. This is the main Rcpp header. You can find this under library\Rcpp in your R distribution (e.g., D:\R\R-4.0.3\library) alongside the rest of the source code. Rcpp is quite an extensive package (some 300+ files) and has a lot of facilities that are well worth exploring. The documentation directory (Rcpp\doc) contains a number of useful bite-size reference documents that are worth referring to. We barely scratch the surface in the two chapters on R in this book.

Secondly, of note is the attribute
// [[Rcpp::export]].

This indicates that we want to make this C++ function available to R. The function itself is quite simple. It constructs a String object and returns it to the caller.
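The body of Listing 5-1 is not reproduced in this text, so the following is only a sketch of what such a boilerplate function might look like, based on the description above and on the version string reported later in the chapter. The fallback branch, which defines a stand-in for Rcpp::String, is an assumption added so that the fragment also compiles outside an R package build; in the real package, Rcpp.h is found and the stand-in is never used.

```cpp
#if __has_include(<Rcpp.h>)
#include <Rcpp.h>   // the real Rcpp header, found during a package build
#else
#include <string>
// Stand-in (assumption) so this sketch compiles without an R installation.
namespace Rcpp { using String = std::string; }
#endif

// The attribute below marks the function for export to R.
// [[Rcpp::export]]
Rcpp::String library_version() {
    // Hard-coded version text; construct a String and return it to the caller.
    return Rcpp::String("StatsR, version 1.0");
}
```

The two features highlighted above are both visible here: the include of Rcpp.h at the top and the export attribute written as a comment immediately before the function.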

The RStudio IDE is good for writing and developing R scripts. However, for C++ development it is less useful, especially when it comes to being able to read through source code or go to definitions of types (like String in the preceding example). While not absolutely critical, it is nice to be able to right-click on a symbol and jump to the definition (if possible). This also makes both navigating around the source code and investigating any compilation errors related to type conversions slightly easier.

With this in mind, a quick and non-intrusive workaround to achieve this using Visual Studio Code is the following. Open the StatsR directory in Visual Studio Code (File ➤ Open Folder …), then open the StatsR.cpp file. For this to work, you will need to have installed the VSCode C++ plugin (“C/C++ for Visual Studio Code”). Edit the plugin configuration file (<your path>\SoftwareInteroperability\StatsR\.vscode\c_cpp_properties.json) to look for the sources in the Rcpp location and the root include directory. Add the "configurations" section in Listing 5-2 to the c_cpp_properties.json properties file.
Listing 5-2

Adding include paths to the c_cpp_properties.json file in VSCode

With this in place, you can right-click on symbols (or press F12) and jump to the definition as shown in Figure 5-7.
Figure 5-7

Using Visual Studio Code to navigate the Rcpp source files

Looking at Figure 5-7, it turns out that the String class encapsulates a CHARSXP – an S-expression pointer of type char (roughly speaking).

Building StatsR

Returning to the function library_version: we will use this simple function to test the build end to end. We should be able to call this function from the minimal R script in Listing 5-3.
library(StatsR)                 # Load the library
res = StatsR::library_version() # Retrieve the library version
res                             # Display it
Listing 5-3

A simple test R script

Click Build ➤ Clean and Rebuild (or use the Build menu in the Build pane). It sometimes happens that the package is already loaded in the active R session, for example, if the environment was reloaded when you opened the project. In that case, Clean and Rebuild displays a message that the library is in use, similar to the following:
ERROR: cannot remove earlier installation, is it in use?
* removing 'D:/R/R-4.0.3/library/StatsR'
* restoring previous 'D:/R/R-4.0.3/library/StatsR'
...
Exited with status 1.
If this happens, just select Session ➤ Restart R from the main menu, and then proceed as before. The output should look like Listing 5-4.
Listing 5-4

Clean and Rebuild output

Listing 5-4 shows in detail the steps taken in the build process. As might be expected, the “Clean and Rebuild” process is somewhat involved. The first stage is the call to Rcpp::compileAttributes(). This inspects the C++ functions in the src directory and looks for attributes of the form // [[Rcpp::export]]. When it finds one, it generates both the C++ and the R code that is required to expose the function to R. These function wrappers are generated in src/RcppExports.cpp and R/RcppExports.R (note the different file extensions and locations). More specifically, Rcpp uses the export attribute to generate a function wrapper which maps an R function library_version to a C-style function call. This is the R call (found in RcppExports.R). Listing 5-5 shows the R function.
Listing 5-5

The R function stub generated from the library_version C++ code

You can see that this uses the low-level .Call() interface that we described earlier. The corresponding C++ function is also generated in RcppExports.cpp. This is shown in Listing 5-6.
Listing 5-6

Low-level C++ code generated by Rcpp

The first line (after the comment) is the function signature of the C++ function. This is followed by the C-style API declaration. Inside the function, Rcpp has generated code to call the function and return the results. We will have more to say about the Rcpp code generated here in the following chapter.

In addition to the generated C++ function wrappers, RcppExports.cpp also contains the module definition. This is a mapping from the function name to an exported function address. It also contains information about the number of parameters. You should never need to use these files directly. Both files (src/RcppExports.cpp and R/RcppExports.R) are flagged as read-only. Modifying these files by hand is not recommended.
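The name-to-address mapping described above can be pictured with a toy model. This is not the code Rcpp generates and it does not use R's registration types; CallEntry, GenericFn, library_version_stub, and find_entry are all hypothetical stand-ins chosen to show the shape of such a table: one row per exported function, holding its name, its address, and its argument count, ending with a sentinel row.

```cpp
#include <cstring>

// Stand-in for a generic function-pointer type (assumption for this sketch).
using GenericFn = void* (*)();

// One row of the table: exported name, function address, argument count.
struct CallEntry {
    const char* name;
    GenericFn   fun;
    int         num_args;
};

// Hypothetical wrapper that takes no arguments.
void* library_version_stub() { return nullptr; }

// The table itself; a row of nulls marks the end.
static const CallEntry call_entries[] = {
    {"_StatsR_library_version", &library_version_stub, 0},
    {nullptr, nullptr, 0}
};

// Look up a row by exported name; returns nullptr if it is not registered.
const CallEntry* find_entry(const char* name) {
    for (const CallEntry* e = call_entries; e->name != nullptr; ++e) {
        if (std::strcmp(e->name, name) == 0) return e;
    }
    return nullptr;
}
```

A lookup by the exported name returns the row with the function address and the expected number of arguments, which is the information a caller needs in order to dispatch a call by name.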

To summarize what is happening so far: we have written a C++ function library_version (in fact this was boilerplate code, but the process is the same); Rcpp has generated an R function and the low-level wrapper code that translates Rcpp types to low-level types understood by the R .Call() API.

After the file generation, the build process then builds a DLL and makes it available to R. It does this by installing the package in the package location. You can confirm this by looking in your R distribution under library. In our case, it is under D:\R\R-4.0.3\library\StatsR. Finally, the build process generates some documentation. You can configure the build process to use roxygen2, if you require. In this case, we stick with the default Rd documentation. This is used to generate an HTML version of the documentation in the package location (D:/R/R-4.0.3/library/StatsR/html/StatsR-package.html).

If everything has gone to plan, we should now find a StatsR.dll in the project directory under src. And it should be loaded into the RStudio environment. You can confirm this by executing the command in Listing 5-7.
Listing 5-7

Obtaining a list of the loaded DLLs

The StatsR package appears at the bottom of the list of loaded dlls in Listing 5-7. Your output will look different depending on what is currently loaded. We can check that the version function works as expected by executing the following command:
> library_version()

The output should be: [1] "StatsR, version 1.0".

In addition, we can inspect the functions that are available in the package, as follows:
> library(pkgload)
> StatsFunctions = names(pkg_env("StatsR"))
> as.data.frame(StatsFunctions)
              StatsFunctions
1          t_test_two_sample
2 get_descriptive_statistics
3          t_test_one_sample
4        t_test_summary_data
5            library_version
6          linear_regression

With this completed, we have a fully working Rcpp package which provides a wrapper around our C++ library of statistical functions.

Summary

We’ve covered quite a lot of ground in this chapter. We have (re)built the library of statistics functions using the gcc compiler/linker. We have also built the wrapper component, StatsR.dll. This is convenient, as it allows us to reuse the sources without change, while at the same time separating the wrapper component (StatsR.dll) from the underlying C++ code.

This chapter has focused on setting up the infrastructure required to build R packages that consume C++ functionality. It should be emphasized that this arrangement is only one of a number of possible ways of organizing the R package development and build process. With CodeBlocks open as our C++ development IDE, we can now develop C++ code, which we can compile and build into a static library (libStatsLib.a), for example. Then, in RStudio, we can use our Rcpp project (StatsR) to expose the C++ functions. We can build this into an R package and make the functionality available immediately in an R session. We now have the infrastructure for end-to-end C++ and R development. With this infrastructure in place, we are now in a position to look at using Rcpp. In the next chapter, we look in more detail at the Rcpp framework we use in the wrapper component and how the statistical functions are exposed to R via Rcpp.

Additional Resources

The following links provide some more information on the topics covered in this chapter:

Exercises

The exercises that follow mainly deal with the effects of adding code to the C++ codebase and building these changes in a library that we can use to then build an R package. The exercises are concerned with setting up the infrastructure for usage in R/RStudio.

1) Rebuild libStatsLib.a in preparation for use inside R/RStudio. The intention here is to recompile the code in the static library and make sure that we can link this to the StatsR project.

The steps to follow are
  • Open the StatsLibCB project in CodeBlocks. The TimeSeries class is already incorporated as part of the project, so there is no need to do anything. Expand the Sources node and confirm that TimeSeries.cpp is present. Do the same for the header file. If you have added a ZTest class to the StatisticalTests.h/StatisticalTests.cpp, then they can be built immediately.

    On the other hand, if you have added the ZTest class in separate files, then you will need to add the files to the StatsLibCB project. To do this, select Project ➤ Properties, Build targets tab, and add them.

  • Build both debug/release versions. They should build without warnings or errors. Check that the files are being compiled/linked.

  • Open RStudio. Select Build ➤ Clean and Rebuild and check that the build (still) works without warnings or errors. Confirm that the StatsR package loads and works.
