© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
A. Gladstone, C++ Software Interoperability for Windows Programmers, https://doi.org/10.1007/978-1-4842-7966-3_6

6. Exposing Functions Using Rcpp

Adam Gladstone, Madrid, Spain

Introduction

In the previous chapter, we built an R package using Rcpp. Moreover, using CodeBlocks, we established the infrastructure for developing and building our ABI-compliant library of statistical functions (libStatsLib.a), which we linked into our R package (StatsR.dll). For the moment, we have only used a single function, library_version (defined in StatsR.cpp). We used this to illustrate the build process and to test the communication between R and C++.

In this chapter, we look in detail at how to expose the functionality of the underlying statistical library. We first look at the descriptive statistics and linear regression functions. Then we examine RcppModules in the context of the statistical test classes. The final part of this chapter looks at using the component with other R packages. We cover testing, measuring performance, and debugging. The chapter ends with a small Shiny app demonstration.

The Conversion Layer

In the C++/CLI wrapper (Chapter 3), we spent some time developing an explicit conversion layer, where we put the functions to translate between the managed world and native C++ types. The approach taken by Rcpp means that we no longer need to do this. We make use of types defined in the Rcpp C++ namespace in addition to standard C++ types, and we let Rcpp generate the underlying code that allows communication between R and C++. This interface ensures that our underlying statistical library is kept separate and independent of Rcpp.

As pointed out in the previous chapter, the Rcpp namespace is quite extensive. It contains numerous functions and objects that shield us from the basic underlying C interface provided by R. We only use a small part of the functionality, concentrating particularly on Rcpp::NumericVector, Rcpp::CharacterVector, Rcpp::List, and Rcpp::DataFrame.

Code Organization

The C++ code in the StatsR project is organized under the project’s src directory. This is where we locate the project’s C++ compilable units. Under this directory, we have already seen the following:
  • StatsR.cpp: Contains a boilerplate Rcpp C++ function

  • RcppExports.cpp: Contains the generated C++ functions

  • Makevars.win: Contains the Windows build configuration settings

In addition to the preceding list, we have the following three files, one for each of the functional areas we want to expose:
  • DescriptiveStatistics.cpp

  • LinearRegression.cpp

  • StatisticalTests.cpp

This is a convenient way to organize the functionality, and we will deal with each of these in turn.

Descriptive Statistics

The Code

Listing 6-1 shows the code for the C++ wrapper function get_descriptive_statistics.
-
Listing 6-1

C++ code for the DescriptiveStatistics wrapper function

Looking at Listing 6-1, there are a number of points that are worth highlighting:
  • The include files: Here, we #include the main Rcpp header followed by the Standard Library includes. This is followed by the #include of "Stats.h".

  • The comment block: Here, we document the function parameters with their name and type. We also use the @export symbol to make the R wrapper function available to other R functions outside this package by adding it to the NAMESPACE. Don’t confuse this with the Rcpp::export attribute that follows.

  • Attributes: We mark the function [[Rcpp::export]]. This indicates that we want to make this C++ function available to R. We have already seen an example of this with the library_version function in the previous chapter.

  • The wrapper function: Finally, the code itself – the R function is called get_descriptive_statistics. The first parameter is a NumericVector. The second parameter is an optional CharacterVector. If no argument is supplied, this is defaulted. The default argument is specified using the static create function. This allows us to retain the same calling semantics as the native C++ function. That is, we can call it with either one or two parameters. The get_descriptive_statistics function returns a std::unordered_map<std::string, double>, as does the underlying C++ function.

The code inside the get_descriptive_statistics function in Listing 6-1 is straightforward. We use the Rcpp function as<T>(...) to convert the incoming argument NumericVector vec (typedef’d as Vector<REALSXP>) from an SEXP (pointer to an S expression object) to a std::vector<double>. Similarly, we use Rcpp::as<T> to convert the CharacterVector keys to a vector of strings. We pass the parameters to the underlying C++ library function GetDescriptiveStatistics and retrieve the results. The results are then passed back to R using the native STL type. Under the hood, the results are wrapped as we describe in the following.

It should be clear from the preceding description that Rcpp allows us to write C++ code without being at all intrusive. Moreover, Rcpp facilitates the development process. Let’s take a concrete example. If we wished to add functions to expose the underlying individual statistics, ExcessKurtosis, for example, this is a straightforward change. We need to include the descriptive statistics header file:
#include "../../Common/include/DescriptiveStatistics.h"
Next, we create a new function and mark it for export, as the following code shows:
//' Compute excess kurtosis
//'
//' @param data A vector of doubles.
//' @export
// [[Rcpp::export]]
double excess_kurtosis(Rcpp::NumericVector data) {
  std::vector<double> _data = Rcpp::as<std::vector<double> >(data);
  double result = Stats::DescriptiveStatistics::ExcessKurtosis(_data);
  return result;
}
This function takes a NumericVector and returns a double. The function uses Rcpp::as<T> to convert the NumericVector to a std::vector<double> and then calls the underlying library and returns the result. You might like to try adding this, rebuilding the package, and testing out the function interactively, as follows:
> StatsR::excess_kurtosis(c(0,1,2,3,4,5,6,7,8,9))
[1] -1.224242
As we have seen, when we invoke “Clean and Rebuild”, the Rcpp framework updates the generated src\RcppExports.cpp file. It is instructive to look at the actual exported function generated in the file (but not to edit it). This is shown in Listing 6-2.
-
Listing 6-2

Rcpp generated code for the get_descriptive_statistics function

Looking at the generated code in Listing 6-2, we can see how similar this is to the C++ function we have written. The function name is synthesized from the package name and the C++ name; hence, it is called “_StatsR_get_descriptive_statistics”. This is declared with the RcppExport macro. This declares the function as extern "C". Apart from this, the main differences between the wrapper C++ code we have written and the Rcpp generated C++ code center on the types used under the hood. Without getting bogged down in the details, Rcpp uses SEXP (S expression pointers) for incoming types. And it uses an RObject type for outgoing types. These are basically pointer types. Rcpp::wrap creates a new pointer with a copy of the returned object using one of the forms of wrap_dispatch, for example:
template <typename T> inline SEXP wrap_dispatch(const T& object, ::Rcpp::traits::wrap_type_module_object_tag) {
    return Rcpp::internal::make_new_object<T>(new T(object));
}

At the same time, it converts the type to an RObject and assigns the RObject pointer to rcpp_result_gen, which is then returned to R. The std::unordered_map that is returned from GetDescriptiveStatistics is destroyed, while the RObject contains a copy. It should be clear from this description that, at a slightly higher level, Rcpp::wrap provides RAII (Resource Acquisition Is Initialization) around the (pointers to the) objects returned from our native C++ code. That is, Rcpp::wrap provides lifetime management, which simplifies the C++ wrapper code considerably.

You might be wondering how this is actually presented in an R session. From the R point of view, std::unordered_map<std::string, double> is returned as a numeric class, as the script in Listing 6-3 shows.
# StatsR
stats <- StatsR::get_descriptive_statistics(data)
> class(stats)
[1] "numeric"
Listing 6-3

Retrieving the R class from a C++ wrapper function

The numeric vector we return makes use of the Named class. The Named class is a helper class used for setting the key side of key/value pairs. The result is that calling get_descriptive_statistics returns a numeric vector with labels, as shown in Listing 6-4.
> stats <- StatsR::get_descriptive_statistics(c(0,1,2,3,4,5,6,7,8,9))
> stats
Variance.S        Sum     StdErr   StdDev.P     Skew.P       Mean      Count     Skew.S    Maximum Variance.P   StdDev.S
 9.1666667 45.0000000  0.9574271  2.8722813  0.0000000  4.5000000 10.0000000  0.0000000  9.0000000  8.2500000  3.0276504
     Range   Kurtosis    Minimum     Median
 9.0000000 -1.2000000  0.0000000  4.5000000
Listing 6-4

Labelled output from the get_descriptive_statistics function
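To make the shape of this result concrete, the following standalone sketch flattens a std::unordered_map<std::string, double> into two parallel sequences, one of labels and one of values, which is conceptually what a named numeric vector holds. The names NamedNumericSketch and to_named_numeric are hypothetical; this is not the code Rcpp::wrap uses internally.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A stand-in for a named numeric vector: values plus a parallel set of labels.
// Hypothetical type, for illustration only.
struct NamedNumericSketch {
    std::vector<std::string> names;
    std::vector<double> values;
};

// Copy each key/value pair into the parallel vectors. The pairs are visited
// in the map's iteration order, which for an unordered_map is unspecified.
NamedNumericSketch to_named_numeric(
        const std::unordered_map<std::string, double>& results) {
    NamedNumericSketch out;
    out.names.reserve(results.size());
    out.values.reserve(results.size());
    for (const auto& kv : results) {
        out.names.push_back(kv.first);
        out.values.push_back(kv.second);
    }
    return out;
}
```

Because an unordered map has no defined ordering, the labels come out in an arbitrary order. This is presumably why the statistics in Listing 6-4 are not listed alphabetically or in any other obvious sequence.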

We can transpose the output such that the named columns become rows, simply by coercing the returned NumericVector into a data frame as follows:
> stats <- as.data.frame(stats)
> stats
                stats
Variance.S  9.1666667
Sum        45.0000000
...

Exception Handling

Returning to the code generated in RcppExports.cpp, there is one detail that we skipped over: the macros BEGIN_RCPP/END_RCPP. These macros define try{...}catch{...} blocks to handle exceptions that might be thrown by the C++ code. The exception handling logic is quite involved. If you are interested, the macros are defined in Rcpp\include\Rcpp\macros\macros.h. If the underlying C++ function throws a std::exception, it will be caught and translated appropriately. Listing 6-5 shows an example.
# StatsR
> stats <- StatsR::get_descriptive_statistics(c(1,2))
Error in StatsR::get_descriptive_statistics(c(1, 2)) : Insufficient data to perform the operation.
Listing 6-5

An example of exception handling

From Listing 6-5, we can see that if we pass in too few data points to the underlying GetDescriptiveStatistics function, the exception is reported in an informative way. Summarizing what we have seen so far, it is clear that the Rcpp framework allows us to write clean C++ code while taking care of numerous details relating to translating between R and C++.

Exercising the Functionality

After doing a Clean and Rebuild, we can exercise the get_descriptive_statistics function and compare the results with the equivalent Base R functions. The script DescriptiveStatistics.R illustrates one way to do this. First, we load some additional packages: tidyverse and formattable, among others. The script then generates 1000 normally distributed random samples. Following this, we create two sets of data, one from StatsR and one using the equivalent Base R functions. We create a column to compare the results and add the three columns (StatsR, BaseR, and Results) to a data frame. The data frame is then formatted into a table. The row coloring changes depending on the TRUE/FALSE value in the results column, allowing us to easily detect differences in the results. These are shown in Figure 6-1.
Figure 6-1

Comparison of statistics: StatsR vs. BaseR

From the table in Figure 6-1, we can immediately see that there are no numeric differences between the values produced by the two libraries.

Linear Regression

The Code

The C++ code for exposing our simple univariate linear regression follows the same pattern as the descriptive statistics. This is shown in Listing 6-6.
-
Listing 6-6

Wrapper function for LinearRegression

After the #includes, the function itself is declared as taking two NumericVectors and returning the results as before using std::unordered_map<std::string, double>. And as before, we use Rcpp::as<T> to copy the incoming vector to an STL type and rely on the implicit wrap to convert the results into a package of name/value pairs. As discussed in the previous section, we leave the exception handling to the code generated by the Rcpp framework.

Exercising the Functionality

We’d like to test-drive this wrapper function, for example, by modelling some house price data and predicting a new price. The script LinearRegression.R shown in Listing 6-7 demonstrates one way to do this.
-
Listing 6-7

A simple linear model for house price prediction

The script in Listing 6-7 begins by loading the StatsR library and ggplot2. We define a simple predict function that will use the results of the linear regression. Next, we load the data. This is the same data that we used in Chapter 4 (in DataModelling.cs). Next, we plot the data and add a regression line. This is shown in Figure 6-2.
Figure 6-2

Scatterplot of house price against size

We call the wrapper function StatsR::linear_regression to obtain the model results and use the coefficients to predict a new value. Finally, we compare the results with the equivalent (but much more powerful) lm function in R. We can see that both the intercept (b0) and the slope (b1) are identical.

Using a DataFrame

From an R user’s perspective, the linear_regression function might be improved by being able to call it with a DataFrame. We can rewrite the linear_regression function to do this as shown in Listing 6-8.
-
Listing 6-8

Passing a DataFrame to the linear_regression function

We can see from Listing 6-8 that the only difference between this function and the previous one is that we pass in a single parameter, an Rcpp::DataFrame. We assume there are columns labelled "x" and "y". If the required column names do not exist, an error is generated:

("Error in StatsR::linear_regression(data) : Index out of bounds: [index="x"].").

We extract the columns as before into std::vector<double> types which we then pass to the C++ LinearRegression function. The results are returned as before. Calling the function now looks like this:
> data <- data.frame("x" = c(1.1, 1.9, 2.8, 3.4), "y" = c(1.2, 2.3, 3.0, 3.7))
> results <- StatsR::linear_regression(data)
> results
       b1        b0     SS_xy    x-mean     SS_xx    y-mean
1.0490196 0.1372549 3.2100000 2.3000000 3.0600000 2.5500000

The only caveat with this approach is that the compiler does not permit both linear_regression functions to exist. The error from the compiler is

"conflicting declaration of C function 'SEXPREC* _StatsR_linear_regression(SEXP)' ".

The generated wrappers are declared extern "C" (via RcppExport), and functions with C linkage cannot be overloaded, so both versions would map to the same symbol, _StatsR_linear_regression, regardless of their parameter lists. We can live with this by either insisting on a single function or renaming one of the functions. The important point here is that in the wrapper layer, you can choose how to convert and present types to users.

Statistical Tests

Functions vs. Classes

The code for exposing the statistical tests functionality is located in StatisticalTests.cpp. We initially take the same approach to wrapping up the functionality as we have done previously in the StatsExcel component. That is, we wrap a C++ class in a procedural interface. Listing 6-9 shows part of the code.
-
Listing 6-9

Wrapper function to perform a t-test from summary data

The code in Listing 6-9 shows the function to perform a t-test from summary input data. The wrapper function takes four doubles as arguments (double mu0, double mean, double sd, double n) and returns the results as a package of key/value pairs. In the code, we need to construct a Stats::TTest object corresponding to the summary data t-test. We use the function arguments as parameters to the constructor. In the one-sample and two-sample cases, we pass in either one or two NumericVectors which are converted to a std::vector<double> as required. These are the same type of conversions that we have seen previously. After calling test.Perform, we obtain the results set. We could check explicitly if Perform returns true or false. However, if an exception is thrown, it will be handled by the Rcpp generated code.
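As a reminder of what a t-test from summary data involves, the following standalone sketch computes the test statistic and degrees of freedom from the same four inputs. It uses the textbook formula t = (mean - mu0) / (sd / sqrt(n)) and omits the p-value, which requires the t-distribution. The struct and function names are hypothetical, and the keys returned by the library's results package are not reproduced here.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical result type: only the statistic and the degrees of freedom.
struct SummaryTTestSketch {
    double t;   // test statistic
    double df;  // degrees of freedom
};

// One-sample t statistic from summary data.
// Assumption: textbook formula; not taken from the library source.
SummaryTTestSketch t_test_summary_sketch(double mu0, double mean, double sd, double n) {
    if (n < 2.0 || sd <= 0.0) {
        throw std::invalid_argument("need n >= 2 and sd > 0");
    }
    const double standard_error = sd / std::sqrt(n);
    return {(mean - mu0) / standard_error, n - 1.0};
}
```

Note that the sample size is passed as a double here simply to mirror the wrapper signature described above.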

Rcpp Modules

As we have seen, exposing existing C++ functions and classes to R through Rcpp is quite straightforward. The approach we have adopted until now is to write a wrapper function. This interface function is responsible for converting input objects to the appropriate types, calling the underlying C++ function, or constructing an instance if it is a class, and then converting the results back to a type suitable for R. We have seen a number of examples of both usages: exposing functions and classes with wrapper functions.

In certain circumstances, however, it might be desirable to expose classes directly to R, for example, if the underlying C++ class has significant construction logic. We would rather expose a class-like object that can be managed by R than incur the cost of constructing an instance of the class on each function call, as we do with the t-test wrapper functions. More generally, exposing classes directly allows us to retain the underlying object semantics. The Rcpp framework provides a mechanism for exposing C++ classes via Rcpp modules. Rcpp modules also allow grouping of functions and classes in a single coherent modular unit.

To create a module, we use the RCPP_MODULE macro. Inside the macro, we declare the constructors, methods, and properties of the class we are exposing. Listing 6-10 shows how the TTest class can be exposed to R along with the declaration of the module.
-
Listing 6-10

Exposing the TTest class via the RCPP_MODULE macro

The code in Listing 6-10 is in src\StatisticalTests.cpp. There are two parts to this code. The first part declares a C++ TTest wrapper class. This class wraps a native Stats::TTest member. The C++ wrapper class is used to perform the required translations between types. The constructors for the summary data and one-sample t-tests take the same Rcpp arguments as in the procedural wrappers and perform the same conversions we have seen before. The two-sample t-test uses an Rcpp::List object containing two numeric vectors labelled “x1” and “x2”. The methods Perform and Results are simply forwarded to the underlying native Stats::TTest instance. The design pattern is similar to a pimpl (pointer-to-implementation) idiom or a facade or adaptor pattern.
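The forwarding pattern described here can be illustrated without Rcpp. In the standalone sketch below, native_sketch::MeanTest is a hypothetical stand-in for a native class, and MeanTestWrapper plays the role of the wrapper: it converts its inputs once in the constructor (here from float to double, standing in for the Rcpp conversions) and forwards Perform and Results to the wrapped member. None of these names come from the StatsR sources.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace native_sketch {

// Hypothetical stand-in for a native test class with Perform/Results methods.
class MeanTest {
public:
    MeanTest(double mu0, std::vector<double> sample)
        : mu0_(mu0), sample_(std::move(sample)) {}

    // Returns false if the test cannot be carried out.
    bool Perform() {
        if (sample_.empty()) return false;
        double sum = 0.0;
        for (double x : sample_) sum += x;
        results_["mean"] = sum / static_cast<double>(sample_.size());
        results_["mu0"] = mu0_;
        return true;
    }

    std::unordered_map<std::string, double> Results() const { return results_; }

private:
    double mu0_;
    std::vector<double> sample_;
    std::unordered_map<std::string, double> results_;
};

}  // namespace native_sketch

// Wrapper class: type conversion happens in the constructor, and the
// methods are forwarded unchanged to the wrapped native object.
class MeanTestWrapper {
public:
    MeanTestWrapper(double mu0, const std::vector<float>& sample)
        : impl_(mu0, std::vector<double>(sample.begin(), sample.end())) {}

    bool Perform() { return impl_.Perform(); }
    std::unordered_map<std::string, double> Results() const { return impl_.Results(); }

private:
    native_sketch::MeanTest impl_;
};
```

The benefit of this arrangement is that the native object is constructed once and then lives as long as the wrapper, rather than being rebuilt on every call as in the procedural wrappers.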

The second part of the code declares the module using the RCPP_MODULE macro. We define the module name as “StatsTests”. This will be used by R to identify the module. Within the module, a class is exposed using class_. The trailing underscore is required as we cannot use the C++ language keyword class. Here, class_<T> is templated by the C++ class or struct that is to be exposed to R, in this case, the name of our wrapper class. The string “TTest” that is passed into the class_<TTest> constructor is the name we will use when calling the class from R. Following this, we describe the class in terms of constructors, methods, and fields (not illustrated here). We can see that in this case, we have the three constructors corresponding to a summary data t-test, and both the one-sample and two-sample t-tests. The template arguments are the parameters of the respective underlying constructors. The use of Rcpp::List instead of two Rcpp::NumericVector parameters is a convenient way to package up the input arguments. It also provides a straightforward workaround to the issue that the RCPP_MODULE constructor method cannot distinguish between the following constructors:
.constructor<double, Rcpp::NumericVector>
.constructor<Rcpp::NumericVector, Rcpp::NumericVector>
Besides the constructors, in Listing 6-10, we can see that we have two methods. The method function takes the function name followed by the address of the wrapper function, followed by a help string. In general terms, we are providing a declarative description of the class to Rcpp. We also supply documentation strings. Listing 6-11 shows an example of how the TTest class can be used.
moduleStatsTests <- Module("StatsTests", PACKAGE="StatsR")
ttest0 <- new(moduleStatsTests$TTest, 5, 9.261460, 0.2278881e-01, 195)
if(ttest0$Perform()) {
  print(ttest0$Results())
} else {
  print("T-test from summary data failed.")
}
Listing 6-11

Using the TTest class in an R script

In Listing 6-11, we create a module object by calling the Module function with the name "StatsTests". Entities inside the module may be accessed via the $ symbol. Note that in our limited example, we have only placed a single entity inside the Rcpp module. However, there is no reason why this could not also contain other classes and related functionality. In R, we instantiate our TTest class as ttest0 using new with the object name followed by the parameters. We can then use the instance ttest0 to perform the test and print the results or an error message.

Overall, RcppModules provide a convenient way both to group functionality and to expose C++ classes. We therefore have the choice of writing wrapper functions or wrapper classes, whichever suits our purposes best. This has been a brief introduction to RcppModules. There are numerous details of this approach that we have not covered here.

Testing

Now that we have exposed the functionality of the underlying statistical library, it is useful to test that everything works as expected. For unit testing, we use the “testthat” library (https://testthat.r-lib.org/). The tests are organized in the tests\testthat directories under the main project. The testthat.R script under tests invokes the unit tests under testthat. There are three test files corresponding to the three areas of functionality:
  • test_descriptive_statistics.R

  • test_linear_regression.R

  • test_statistical_tests.R

The tests follow the same arrange-act-assert form that we have used on previous occasions. In the case of both the descriptive statistics and linear regression tests, we check the results against Base R functions. Listing 6-12 shows an example of the linear regression test.
-
Listing 6-12

The LinearRegression test

The LinearRegression test in Listing 6-12 creates x and y values and places these in a data frame. We then call the R function lm followed by our LinearRegression function. Finally, we compare the intercept and slope coefficients.

For the statistical hypothesis tests, we choose to test against hardcoded expected values (Listing 6-13).
-
Listing 6-13

Testing the t-test from summary data

In Listing 6-13, we only test the wrapper function as it is slightly easier to call than the class.

All the tests can be run by opening the testthat.R script and clicking the Source button. This is shown in Figure 6-3.
Figure 6-3

Running the test harness

The output from the test run in Figure 6-3 indicates that all the tests (34 of them) passed. There were no failures, warnings, or tests that were skipped. It also outputs the test durations.

Measuring Performance

One of the reasons for using C++ for lower-level code is the potential for performance gains when compared to using just R. Therefore, it seems worthwhile to try to measure this. Listing 6-14 shows the script Benchmark.R.
-
Listing 6-14

Benchmark.R

The benchmark script in Listing 6-14 compares the performance of the C++ linear_regression function with R’s lm function. The comparison is somewhat artificial, since R’s lm function is far more flexible than our simple linear regression function, and it is for illustrative purposes only. The script loads a number of libraries, including the bench library, which is useful for micro-benchmarking functions. We use the well-known R dataset mtcars to perform a regression of mpg against weight. As usual, we plot the data beforehand and check the distributions using a density plot. We wrap the two functions that we are interested in comparing in dummy functions so that bench::mark does not complain that the result sets are different. Then we call bench::mark(...) with both functions. We output the result to the console.
                                    total_time
1 StatsR(mtcars$wt, mtcars$mpg)     178ms
2 R_LM(mtcars)                      491ms
The actual results are considerably more detailed than those shown here. However, we have summarized the total_time to illustrate the approach. We can see that the total_time taken by the StatsR function is 178ms compared to 491ms for the R_LM function. We also plot the output, shown in Figure 6-4.
Figure 6-4

Benchmark comparison of StatsR and R lm functions

The difference in the timings is not surprising since the lm function does much more than our limited LinearRegression function.

Debugging

RStudio supports debugging of R functions in the IDE. Simply set the breakpoint(s) at appropriate locations and Source the file. Then, we can step through the R code line by line inspecting variables interactively and so on. Unfortunately, debugging the C++ code in a package is more difficult and less informative. It is possible to do this using gdb. However, for this we need to use Rgui as the host environment rather than RStudio. A full treatment of debugging R is beyond the scope of this chapter. However, should you need to, the process for attaching to the Rgui process and breaking into the debugger is as follows:
  • Navigate to the directory with the sources (SoftwareInteroperability\StatsR\src).

  • Start gdb with the Rgui as a parameter as follows: gdb D:/R/R-4.0.3/bin/x64/Rgui.exe.

Figure 6-5 shows the commands.
Figure 6-5

A typical gdb session

Note that we have interleaved the gdb session with the Rgui session. After starting gdb, type in run. This will run Rgui. See Figure 6-5. Then in Rgui, run devtools::load_all(). This will rebuild the StatsR.dll if necessary and will install and load the package. Next, in Rgui, select Misc ➤ Break to Debugger to return to the gdb session. In gdb, set the breakpoints you want. For example, we can set a breakpoint on get_descriptive_statistics. Use the command:
break get_descriptive_statistics
Then press c to return control to Rgui and continue. In Rgui, execute
> get_descriptive_statistics(c(1,2,3,4,5,6,7,8), c("Mean"))

This will now break into the debugger at the call location. From here we can single step through the function call (command n). However, the information from the individual function calls is quite limited, which makes debugging less useful than it should be.

Distribution Explorer

As pointed out in Chapter 4, when developing wrapper components, we are concerned not only with whether or not the functions (and classes) work correctly but also with how the component as a whole interoperates. With this in mind, the StatsR project contains a small Shiny App called Distribution Explorer. This is based on an existing example from the Shiny gallery (https://shiny.rstudio.com/gallery/) and adapted to use StatsR functionality. The user interface is shown in Figure 6-6.
Figure 6-6

StatsR Shiny App

The Distribution Explorer generates a (configurable) number of random observations from the selected distribution in the left-hand panel. In the right-hand panel, it displays a histogram of the data and, more importantly from our point of view, produces summary statistics using the StatsR function get_descriptive_statistics. Listing 6-15 shows the code.
-
Listing 6-15

Displaying summary statistics

The summary statistics stats are rendered to a summary panel declared in the UI fluidPage. Once the data has been generated, we extract it as a single column NumericVector. This is passed to get_descriptive_statistics in the usual way along with the keys representing the summary statistics we want returned. Presenting the results takes a few more lines of code. First, we coerce the results into a DataFrame and format the numeric values. Then we coerce the results into a table format and return them. As can be seen, our StatsR package works, more or less seamlessly, with other R packages.

Summary

In this chapter, we have written a fully functioning R package that connects to a native C++ library. We have exposed both functions and classes from the underlying library so that they are available for use in R/RStudio. We have tested the functionality and benchmarked it.

Once we have these pieces in place (an RStudio Rcpp project, Rtools available for compiling and building, and a C++ development environment), there is nothing to stop us using any of the analytics offered in public domain C++ libraries as part of an R data analysis toolchain. We might, for example, take QuantLib (www.quantlib.org/) and use some of the interest rate curve building functionality in R. Alternatively, we might consider developing our own C++ libraries, and making these available in R. It is worth emphasizing that this goes beyond the more traditional use-case of writing small amounts of C++ code that are compiled and run inline in R with a view to improving performance. These two chapters have provided a working infrastructure for more systematic development of C++ components with the intention of making the functionality available in an R package. Rcpp makes this process seamless and takes away much of the work involved. In the next two chapters, we look at a similar situation, but in this case, our focus is on the Python language and Python clients.

Exercises

The exercises in this section deal with incorporating the various changes we have made to the underlying codebase into the R package, and exposing the functionality via Rcpp. All the exercises use the StatsR RStudio project.

1) We extended the LinearRegression function to calculate the correlation coefficient r and r² and added these to the results package. Confirm that the additional coefficients calculated in the LinearRegression function are displayed, and check the values.

For this, you can use the script LinearRegression.R. To check the results, use the functions cor(data) and cor(data)^2. Compare these values to the values obtained in the results package from the function StatsR::linear_regression(...). The results should be identical.

Extend the test case in test_linear_regression.R to include a check of these values.

2) The TimeSeries class has already been added to the sources, and built into the libStatsLib.a static library (see Chapter 5). Expose the MovingAverage function from the TimeSeries class. In this case, we just want to expose a procedural wrapper function. In a further exercise, we will add a class using RcppModules.

The steps required are as follows:
  • In the src directory add a new file TimeSeries.cpp. Use File ➤ New ➤ C++ File as this will create the file with the boilerplate Rcpp code.

  • #include the TimeSeries.h file from the Common\include directory.

  • Expose the MovingAverage method using a procedural wrapper. The following function signature is suggested:

std::vector<double> get_moving_average(Rcpp::NumericVector dates, Rcpp::NumericVector observations, int window) { ... }
  • Implement the code:
    • Convert the dates to a vector of longs.

    • Convert the observations to a vector of doubles.

    • Construct an instance of the TimeSeries class.

    • Call the MovingAverage function and return the results.

  • Select Build ➤ Clean and Rebuild and check that the build (still) works without warnings or errors. Check that the file src\TimeSeries.cpp compiled correctly in the output. Check that the function is present in RcppExports.R.

  • Check that the function is present in the list of functions. Use
    > library(pkgload)
    > names(pkg_env("StatsR"))
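For orientation, the kind of calculation a moving average performs can be sketched in standalone C++. The version below is a simple trailing average that emits one value per full window, so its output has length n - window + 1. This is an assumption made for illustration: the TimeSeries class may treat the first window - 1 positions differently (for example, by padding), and the function name trailing_moving_average_sketch is our own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Trailing simple moving average over full windows only.
// Assumption: no padding for the first (window - 1) observations.
std::vector<double> trailing_moving_average_sketch(
        const std::vector<double>& observations, std::size_t window) {
    if (window == 0 || window > observations.size()) {
        throw std::invalid_argument("window must be between 1 and the series length");
    }
    std::vector<double> averages;
    averages.reserve(observations.size() - window + 1);

    double running_sum = 0.0;
    for (std::size_t i = 0; i < observations.size(); ++i) {
        running_sum += observations[i];
        // Drop the observation that has just left the window.
        if (i >= window) running_sum -= observations[i - window];
        // Emit an average once the first full window is available.
        if (i + 1 >= window) {
            averages.push_back(running_sum / static_cast<double>(window));
        }
    }
    return averages;
}
```

For the series 1, 2, 3, 4, 5 with a window of 3, this returns 2, 3, 4.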
3) Add an R script TimeSeries.R to exercise the new function.
  • Create some random data as follows:
    n = 100                   # n samples
    observations <- 1:n + rnorm(n = n, mean = 0, sd = 10)
    dates <- c(1:n)
  • Add a simple moving average function with a default window size of 5:
    moving_average <- function(x, n = 5) {
      stats::filter(x, rep(1 / n, n), sides = 1)
    }
  • Obtain two moving averages: one from the StatsR package and one using the local function (note the window size parameter):
    my_moving_average_1 <- StatsR::get_moving_average(dates, observations, 5)
    my_moving_average_2 <- moving_average(observations, 5)   # Apply user-defined function
  • Plot the series.

  • Compare the series as they should be identical:
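    tolerance <- 1e-5   # assumed tolerance; adjust as required
    equal <- abs(my_moving_average_1 - my_moving_average_2) < tolerance
    sum(!equal, na.rm = TRUE)   # number of mismatches; 0 if the series agree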
4) Add procedural wrappers for the three z-test functions. These should be similar to the t-test wrappers, that is:
z_test_summary_data(...)
z_test_one_sample(...)
z_test_two_sample(...)
  • Select Build ➤ Clean and Rebuild and check that the build works without warnings or errors. Check that the file src\StatisticalTests.cpp compiled correctly in the output. Check that the functions are present in RcppExports.R. Check that the functions are present in the list of functions.

  • Extend the R script StatisticalTests.R to exercise the new functions. The following script uses the same data as the native C++ unit tests, the C# unit tests, and the Excel worksheet:
    #
    # z-tests
    #
    # Summary data z-test
    StatsR::z_test_summary_data(5, 6.7, 7.1, 29)
    # One-sample z-test data
    StatsR::z_test_one_sample(3, c(3, 7, 11, 0, 7, 0, 4, 5, 6, 2))
    # Two-sample z-test data
    x <- c( 7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8 )
    y <- c( 4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5 )
    StatsR::z_test_two_sample(x, y)
  • For completeness, add test cases to testthat\test_statistical_tests.R.

  • Run the testthat.R script and confirm that all the tests pass.
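When checking the output of the new z-test wrappers, it can help to have an independent calculation to compare against. The following is a minimal sketch of a z-test from summary data in standard C++. The function and struct names are hypothetical, and the parameter order (hypothesized mean, sample mean, standard deviation, sample size) is an assumption based on the call z_test_summary_data(5, 6.7, 7.1, 29) above; the actual Stats::ZTest class may report additional or differently named results.

```cpp
#include <cmath>

// Hypothetical result type: the z statistic and the two-sided p-value.
struct ZTestSketchResult {
    double z;
    double p_two_sided;
};

// z-test from summary data.
// ASSUMED parameter order: mu0 (hypothesized mean), mean (sample mean),
// sd (standard deviation), n (sample size).
ZTestSketchResult z_test_summary_sketch(double mu0, double mean, double sd, double n) {
    const double standard_error = sd / std::sqrt(n);
    const double z = (mean - mu0) / standard_error;
    // Two-sided p-value from the standard normal distribution:
    // 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    const double p = std::erfc(std::fabs(z) / std::sqrt(2.0));
    return {z, p};
}
```

With the summary data used above, z_test_summary_sketch(5, 6.7, 7.1, 29) gives a z statistic of roughly 1.29, which can be compared with the corresponding value returned by the wrapper.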

5) Under the StatsR project, in the man directory, there is an R documentation (Rd) file named StatsR-package.Rd. Update the document with the new functions: get_moving_average, z_test_summary_data, z_test_one_sample, and z_test_two_sample.
  • Select Preview to view the changes. Select Build ➤ Clean and Rebuild. Check the file: “D:\R\R-4.0.3\library\StatsR\html\StatsR-package.html”.

6) Add the ZTest as a class to the RcppModule StatsTests.
  • In StatisticalTests.cpp, write a wrapper class that contains a private member variable:

Stats::ZTest _ztest;
  • Implement the conversions required in the constructors. This is basically identical to the TTest wrapper.

  • Add this class to the RcppModule:
    ...
    {
    Rcpp::class_<ZTest>("ZTest")
    .constructor<double, double, double, double>("Perform a z-test from summary input data")
    .constructor<double, Rcpp::NumericVector >("Perform a one-sample z-test with known population mean")
    .constructor<Rcpp::List >("Perform a two-sample z-test")
    .method("Perform", &ZTest::Perform, "Perform the required test")
    .method("Results", &ZTest::Results, "Retrieve the test results")
      ;
    }
  • In RStudio, select Build ➤ Clean and Rebuild and check that the build works without warnings or errors. Check that the file src\StatisticalTests.cpp compiled correctly in the output.

  • Extend the R script StatisticalTests.R to exercise the new class. The following is an example of the summary data z-test:
    library(Rcpp)
    library(formattable)
    moduleStatsTests <- Module("StatsTests", PACKAGE="StatsR")
    ztest0 <- new(moduleStatsTests$ZTest, 5, 6.7, 7.1, 29)
    if (ztest0$Perform()) {
      results <- ztest0$Results()
      print(results)
      results <- as.data.frame(results)
      formattable(results)
    } else {
      print("Z-test from summary data failed.")
    }
7) Add the TimeSeries as a class to a new RcppModule.
  • Open TimeSeries.cpp source file.

  • Add a wrapper class for the native C++ time series as follows:
    // A wrapper class for time series
    class TimeSeries
    {
    public:
      ~TimeSeries() = default;
      TimeSeries(Rcpp::NumericVector dates, Rcpp::NumericVector observations)
        : _ts(Rcpp::as<std::vector<long> >(dates), Rcpp::as<std::vector<double> >(observations) )
      {}
      std::vector<double> MovingAverage(int window) {
        return _ts.MovingAverage(window);
      }
    private:
      Stats::TimeSeries _ts;
    };
  • Define an RCPP_MODULE(TS) that describes the wrapper class, for example:
    Rcpp::class_<TimeSeries>("TimeSeries")
      .constructor<Rcpp::NumericVector, Rcpp::NumericVector>("Construct a time series object")
      .method("MovingAverage", &TimeSeries::MovingAverage, "Calculate a moving average of size = window")
      ;
  • Select Build ➤ Clean and Rebuild and check that the build works without warnings or errors.

  • Open the file TimeSeries.R. Add code to the script that computes the same time series as previously and compares the results.
    moduleTS <- Module("TS", PACKAGE="StatsR")
    ts <- new(moduleTS$TimeSeries, dates, observations)
    my_moving_average_4 <- ts$MovingAverage(5)
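    tolerance <- 1e-5   # assumed tolerance; adjust as required
    equal <- abs(my_moving_average_4 - my_moving_average_2) < tolerance
    sum(!equal, na.rm = TRUE)   # number of mismatches; 0 if the series agree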