Matt Wiley and Joshua F. Wiley, Advanced R, 10.1007/978-1-4842-2077-1_5

5. Writing Classes and Methods

Matt Wiley¹ and Joshua F. Wiley¹

(1)Elkhart Group Ltd. & Victoria College, Columbia City, Indiana, USA

It is often helpful to have a function behave differently depending on the type of object passed. For example, when summarizing a variable, it makes sense to create a different summary for numeric or string data. It is possible to have a different function for every type of object, but then users would have to remember many function names, and to remain unique, function names may be longer. Object-oriented programming (OOP) is based on objects and is implemented in R (as in most programming languages) by using two concepts: classes and methods. A class defines a template, or blueprint, describing the variables and features of an object as well as determining what methods work for it. For example, a house may be defined as having a floor, four walls, a roof, and a door. Specific data represents these properties, such as the dimensions and color of each wall. The methods are behaviors or actions that can be performed on a particular object type. For instance, a house can be painted, which changes its color, but a house cannot be eaten. R has three object-oriented systems: S3, S4, and R5 (formally, Reference Classes). This chapter covers the S3 and S4 systems, which are the most common.

In this chapter, we use the ggplot2 R package (Wickham, 2009). The following code loads the checkpoint (Microsoft Corporation, 2016) package to control the exact version of R packages used and then loads the ggplot2 package:

## load checkpoint and required packages
library(checkpoint)            
checkpoint("2016-09-04", R.version = "3.3.1")            
library(ggplot2)            
options(width = 70) # only 70 characters per line

S3 System

The S3 system is the most common object-oriented system. It is also the easiest to start using and the simplest of the systems. The S3 system is easy to use in part because it is quite informal, and mostly focused on the functions or methods. These advantages are also limitations, as the S3 system provides no formal framework for ensuring that objects meet the requirements for a class. For a more in-depth guide to R programming using the S3 system, see S Programming by W.N. Venables and B.D. Ripley (Springer, 2004).

S3 Classes

In R, some types, or classes, of objects are available by default, and almost all can be thought of as vectors or generic vectors. For example, matrices and arrays are essentially vectors with attributes indicating the dimensions. Lists are generic vectors, in which each element of the vector may contain another vector. Data frames are lists in which each item is a vector, but all with equal length, and thus have a tabular format. Vectors can hold specific types of data, such as logical, integer, numeric (real numbers), or character strings. These fundamental objects and classes are provided by the base package. S3 classes are created by building on the objects and classes provided by the base package.
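As a small illustration of the description above, the following sketch (using only base R and the built-in mtcars data) shows that a matrix is a vector with a dim attribute and that a data frame is a list of equal-length vectors. The object names m, v, and col_lengths are chosen here for illustration only.

```r
## a matrix is a vector plus a "dim" attribute
m <- matrix(1:6, nrow = 2)
attributes(m)

## adding the same attribute to a plain vector gives an identical object
v <- 1:6
dim(v) <- c(2, 3)
identical(v, m)

## a data frame is a list whose elements all have the same length
is.list(mtcars)
col_lengths <- sapply(mtcars, length)
unique(col_lengths)
```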

Note

S3 classes are created by using regular R objects (for example, vectors or lists) and classes (for example, logical or numeric). S3 classes are defined by setting the class name via class(object) <- "class name". Elements of S3 objects are typically accessed by using `$`, `[`, or `[[`.

The practical creation of S3 classes is simple. First, create an R object that meets the requirements or characteristics of the class, and then define the S3 classes by labeling the object with the class name. Because S3 classes differ from other R objects only in having special names or attributes, they are accessed and manipulated the same way as other basic R objects, by using `$`, `[`, and `[[`. To check the class of an object, we can use the class() function. In the following example, the mtcars object is queried to determine that it has a data frame class:

class(mtcars)                
[1] "data.frame"

To see how the class of an object is set, we can look at many of the functions from base R, such as table(). Here, we print the source code for the table() function, leaving some of the middle off to save space:

table                
function (..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no",
    "ifany", "always"), dnn = list.names(...), deparse.level = 1)
{
[omitted for space]
    y <- array(tabulate(bin, pd), dims, dimnames = dn)
    class(y) <- "table"
    y
}
<bytecode: 0x0000000017fa8180>
<environment: namespace:base>

The object returned at the end, y, is an array, but has a table class. The object class is set using the idiom class(object) <- "class name". It is also possible to assign more than one class to an object. This is done in much the same fashion as assigning a single class, with the classes listed in order of preference. The following example stores the results from calling table() in the object x, and then sets two classes, first newclass and then the original table class:

x <- table(mtcars$cyl)                
class(x) <- c("newclass", "table")                
class(x)                
[1] "newclass" "table"

The value of assigning multiple classes is a sort of backup, primarily for methods. For example, if you create a new class that is a variant of a table or data frame, you may write a dedicated method for printing, but rely on the methods for the original class for other functions, such as plotting or summaries.
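The fallback behavior can be sketched as follows. This is an illustration only: the print.newclass() function is a hypothetical method written for this example (methods are covered later in the chapter), and it assumes no summary method for newclass exists.

```r
x <- table(mtcars$cyl)
class(x) <- c("newclass", "table")

## hypothetical print method for the first class in the vector
print.newclass <- function(x, ...) {
  cat("newclass object with", length(x), "cells\n")
  invisible(x)
}

print(x)          # uses print.newclass()

## no summary method exists for newclass, so the table method is used
s <- summary(x)
class(s)
```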

The types of classes you can write are virtually endless. A simple way to start is by creating special or augmented cases of existing classes. In the following example, we make a special case of a data frame, with x and y coordinates and text labels, called textplot. The simplest way to “create” the class is to create a data frame and then change its class. If we use only our new class label, calling the print() function results in the default method, but if we label it with both our new textplot label and as a data frame secondly, then calling print() falls back to the method for data frames:

d <- data.frame(                  
  x = c(1, 3, 5),                  
  y = c(1, 2, 4),                  
  labels = c("First", "Second", "Third")                  
)                  
class(d) <- "textplot"                  

print(d)                  
$x
[1] 1 3 5

$y
[1] 1 2 4

$labels
[1] First  Second Third
Levels: First Second Third

attr(,"row.names")
[1] 1 2 3
attr(,"class")
[1] "textplot"

class(d) <- c("textplot", "data.frame")                  

print(d)                  
  x y labels
1 1 1  First
2 3 2 Second
3 5 4  Third

In the S3 system, there is no formal way to create an object from a particular class. However, the most common way to create objects of a specific class is as the output from a function. Dedicated functions can be written to create an object of a specific class, or functions can be written that perform operations and output results as an object of a specific class. The latter approach is more typical in R when using the S3 system.
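A dedicated constructor function of the first kind might look like the following sketch. The name new_textplot() and the particular checks are assumptions made for illustration; they are not part of base R or of the code used elsewhere in this chapter.

```r
## hypothetical constructor for the textplot class
new_textplot <- function(x, y, labels) {
  ## basic checks; S3 itself enforces none of these
  stopifnot(is.numeric(x), is.numeric(y),
            length(x) == length(y),
            length(x) == length(labels))

  obj <- data.frame(x = x, y = y, labels = as.character(labels),
                    stringsAsFactors = FALSE)
  class(obj) <- c("textplot", "data.frame")
  obj
}

tp <- new_textplot(x = c(1, 3, 5), y = c(1, 2, 4),
                   labels = c("First", "Second", "Third"))
class(tp)
```

Because the checks live in the constructor rather than in the class, nothing stops a user from setting the class label by hand on an object that does not meet them.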

When you are creating a new class, some of the desired features or elements may be present in an existing class. If this is the case, it may make sense to build on, or extend, an existing class. For instance, data frames build on lists, requiring that each element of the list be a vector with the same length. Likewise, the textplot class we created previously builds on data frames, requiring three elements named x, y, and labels. In this instance, we would say that textplot inherits from data.frame. That is, textplot is a child of the parent class, data.frame (note that parent classes are also referred to as the super class, or base class). If a class inherits from only one other class, it is called single inheritance (that is, it has only one parent class). If a class inherits from multiple classes, it is called multiple inheritance (that is, it has more than one parent class). If a class inherits from another class, it may inherit features of the data stored, and it may inherit methods, the functions that operate on objects of a specific class. Inheriting methods is especially useful to avoid reinventing the wheel.

Note

Inheritance refers to creating a new class by building on the features and methods of an existing class. The new class is known as the child class, and the classes from which the new class is derived are the parent, or super, classes. A child class may inherit both features of the parent class and use of the parent class’s methods.

In the next example, we write a little function to use the formula interface to build a textplot object from a data frame. The function has two arguments: the formula, called f, and the data, called d. The first part of the code uses the stopifnot() function introduced in Chapter 4 to check that the classes of the objects passed to the function match what is expected. To test the classes, we use the inherits() function, which assesses whether a particular object is, or inherits from, the specified class. For example, our textplot class is secondly a data frame, and so a test that uses inherits(object, "data.frame") evaluates to TRUE. When we build S4 classes, we see a more formal definition and system for class inheritance. The next part of the function gets all the variables from the formula, in order, from the specified data frame, renames the columns, applies our textplot class, and returns the object:

textplot_data <- function(f, d) {
  stopifnot(inherits(d, "data.frame"))                  
  stopifnot(inherits(f, "formula"))                  

  newdata <- get_all_vars(formula = f, data = d)                  
  colnames(newdata) <- c("y", "x", "labels")                  
  class(newdata) <- c("textplot", "data.frame")                  

  return(newdata)                  
}                  

## example use                  
textplot_data(f = mpg ~ hp | cyl, d = mtcars[1:10, ])
                     y   x labels
Mazda RX4         21.0 110      6
Mazda RX4 Wag     21.0 110      6
Datsun 710        22.8  93      4
Hornet 4 Drive    21.4 110      6
Hornet Sportabout 18.7 175      8
Valiant           18.1 105      6
Duster 360        14.3 245      8
Merc 240D         24.4  62      4
Merc 230          22.8  95      4
Merc 280          19.2 123      6
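The behavior of inherits() used in the class checks can be sketched on its own. The small object d2 below is created only for this illustration, and the which argument, documented in ?inherits, reports the position of each match in the class vector.

```r
d2 <- mtcars[1:3, c("mpg", "hp", "cyl")]
class(d2) <- c("textplot", "data.frame")

is_tp <- inherits(d2, "textplot")     # TRUE: first class label
is_df <- inherits(d2, "data.frame")   # TRUE: second class label
is_lm <- inherits(d2, "lm")           # FALSE: not in the class vector

## positions of each candidate class in class(d2); 0 means no match
pos <- inherits(d2, c("lm", "data.frame"), which = TRUE)
pos

## formulas carry the class "formula"
inherits(mpg ~ hp | cyl, "formula")
```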

These examples show how easy it is to use the S3 system to create classes. Next, we explore how to write methods for existing or new classes.

S3 Methods

Methods are functions or operations that can be performed on objects of specific classes. Even if you do not write your own classes, you may write your own methods. Writing S3 methods is like writing functions as we did in Chapter 4. The only difference is that S3 methods have a special naming convention and require a generic function for users, which takes care of dispatching to the appropriate method.

Note

S3 methods are regular R functions that follow a specific naming convention: foo.classname(). Users call the generic function, foo(), which dispatches to the appropriate method based on the class of object passed in as an argument. If no generic function exists, a generic must be written that includes a call to UseMethod(), which handles the actual method dispatch.

To start, let’s write a simple plotting method for the textplot object class we developed. To make an S3 plot method, we name the function with the generic's name (here, plot) followed by a period and the class name: function.classname() or, in this case, plot.textplot(). A function named this way works as an S3 method, provided that a generic plot() function exists, a topic we discuss shortly.

Our plot function is shown in the following code. The call to the par() function adjusts the default margins for a graph, to reduce excess white space on the top and right of our graph. The results are stored in the object, op, as this stores the original graphical parameters. Then the user’s original graphical parameter state can be restored when the function exits by registering par(op) with on.exit(), a function you learned about in Chapter 4. Next, we create a new plot area by calling plot.new(), and set the dimensions of our new plot by calling plot.window() with the x and y limits determined by the range of the data. Next, we plot the labels by using the text() function, which takes the coordinates and labels. Finally, we add an axis on the bottom, side = 1, and left, side = 2. The original textplot data object is returned invisibly at the end.

plot.textplot <- function(d) {                  
  op <- par(mar = c(4, 4, 1, 1))                  
  on.exit(par(op))                  

  plot.new()                  
  plot.window(xlim = range(d$x, na.rm = TRUE),                  
              ylim = range(d$y, na.rm = TRUE))                  
  text(d$x, d$y, labels = d$labels)                  

  axis(side = 1, range(d$x, na.rm = TRUE))                  
  axis(side = 2, range(d$y, na.rm = TRUE))                  

  invisible(d)                  
}

Next, we need to make some data and then plot() it. Note that because it is a method, we do not need to call our function by its full name, plot.textplot(). We can simply call plot(), and R takes care of dispatching the data object to the correct method based on the object class, textplot. The result is shown in Figure 5-1, and the code is shown here:

Figure 5-1. The plot of text labels at specific coordinates, demonstrating the use of custom methods for custom classes

dat <- textplot_data(f = mpg ~ hp | cyl, d = mtcars[1:10, ])
plot(dat)

This example shows just how easy it is to use the S3 system. Almost no special functions or effort are required. Just create a regular R object however you want, add a custom class label, write a function, and give it a special name; that is essentially all that is required. However, there are a few special functions and tools for S3 methods. To start, let us see what happens when we look at the source code for plot(). This generic function has just three arguments. The body consists only of a call to UseMethod(), which is what tells R to check the argument classes and dispatch to an appropriate method. If you are writing methods and a generic function does not exist, you also need to write the generic function. Writing a generic function in the S3 system is a straightforward task that requires only considering the default arguments to include:

plot                
function (x, y, ...)
UseMethod("plot")
<bytecode: 0x0000000017ee3bd0>
<environment: namespace:graphics>
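Writing a new generic follows the same pattern as plot(). The sketch below is an assumption-laden illustration: describe() is a hypothetical generic invented here (it is not part of base R), with a default method as a fallback and one class-specific method.

```r
## hypothetical generic: the body is only a call to UseMethod()
describe <- function(x, ...) {
  UseMethod("describe")
}

## fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  sprintf("an object of class %s", class(x)[1])
}

## method for data frames
describe.data.frame <- function(x, ...) {
  sprintf("a data frame with %d rows and %d columns", nrow(x), ncol(x))
}

describe(mtcars)   # dispatches to describe.data.frame()
describe(1:3)      # no integer method, so describe.default() is used
```

Including ... in the generic's arguments lets individual methods accept extra arguments without changing the generic.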

To see the methods available, we use the methods() function. The result shows many specific methods available for plot(), including our newly written plot.textplot:

methods(plot)                
 [1] plot,ANY-method               plot,color-method            
[omitted for space]
 [71] plot.table*                   plot.textplot                
[omitted for space]
 [81] plot.varclus                  plot.xyVector*               
see '?methods' for accessing help and source code

Some of the functions are followed by an asterisk, indicating that these methods are not public or cannot be directly accessed, such as plot.table(). For example, if we type its name, we get an error that it cannot be found:

plot.table                
Error: object 'plot.table' not found

Note

The :: operator is used to refer to publicly exported functions from a specific package, primarily when multiple packages export functions with the same name: package::foo(). The ::: operator is used to access functions from a package’s namespace that are not exported. Use nonpublic functions with caution, as they are subject to change without notice.

Functions that are not public or have not been exported from a package namespace can still be accessed as methods. These functions can also be accessed directly by specifying the package they are from and using the ::: operator, revealing the function source code:

graphics:::plot.table                
function (x, type = "h", ylim = c(0, max(x)), lwd = 2, xlab = NULL,
    ylab = NULL, frame.plot = is.num, ...)
{
    xnam <- deparse(substitute(x))
    rnk <- length(dim(x))
    if (rnk == 0L)
        stop("invalid table 'x'")
    if (rnk == 1L) {
        dn <- dimnames(x)
        nx <- dn[[1L]]
        if (is.null(xlab))
            xlab <- names(dn)
        if (is.null(xlab))
            xlab <- ""
        if (is.null(ylab))
            ylab <- xnam
        is.num <- suppressWarnings(!any(is.na(xx <- as.numeric(nx))))
        x0 <- if (is.num)
            xx
        else seq_along(x)
        plot(x0, unclass(x), type = type, ylim = ylim, xlab = xlab,
            ylab = ylab, frame.plot = frame.plot, lwd = lwd,
            ..., xaxt = "n")
        localaxis <- function(..., col, bg, pch, cex, lty) axis(...)
        if (!identical(list(...)$axes, FALSE))
            localaxis(1, at = x0, labels = nx, ...)
    }
    else {
        if (length(dots <- list(...)) && !is.null(dots$main))
            mosaicplot(x, xlab = xlab, ylab = ylab, ...)
        else mosaicplot(x, xlab = xlab, ylab = ylab, main = xnam,
            ...)
    }
}
<bytecode: 0x0000000053765058>
<environment: namespace:graphics>
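As an alternative to :::, the getS3method() function from the utils package retrieves a method by generic and class name, without needing to know which package defines it. The following is a brief sketch; the object name f is arbitrary.

```r
## look up the table method for plot() without naming the package
f <- getS3method("plot", "table")
is.function(f)

## exported functions can be referenced explicitly with ::
stats::sd(c(1, 2, 3))
```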

In addition to writing methods for new classes, it is sometimes helpful to write methods for existing classes. For example, the popular ggplot2 package has no default method for working with linear model or regression objects. To start, we set up a simple regression model by using the built-in mtcars data. We predict mpg from hp, vs, and their interaction (all of which are created from hp * vs, which expands to the two main effects and their interaction or product term), along with cyl, dummy coded through the call to factor(). The results are shown here by calling summary() on the object:

m <- lm(mpg ~ hp * vs + factor(cyl), data = mtcars)
summary(m)
Call:
lm(formula = mpg ~ hp * vs + factor(cyl), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7640 -1.4424 -0.1703  1.5882  6.9382

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  26.92908    2.71758   9.909 2.56e-10 ***
hp           -0.01519    0.01554  -0.978  0.33718    
vs            8.53352    4.95297   1.723  0.09678 .  
factor(cyl)6 -4.21121    1.93887  -2.172  0.03916 *  
factor(cyl)8 -8.65096    2.69738  -3.207  0.00354 **
hp:vs        -0.09101    0.04363  -2.086  0.04692 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.007 on 26 degrees of freedom
Multiple R-squared:  0.7913,  Adjusted R-squared:  0.7511
F-statistic: 19.71 on 5 and 26 DF,  p-value: 4.181e-08

We can check the class of the object and that there is no current ggplot() method by using the following code:

class(m)                
[1] "lm"
methods(ggplot)                
[1] ggplot.data.frame* ggplot.default*    ggplot.summaryP   
[4] ggplot.transcan   
see '?methods' for accessing help and source code

Although the fortify() function has a method for linear models that creates a data frame suitable for use with ggplot(), it extracts only the raw data, fitted values, and residuals. To show the effects of a model, it can be helpful to plot predicted values as a function of one variable holding other variables at specific values. The method that follows implements a system to do this.

It takes the same basic arguments as ggplot() but adds the vars argument, and each variable used in the model is passed with a specific set of values to hold it at for prediction. All possible combinations of these are created by passing the list as arguments to the expand.grid() function by using the do.call() function. The dependent variable, or yvar, is then extracted from the model formula and converted to a character. New predictions along with standard errors for the predictions are generated. A 95 percent confidence interval is generated with the lower limit, LL, and upper limit, UL, based on the fit or predicted value and the normal quantiles times the standard error of the fit. Finally, the column name is changed from fit to whatever the dependent variable’s actual name was, and then the data for prediction and the predicted values are combined into a new data frame. This is passed to ggplot(), which dispatches to the ggplot() method for data frames.

ggplot.lm <- function(data, mapping, vars, ...) {                
  newdat <- do.call(expand.grid, vars)                
  yvar <- as.character(formula(data)[[2]])                
  d <- as.data.frame(predict(data, newdata = newdat, se.fit = TRUE))                
  d <- within(d, {                
    LL <- fit + qnorm(.025) * se.fit                
    UL <- fit + qnorm(.975) * se.fit                
  })                
  colnames(d)[1] <- yvar                
  data <- cbind(newdat, d[, c(yvar, "LL", "UL")])                
  ggplot(data = data, mapping = mapping, ...)                
}
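The core steps of the method (building the prediction grid and computing the confidence limits) can be sketched with base R alone. The simplified model and the values in vars below are chosen only for illustration and are not the ones used in the figures.

```r
## simplified model for illustration
m2 <- lm(mpg ~ hp * vs, data = mtcars)

## do.call() passes the list elements as named arguments,
## equivalent to expand.grid(hp = c(100, 150), vs = c(0, 1))
vars <- list(hp = c(100, 150), vs = c(0, 1))
newdat <- do.call(expand.grid, vars)

## predictions with standard errors
p <- predict(m2, newdata = newdat, se.fit = TRUE)

## 95 percent interval from normal quantiles
LL <- p$fit + qnorm(.025) * p$se.fit
UL <- p$fit + qnorm(.975) * p$se.fit

cbind(newdat, fit = p$fit, LL = LL, UL = UL)
```

Because qnorm(.025) is negative and qnorm(.975) is positive, LL always falls below the fitted value and UL above it.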

With this method in place, we can easily make some graphs from our linear regression model. The following code uses our new method, specifying the exact values to hold predictor variables, adds a line by using geom_line(), and uses a black-and-white theme with theme_bw(). The result is shown in Figure 5-2; the code is as follows:

Figure 5-2. Predicted regression line from the model, using the ggplot.lm() method

ggplot(m, aes(hp, mpg), vars = list(                
                          hp = min(mtcars$hp):max(mtcars$hp),                
                          vs = mean(mtcars$vs),                
                          cyl = 8)) +                
  geom_line(size=2) +                
  theme_bw()

Because all possible combinations of the predictor values are created by using expand.grid() in our method, we can make several predicted lines, such as holding vs at 0 and 1, shown in the following example code and in Figure 5-3:

Figure 5-3. Predicted regression lines from the model, with confidence intervals, using the ggplot.lm() method

ggplot(m, aes(hp, mpg, linetype = factor(vs), group = factor(vs)), vars = list(                
                          hp = min(mtcars$hp):max(mtcars$hp),                
                          vs = c(0, 1),                
                          cyl = 8)) +                
  geom_ribbon(aes(ymin = LL, ymax = UL), alpha = .25) +                
  geom_line(size=2) +                
  theme_bw()

These examples provide a simple introduction to what can be done when writing new classes and methods or extending existing classes and methods. Because the S3 system is the most commonly used in R, it may be the single most important system to learn; it can be used so widely and can extend many classes of objects.

S4 System

In contrast with the S3 system, the S4 system is a formal system. The benefit of this formality is a greater assurance that objects of a particular class contain exactly what is expected. The downside of the formality is a greater complexity: more functions required to set up classes and methods, and more-rigid requirements in the programming. Whereas the S3 system allows for writing quick-and-dirty code, the S4 system requires more careful planning and comes with higher overhead.

Objects have three components: classes, slots, and methods. The name and structure of the object—what it contains—is the class. The variables or other objects stored in the object are slots. Finally, the functions that can operate on an object are its methods. Throughout this section, we go over each of these components. For further reading on the S4 system, one excellent guide is Software for Data Analysis: Programming with R by John Chambers (Springer, 2010).

S4 Classes

Because S4 classes are more formal, planning is required before writing a new class. Unlike S3 classes, in S4 the names and types of every variable to be included must be specified. Previously we defined a class, textplot, by using the S3 system. We can define the same class by using the S4 system. The x and y arguments should both be numeric type, and the labels argument should be a character string. We also know that the length of all the arguments should be the same. Finally, we may want to assert that although it is okay to have a missing label, a blank label is not allowed. All of this needs to be considered and specified in advance, because although in the S3 system none of this could be controlled, in the S4 system we can define everything when we create the class. To create our first S4 class, we use the setClass() function, shown in the next example. Not every argument is required, but it is considered a good practice to be as explicit as possible when defining a new class.

Note

S4 classes are created by calling setClass(). The Class and slots arguments are required. The class name is specified as a character string, and slots (holding variables) are defined as a named character vector; names correspond to slot names, and values correspond to the class of each slot. An “empty” object can be specified by using the prototype argument, and validity checking can be set by passing a function to the validity argument.

The first argument, Class, provides the name of the class. Next, the slots are defined by using a named vector in which the names indicate the slot names and the values indicate the type of each slot. The prototype argument is not required, but it is helpful to define, as it determines how to create an “empty” object, or how an object of that class is created when no specific data is specified. The validity argument is also not required, but allows explicit checks to be run that the object conforms to expectations. R by default ensures the appropriate type of objects is passed to the slots, but many other tests and validity checks can be added to reduce the chances of an object being created that does not work as intended. Here is the setClass() example:

setClass(                
  Class = "textplot",                
  slots = c(                
    x = "numeric",                
    y = "numeric",                
    labels = "character"),                
  prototype = list(                
    x = numeric(0),                
    y = numeric(0),                
    labels = character(0)),                
  validity = function(object) {                
    stopifnot(                
      length(object@x) == length(object@y),                
      length(object@x) == length(object@labels))                
    if (!all(nchar(object@labels) > 0, na.rm = TRUE)) {                
      stop("All labels must be missing or non zero length characters")                
    }                
    return(TRUE)                
  }                
)

To create new objects of a particular class in the S4 system, we use the function new(). In the code that follows, we examine three attempts to create a new object. First, we make a correct and valid object. Then we look at what happens if we try to use the wrong type of argument. Finally, we look at an example that tests the validity function we created.

new("textplot",                  
    x = c(1, 3, 5),                  
    y = c(1, 2, 4),                  
    labels = c("First", "Second", "Third"))                  

An object of class "textplot"
Slot "x":
[1] 1 3 5

Slot "y":
[1] 1 2 4

Slot "labels":
[1] "First"  "Second" "Third"

new("textplot",                  
    x = c(1, 3, 5),                  
    y = c(1, 2, 4),                  
    labels = 1:3)                  

Error in validObject(.Object) :
  invalid class "textplot" object: invalid object for slot "labels" in class "textplot": got class "integer", should be or extend class "character"

new("textplot",                  
    x = c(1, 3, 5),                  
    y = c(1, 2, 4),                  
    labels = c("First", "Second", ""))                  

Error in validityMethod(object) (from #16) :
  All labels must be missing or non zero length characters

These errors would not happen if we were using the S3 system to define a class, as neither type checking nor validity checking occurs. In the previous examples, each attempt to create a new object had at most one error. The next example has two errors: the vector passed to the y slot is not the same length as the other two, and there is a zero-length character label. However, as currently written, only the first error is caught, because the validity function stops as soon as there are any problems:

new("textplot",                
    x = c(1, 3, 5),                
    y = c(1, 2),                
    labels = c("First", "Second", ""))                
Error: length(object@x) == length(object@y) is not TRUE

Another way to write the validity function is to have it collect any problems and return them all at the end, rather than throwing an error at the first one. While revising the validity function, we could also think about how to make the error messages more informative. As it stands, it is fairly straightforward to see roughly what the problem is, but not exactly what it is. For example, it is evident that the lengths are not equal, but is it that x is too long or y is too short? Both changes are implemented in the revised code to define the class.

However, before diving into improving the validity function, we need a brief diversion on creating and formatting character strings in R. First, when writing text, new lines can be inserted by including the special character, \n. To see the examples, we use the cat() function, which stands for concatenate and print, and writes text out to the R console (or other locations if a file is specified). The following two examples are identical except for the line break between the a and b in the second example:

cat("ab", fill = TRUE)                  
ab

cat("a\nb", fill = TRUE)
a
b

One commonly used function is paste(), which can combine vectors or collapse them. Combining is shown in the first example that follows. The two vectors are combined by using the separator, defined by the sep argument, an empty string in our case. In the second example, a single vector with multiple elements is collapsed into a single character string. How the elements of the vectors are combined into one string is determined by the argument to collapse—in our example, the line break character.

paste(c("a", "b"), c(1, 2), sep = "")                  
[1] "a1" "b2"

paste(c("a", "b"), collapse = "\n")
[1] "a\nb"

These are useful functions for us when writing a validity-checking function, as they allow us to combine multiple errors into one string with line breaks as needed.
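For instance, a vector of error messages can be collapsed into a single string with one message per line. The messages below are made up for illustration.

```r
## hypothetical error messages collected during checking
errors <- c("x (length 3) and y (length 2) are not equal",
            "blank labels are not allowed")

## collapse into one string, separated by line breaks
msg <- paste(errors, collapse = "\n")
length(msg)

## cat() interprets the line break when printing
cat(msg, fill = TRUE)
```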

The other key function we use is sprintf(). Its first argument is a user-defined string, with special symbols that always start with the percentage sign (%) where values should be substituted. The subsequent arguments are the values to substitute. An example may be the clearest way to show it. Here, %d is used to indicate that an integer is substituted, and then R substitutes in 98, 80, and 75. The order of substitution is the order of appearance.

sprintf("First (%d), Second (%d), Third (%d)", 98, 80, 75)                
[1] "First (98), Second (80), Third (75)"

Commonly used format options for substitutions are %d for integers, %f for fixed-point decimals, %s for strings, and %% for a literal percentage sign. Each is demonstrated in this next example, and further documentation is available in the help pages, ?sprintf. For the numeric value, we use 0.2 (as in %0.2f) to specify that the number should be rounded to two decimal places.

sprintf("Integer %d, Numeric %0.2f, String %s, They won by 58%%",
        5, 3.141593, "some text")                
[1] "Integer 5, Numeric 3.14, String some text, They won by 58%"
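One further property worth knowing is that sprintf() is vectorized: when the values are vectors, the template is applied to each element in turn. The sketch below uses made-up names and values, and adds a field width (the 5 in %5.2f) so that the numbers line up.

```r
# Hypothetical data used only for illustration
item  <- c("a", "b", "c")
value <- c(0.5, 12.25, 3)

# One formatted string per element; %5.2f pads each number to width 5
out <- sprintf("%s = %5.2f", item, value)
out

# Combine with paste() to get a single multi-line string
cat(paste(out, collapse = "\n"), fill = TRUE)
```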

Armed with paste() and sprintf(), we can proceed to revise the validity function for our textplot class to provide more-informative errors, and to run all checks, collecting errors along the way and returning all of them at the end. One final note: previously, we used the idiom new("classname", arguments) to create a new object of a particular class. While this is perfectly acceptable, there is a shortcut. The function setClass() is primarily called for its side effect of defining a new class, but it also invisibly returns a constructor function. If we save the results from our call to setClass(), by convention in an object with the same name as the class, we can use the resulting object to create a new object of that class:

textplot <- setClass(                  
  Class = "textplot",                  
  slots = c(                  
    x = "numeric",                  
    y = "numeric",                  
    labels = "character"),                  
  prototype = list(                  
    x = numeric(0),                  
    y = numeric(0),                  
    labels = character(0)),                  
  validity = function(object) {                  
    errors <- character()                  
    if (length(object@x) != length(object@y)) {                  
      errors <- c(errors,                  
                  sprintf("x (length %d) and y (length %d) are not equal",                  
                          length(object@x), length(object@y)))                  
    }                  
    if (length(object@x) != length(object@labels)) {                  
      errors <- c(errors,                  
                  sprintf("x (length %d) and labels (length %d) are not equal",                  
                          length(object@x), length(object@labels)))                  
    }                  
    if (!all(nchar(object@labels) > 0, na.rm = TRUE)) {                  
      errors <- c(errors, sprintf(                  
        "%d label(s) are zero length. All labels must be missing or non zero length",                  
        sum(nchar(object@labels) == 0, na.rm = TRUE)))                  
    }                  

    if (length(errors)) {                  
      stop(paste(c("\n", errors), collapse = "\n"))
    } else {                  
      return(TRUE)                  
    }                  
  }                  
)

Now when we create the same object with multiple problems, we get far more information and save some keystrokes. We can see the lengths of x and y, and also learn that there are problems with the labels:

textplot(                  
  x = c(1, 3, 5),                  
  y = c(1, 2),                  
  labels = c("First", "Second", ""))                  
Error in validityMethod(object) (from #30) :

x (length 3) and y (length 2) are not equal
1 label(s) are zero length. All labels must be missing or non zero length
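The validity function above signals problems by calling stop() itself. The help page for validObject() describes a different convention: return TRUE when the object is valid, and otherwise return a character vector of messages, which R then formats into the error. The following is a minimal sketch of that alternative, not the approach used in this chapter. The class name pairvec and its slots are hypothetical and exist only for this example.

```r
library(methods)

# "pairvec" is a hypothetical class used only for this sketch
pairvec <- setClass(
  Class = "pairvec",
  slots = c(x = "numeric", y = "numeric"),
  validity = function(object) {
    errors <- character()
    if (length(object@x) != length(object@y)) {
      errors <- c(errors, sprintf(
        "x (length %d) and y (length %d) are not equal",
        length(object@x), length(object@y)))
    }
    if (anyNA(object@x)) {
      errors <- c(errors, "x contains missing values")
    }
    # Return TRUE when valid, otherwise the collected messages
    if (length(errors)) errors else TRUE
  })

# tryCatch() captures the error so the message can be inspected
res <- tryCatch(
  pairvec(x = c(1, NA, 3), y = c(1, 2)),
  error = function(e) conditionMessage(e))
cat(res, fill = TRUE)
```

Either style reports every problem at once. Returning the messages leaves the formatting of the error to R.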

S4 Class Inheritance

So far, you have seen how to define new S4 classes. It may still seem like using the S4 system requires much more work than the S3 system, and with little benefit, aside from more formal validation and error checking. One of the powerful features of the S4 system is inheritance.

Note

S4 classes can inherit from existing S4 classes by calling setClass() with the additional argument contains = "S4 Class to Inherit". Existing slots, prototype, and validity checking are inherited. Only new slots and corresponding prototypes/validity checking need to be specified in the new setClass() call.

We previously created a simple textplot class. Now suppose that although sometimes our simple class is sufficient, at other times we may need more. For example, at times we may want to drill down into the data and create a panel of plots for different subsets of the data by some grouping variable. To this end, we want a groupedtextplot class. However, the only additional data we need is one more slot. One option is to copy and paste our old code and then modify it as needed. In this section, we explore how using inheritance lets us reuse and extend existing classes. When a class inherits from another class, all of the slots from the previous class are also inherited, as is validity checking. Next, we create our groupedtextplot class. It is similar to creating a new class, but we define only new slots, and then specify the inheritance by using the argument contains. Because the validity checking is also inherited, we need to specify only validity checks for the new slots.

groupedtextplot <- setClass(                
  Class = "groupedtextplot",                
  slots = c(                
    group = "factor"),                
  prototype = list(                
    group = factor()),                
  contains = "textplot",                
  validity = function(object) {                
    if (length(object@x) != length(object@group)) {                
      stop(sprintf("x (length %d) and group (length %d) are not equal",                
                   length(object@x), length(object@group)))                
    }                
    return(TRUE)                
  }                
)

With that small amount of code, we are ready to use our new class. In the following two examples, the first shows a correctly created new object, and the latter shows the familiar error messages when we attempt to create an invalid object:

gdat <- groupedtextplot(                  
    group = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),                  
    x = 1:8,                  
    y = c(1, 3, 4, 2, 6, 8, 7, 10),                  
    labels = letters[1:8])                  
gdat                  
An object of class "groupedtextplot"
Slot "group":
[1] 1 1 1 1 2 2 2 2
Levels: 1 2

Slot "x":
[1] 1 2 3 4 5 6 7 8

Slot "y":
[1]  1  3  4  2  6  8  7 10

Slot "labels":
[1] "a" "b" "c" "d" "e" "f" "g" "h"

groupedtextplot(                  
    group = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),                  
    x = 1:8,                  
    y = c(1, 3, 4, 2, 6, 8, 7),                  
    labels = c(letters[1:7], ""))                  
Error in validityMethod(as(object, superClass)) (from #30) :

x (length 8) and y (length 7) are not equal
1 label(s) are zero length. All labels must be missing or non zero length

In this case, textplot would be called the parent class, and groupedtextplot would be called the child class. This relationship can be diagrammed (and sometimes for complex inheritance, diagramming is helpful). By convention, the relationship is shown graphically with an arrow pointing from the child to the parent(s), such as textplot <- groupedtextplot. Also by convention, parents are typically on the left, or above if graphing from top to bottom. Although we cover inheritance from only a single parent in this book, classes can inherit from multiple parents, and those parents can inherit from parents, and so on. It is in these cases where a visual diagram is particularly helpful. Another benefit of using inheritance, rather than writing a whole new class, is that methods are also inherited. This means that we can reuse both the slots from the parent as well as the methods written for the parent class, a topic we turn to in the next section.
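The parent and child relationship can also be checked from code. The sketch below uses two hypothetical classes, shape and circle, defined only for this example, together with the is(), extends(), and slotNames() functions from the methods package.

```r
library(methods)

# Hypothetical parent and child classes, for illustration only
setClass("shape", slots = c(name = "character"))
setClass("circle", slots = c(r = "numeric"), contains = "shape")

obj <- new("circle", name = "unit", r = 1)

is_shape   <- is(obj, "shape")           # TRUE: a circle is also a shape
is_child   <- extends("circle", "shape") # TRUE: asks about classes, not objects
all_slots  <- slotNames("circle")        # own slot plus the inherited one
all_class  <- is(obj)                    # the class and the classes it extends

is_shape
is_child
all_slots
all_class
```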

S4 Methods

In addition to the methods() function you saw earlier to display the methods available for a given function, in the S4 system we can find methods by using showMethods(). Because of the more formal class system, showMethods() can be used to show all methods for a specific function, or to show all methods (for any function) for a particular class by using the classes = "class name" argument. This can be helpful if you are working with a new class and want to know what methods have already been written and are available.

Note

Available S4 methods can be examined by using showMethods("generic function"). New S4 methods are defined by calling setMethod(). The main three arguments are f, signature, and definition, containing the name of the generic function (string), the S4 class name that will dispatch to this method (string), and a function that is the actual method, respectively.
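As a rough sketch of these tools in use, the following code defines a hypothetical class named temperature, gives it a show() method, and then looks the method up. The class and its slot are assumptions made for this example. existsMethod() and hasMethod() are companion functions in the methods package that return TRUE or FALSE rather than printing a listing.

```r
library(methods)

# "temperature" is a hypothetical class used only for this sketch
setClass("temperature", slots = c(celsius = "numeric"))

setMethod(
  f = "show",
  signature = "temperature",
  definition = function(object) {
    cat(sprintf("%0.1f degrees C", object@celsius), fill = TRUE)
  })

# Print the show() methods that apply to this class
showMethods("show", classes = "temperature")

# Programmatic checks: is a method defined directly for the class,
# and is one available at all (directly or through inheritance)?
direct    <- existsMethod("show", "temperature")
available <- hasMethod("show", "temperature")
direct
available
```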

To write new methods in the S3 system, we simply write functions with a special naming convention. To define S4 methods, we use the function setMethod(). It is possible to write new methods for existing classes, as well as writing methods for new classes, of course. For a new class, a method is needed for show(). When an object is simply typed at the console, R shows it by calling show(). Without a show() method for a new class, the default printing is quite ugly, as you have seen in our example so far.

In the following code, we define a new method for our textplot class. The first argument is the function name for which we create a method. The next argument, the signature, is the name of the class. In this case, because show() takes a single argument, only one class name needs to be specified. For functions with multiple arguments, the signature can become more complex, with different methods depending on the class of multiple arguments. Finally, we write our function, the definition of the method. The code is relatively simple, using the cat() function to display the values. The argument fill = TRUE has the effect of adding a line feed so that each line starts with X: or the variable label, and then the values. The head() function is used to get at most the first five values, or fewer if there are fewer values.

setMethod(                
  f = "show",                
  signature = "textplot",                
  definition = function(object) {                
    cat("     X: ")                
    cat(head(object@x, 5), fill = TRUE)                
    cat("     Y: ")                
    cat(head(object@y, 5), fill = TRUE)                
    cat("Labels: ")                
    cat(head(object@labels, 5), fill = TRUE)                
  })                
[1] "show"

R echoes the name of the generic function, show(), for which we just created a method. Now we get some nicer output when we create a textplot class object:

dat <- textplot(                  
  x = 1:4,                  
  y = c(1, 3, 5, 2),                  
  labels = letters[1:4])                  

dat                  
     X: 1 2 3 4
     Y: 1 3 5 2
Labels: a b c d
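The show() method dispatches on a single argument. To illustrate the earlier remark that a signature can involve the classes of several arguments, here is a minimal sketch using a hypothetical generic, describe_pair(), created with setGeneric(). The generic and its methods are invented for this example and are not part of the textplot code.

```r
library(methods)

# Hypothetical generic with two arguments to dispatch on
setGeneric("describe_pair",
           function(a, b) standardGeneric("describe_pair"))

# One method per combination of argument classes
setMethod("describe_pair", signature("numeric", "character"),
          function(a, b) "number then text")
setMethod("describe_pair", signature("character", "numeric"),
          function(a, b) "text then number")

# "ANY" matches whatever is not covered by a more specific method
setMethod("describe_pair", signature("ANY", "ANY"),
          function(a, b) "something else")

r1 <- describe_pair(1, "a")
r2 <- describe_pair("a", 1)
r3 <- describe_pair(TRUE, FALSE)
c(r1, r2, r3)
```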

Once we start defining methods, the benefits of class inheritance become even greater. Because groupedtextplot inherits from textplot, if no method is defined for groupedtextplot, it falls back to the method for textplot, if available. Although not perfect, because the grouping is not shown, this is still nicer than the default:

gdat                
     X: 1 2 3 4 5
     Y: 1 3 4 2 6
Labels: a b c d e

Next, we define a method for the [ function, which is used to subset data. This function is more complex, as we must build a logic tree allowing several possible combinations of arguments. The first argument of the function, x, is for the object to be subset. By convention, i refers to rows or observations, and j to columns or variables. If only i is not missing (that is, rows or observations are specified), typically all variables are included. If only j (variables) is specified, typically all observations are included. If both are specified, only select variables and select observations are included. This is accomplished by using a series of if and else if statements. There are three new functions:

validObject() is a generic function that executes the validity check, if present, for a specific object class.
slotNames() returns a character vector giving the names of each slot.
slot() works similarly to the @ operator, but can use character strings to extract slots by name.
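Before using them inside a method, it may help to see the three functions on their own. The sketch below uses a hypothetical class named interval, defined only for this example. It also shows why an explicit call to validObject() can be useful: assigning to a slot with @ does not rerun the validity check.

```r
library(methods)

# "interval" is a hypothetical class used only for this sketch
setClass(
  Class = "interval",
  slots = c(lower = "numeric", upper = "numeric"),
  validity = function(object) {
    if (object@lower > object@upper) {
      "lower must not exceed upper"
    } else {
      TRUE
    }
  })

iv <- new("interval", lower = 0, upper = 1)

nms <- slotNames(iv)       # names of the slots, as a character vector
up  <- slot(iv, "upper")   # same value as iv@upper, but the name is a string

# Assigning with @ does not run the validity function...
iv@lower <- 5

# ...so the object is checked explicitly; tryCatch() captures the error
res <- tryCatch(validObject(iv),
                error = function(e) conditionMessage(e))
nms
up
res
```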

After the method is set, several examples of its uses are shown:

setMethod(                  
  f = "[",                  
  signature = "textplot",                  
  definition = function(x, i, j, drop) {                  
    if (missing(i) && missing(j)) {
      out <- x
      validObject(out)
    } else if (!missing(i) && missing(j)) {
      out <- textplot(                  
        x = x@x[i],                  
        y = x@y[i],                  
        labels = x@labels[i])                  
      validObject(out)                  
    } else if (!missing(j)) {                  
      if (missing(i)) {                  
        i <- seq_along(x@x)                  
      }                  

      if (is.character(j)) {                  
        out <- lapply(j, function(n) {                  
          slot(x, n)[i]                  
        })                  
        names(out) <- j                  
      } else if (is.numeric(j)) {                  
        n <- slotNames(x)                  
        out <- lapply(j, function(k) {                  
          slot(x, n[k])[i]
        })                  
        names(out) <- n[j]                  
      } else {                  
        stop("j is not a valid type")                  
      }                  
    }                  

    return(out)                  
  })                  

dat[]                  
     X: 1 2 3 4
     Y: 1 3 5 2
Labels: a b c d

dat[i = 1:2]                  
     X: 1 2
     Y: 1 3
Labels: a b

dat[j = 1]                  
$x
[1] 1 2 3 4

dat[j = "y"]                  
$y
[1] 1 3 5 2

dat[i = 1:2, j = c("x", "y")]                  
$x
[1] 1 2

$y
[1] 1 3

With a show and subsetting method defined for our textplot class, we can easily leverage those to make a show method for our groupedtextplot class by looping through the object by group, subsetting, and showing each subset:

setMethod(                  
  f = "show",                  
  signature = "groupedtextplot",                  
  definition = function(object) {                  
    n <- unique(object@group)                  
    i <- lapply(n, function(index) {                  
      cat("Group: ", index, fill = TRUE)                  
      show(object[which(object@group == index)])                  
    })                  
  })                  

gdat                  
Group:  1
     X: 1 2 3 4
     Y: 1 3 4 2
Labels: a b c d
Group:  2
     X: 5 6 7 8
     Y: 6 8 7 10
Labels: e f g h
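Another way for a child method to reuse the parent's method, different from the subsetting approach above, is callNextMethod() from the methods package, which calls the method that would have been used had the current one not been defined. The following is only a sketch with two hypothetical classes, base_rec and tagged_rec, defined for this example.

```r
library(methods)

# Hypothetical parent and child classes, for illustration only
setClass("base_rec", slots = c(id = "numeric"))
setClass("tagged_rec", slots = c(tag = "character"), contains = "base_rec")

setMethod("show", "base_rec", function(object) {
  cat("id:", object@id, fill = TRUE)
})

setMethod("show", "tagged_rec", function(object) {
  # Print what is specific to the child class...
  cat("tag:", object@tag, fill = TRUE)
  # ...then hand over to the method inherited from the parent
  callNextMethod()
})

rec <- new("tagged_rec", id = c(1, 2, 3), tag = "demo")
out <- capture.output(show(rec))
out
```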

Summary

This chapter has introduced the S3 and S4 systems in R for developing new classes and methods. The S4 system, in particular, can be complicated, with inheritance from multiple parent classes and methods that are specialized to the class of more than one argument. Even if you use and develop in R, you may only rarely develop new classes, as there are already classes for most data types. However, it can often be helpful to develop or extend existing methods, and at the very least, understanding how classes and methods work makes it easier to use existing ones.

Hopefully, this chapter is enough to get you started using these systems and to see their capabilities. Table 5-1 describes key functions covered in this chapter. In Chapter 6, we bundle functions and classes and methods together to make our own R package. The focus is on making an R package, so you do not need to be too comfortable with classes and methods. If your work would benefit from the use of the S4 system, you can get slightly more in-depth coverage from “How S4 Methods Work” by John Chambers (available online at http://developer.r-project.org/howMethodsWork.pdf). If you intend to work with the S4 system and need an in-depth dive, we recommend Software for Data Analysis: Programming with R. Another good book more focused on general R programming and the older S3 system is S Programming.

Table 5-1. Key Functions Described in This Chapter

Function	What It Does
class()	Returns the class of an object (S3/S4).
inherits()	Checks whether an object is of a certain class or inherits from that class.
methods()	Returns a list of the available methods for a given function.
showMethods()	Returns a list of the available methods for a given function or, if using the classes argument, available methods for any function for a given class.
:::	Operator that allows access to nonexported (nonpublic) functions from a package.
function.class()	Generic scheme for naming an S3 method, using the function name, followed by a period, followed by the class of object it should be applied to.
setClass()	Defines a new S4 class.
setMethod()	Defines a method for a particular function for a particular S4 class.
@	Low-level way to access slots by name in objects in the S4 system, similar to $ for other R objects or S3 class objects.
new()	Creates a new S4 object of a specific class.
paste()	Pastes strings together. Can operate on several vectors or collapse together all the elements of a single vector.
sprintf()	Formats strings and allows for substituting numbers and other strings into a defined template. Useful for making informative error messages or other messages to users.
show()	Generic function to show an R object. Also, the default function that is called when you type an object name at the R console.
`[`()	Operator/generic function used to subset data or to access specific variables or rows. In the S4 system for a new class, methods must be defined, or the function is not usable.
head()	Lists the first few elements or rows of an object.
validObject()	Generic function that checks whether an object with an S4 class is valid by using the validity checks specified when the class was created (if any). Called by default when an object is created, but can also be called explicitly after modifying an object to check whether it is still valid.
slotNames()	Returns all the slot names of an S4 class object, similar to names() for other R objects or S3 class objects.
slot()	Can be used to access a specific slot of an S4 object.
