Matt Wiley and Joshua F. Wiley, Advanced R, 10.1007/978-1-4842-2077-1_4

4. Writing Functions

Matt Wiley¹ and Joshua F. Wiley¹

(1) Elkhart Group Ltd. & Victoria College, Columbia City, Indiana, USA

Writing your own functions in R enables you to combine a set of R commands into a function that is easy to call and can be generalized. Functions are foundational to R. To become a more advanced user or developer of R, a good understanding of what functions are and how to write them is crucial. Broadly speaking, a function takes one or more inputs and processes them to produce and return output.

Not every programming task should be converted to a function. However, whenever you find yourself copying and pasting a particular line of your code for the third time, you should likely write a function. Another “no question about it” time to write a function is when your code is over 100 lines or so. Long chunks of code can become almost impossible to read through and understand. Instead, we write functions with good, descriptive names that make our code more readable. Then, on a separate pass, we write code using the functions. The beauty of such a system is that it becomes easier to write focused code inside each function that solves one particular part of your challenge. Another benefit of this approach is that, should greater efficiency ever be required, it is possible to determine which functions are costing you the most processing power or take the longest time to complete. Then research can be done on how to be more efficient in just that spot.

In this chapter, we use the Hmisc R package (Harrell Jr, 2016). The following code loads the checkpoint (Microsoft Corporation, 2016) package to control the exact version of R packages used and then loads the Hmisc package:

## load checkpoint and required packages
library(checkpoint)            
checkpoint("2016-09-04", R.version = "3.3.1")            
library(Hmisc)            
options(width = 70) # only 70 characters per line

Components of a Function

With some exceptions, most functions in R have three components:

Formals: The arguments, or inputs, to the function
Body: The commands that process the input
Environment: The location or context of a function, which determines where it looks for variables

Each can be examined using dedicated functions. First, we write an example function:

## create new function, f()
f <- function(x, y = 5) {              
  x + y              
}

The function accepts two arguments: x and y. The first argument, x, has no default, but the second argument, y, defaults to a value of 5. Although we can see these easily (in part because we’ve written the function), the formals, or arguments, of the function can be examined using the formals() and args() functions:

formals(f)                
$x

$y
[1] 5

args(f)                
function (x, y = 5)
NULL

To see the actual R code or the commands used to process the formals, we use the body() function:

body(f)              
{
    x + y
}

The commands are usually enclosed in opening and closing braces, { }, which are required whenever the body contains more than one expression. In this case, the R code is simply x + y, but some functions have hundreds of lines. The last key part of a function is its environment. The environment of a function determines where it looks for variables or objects, which can include both data and functions. To see the environment of a function, we use the environment() function:

environment(f)              
<environment: R_GlobalEnv>

In this case, the function is in the global environment, where we created the function. For other functions, this would vary. For example, if we look at the environment for the install.packages() function, it is the namespace for the utils package:

environment(install.packages)              
<environment: namespace:utils>

This provides a common language for discussing R functions. In the remainder of the chapter, we delve deeper into writing functions, and the special code and tools available for use with functions.
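As a small sketch of how these accessors can be used programmatically, the following code inspects a hypothetical function, g() (not one of the chapter’s examples), and then the primitive function sum(). Primitives are among the exceptions mentioned at the start of this section: they are implemented in C, so all three components are reported as NULL.

```r
## hypothetical example function, g(), used only for illustration
g <- function(x, y = 5) x + y

arg_names <- names(formals(g))   # names of the formals
y_default <- formals(g)$y        # default value stored for y
body_call <- is.call(body(g))    # the body is an unevaluated call

## primitive functions are the exception: no formals, body, or environment
prim <- c(formals     = is.null(formals(sum)),
          body        = is.null(body(sum)),
          environment = is.null(environment(sum)))

arg_names
y_default
body_call
is.primitive(sum)
prim
```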

Scoping

In R, scoping is what determines where to look for a particular variable. Consider, for example, what happens when we type plot: how does R translate that code? Where does R look it up? Scope is R’s answer to the linguistic idea of context. If I say, “I bought a new bat,” you would likely suppose I mean a cricket bat rather than a flying mammal. For R, context comes from the scope and environment. Thus, different environments can lead to different results.

Most aspects of writing R functions are no different than using R interactively. However, one difference is that functions have their own environment. Further, functions often are located in various environments. For instance, when using R interactively, almost all commands are executed from the global environment. In contrast, many functions are written as part of R packages, in which case the function’s environment is defined by the R package. Therefore, before jumping in to writing functions, it is helpful to understand scoping. Consider these two examples:

plot                
function (x, y, ...)
UseMethod("plot")
<bytecode: 0x00000000190621d0>
<environment: namespace:graphics>

plot <- 5                

plot                
[1] 5

In the first instance, R finds plot in the graphics package. In the second instance, R finds plot in the global environment. Note that assigning 5 to the variable, plot, does not overwrite the plot() function. Instead, it creates another R object with the same name as the function. After creating the new variable, when we type plot at the console, R returns the numeric value rather than the function because the assignment, plot <- 5, occurs in the global environment, which R searches before it checks the environments of different packages. The search() function returns the environments in the order that R searches them. Your environment may be different, although there are likely some similarities:

search()              
 [1] ".GlobalEnv"            "package:Hmisc"        
 [3] "package:ggplot2"       "package:Formula"      
 [5] "package:survival"      "package:lattice"      
 [7] "package:devEMF"        "package:checkpoint"   
 [9] "ESSR"                  "package:stats"        
[11] "package:graphics"      "package:grDevices"    
[13] "package:utils"         "package:datasets"     
[15] "package:RevoUtilsMath" "package:methods"      
[17] "Autoloads"             "package:base"

From the output, we see that R first looks in the global environment (.GlobalEnv), then in the package Hmisc, and so on until it reaches the base package. We can see the current environment by again using the environment() function. R always begins looking in the current or local environment. From there, it progresses to the parent environment. We can find the parent environment for a given environment by using the parent.env() function:

environment()                
<environment: R_GlobalEnv>

parent.env(.GlobalEnv)
<environment: package:Hmisc>                
attr(,"name")
[1] "package:Hmisc"
attr(,"path")
[1] "C:/Users/Authors/.checkpoint/2016-09-04/lib/x86_64-w64-mingw32/3.3.1/Hmisc"
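The link between parent.env() and search() can be checked with a short loop. The sketch below starts at the global environment and follows parent environments until it reaches the empty environment, which is the end of the chain. The names it collects depend on which packages are attached in your session, so only the length of the chain is compared with search(). The variable names e and chain are ours.

```r
## walk the chain of parent environments, starting from the global environment
e <- globalenv()
chain <- character()
while (!identical(e, emptyenv())) {
  chain <- c(chain, environmentName(e))  # record the name of this environment
  e <- parent.env(e)                     # move one step up the chain
}

chain
## one environment per entry on the search path
length(chain) == length(search())
```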

Coming back to functions, each function has its own local environment, in addition to the function itself being located in an environment. The following code shows the local function environment and its parent environment. We can roughly classify variables in functions into one of three types:

Formal variables
Local variables defined within the function
Free or other variables that are neither formals nor local variables

Each is demonstrated in turn with the following code:

a <- "free variable"                
f <- function(x = "formal variable") {                
  y <- "local variable"                

  e <- environment()                
  print(e)                
  print(parent.env(e))                

  print(a)                
  print(x)                
  print(y)                
}                

f()                
<environment: 0x0000000017c846b0>
<environment: R_GlobalEnv>
[1] "free variable"
[1] "formal variable"
[1] "local variable"

The variable x is a formal that takes its default value, y is a local variable defined in the body of the function, and a is a free variable defined in the function’s parent environment. Although a function can rely on objects found in its parent environment or on the search path, rather than on its formals or local variables, doing so is not a wise idea. Code that depends on a function’s parent environment or the search path can lead to particularly tricky bugs and unexpected behavior. It creates chaos for users, or for yourself later, when something in the environment seemingly unrelated to the function changes: the function’s code and inputs appear unchanged, yet the output is suddenly different.
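To make the risk concrete, here is a minimal sketch with two hypothetical functions, add_tax_free() and add_tax_arg(), and a made-up variable, rate. The first function reads rate as a free variable; the second receives it as a formal. Changing rate in the calling environment silently changes the result of the first function only.

```r
rate <- 0.1

## relies on a free variable looked up outside the function
add_tax_free <- function(price) price * (1 + rate)

## receives everything it needs through its formals
add_tax_arg <- function(price, rate = 0.1) price * (1 + rate)

before <- add_tax_free(100)
rate <- 0.2                   # seemingly unrelated change elsewhere in the script
after <- add_tax_free(100)    # same call, different answer
stable <- add_tax_arg(100)    # unaffected by the change

c(before = before, after = after, stable = stable)
```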

Although the examples so far have been relatively straightforward, scoping becomes trickier when using nested function calls. In the following code, it may be harder to predict the results of each piece of evaluated code. The next examples set up two functions and then use the functions with three different variables in the global environment:

f1 <- function(y = "f1 var") {                
  x <- y                
  a1 <- f2(x)                
  rm(x)                
  a2 <- f2(x)                
}                

f2 <- function(x) {                
  if (nchar(x) < 10) {                
    x <- "f2 local var"                
  }                
  print(x)                
  return(x)                
}                

x <- "global var"                
f1()                
[1] "f2 local var"
[1] "global var"

x <- "g var"                
f1()                
[1] "f2 local var"
[1] "f2 local var"

rm(x)                
f1()                
[1] "f2 local var"
Error in nchar(x) (from #2) : object 'x' not found

It is worth experimenting with scoping until it makes sense, because it is necessary for ensuring that the correct object is found. This can be a particular challenge when writing and developing a package or when using functions that appear in multiple packages.
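One detail that helps when names clash: when a name is used in a function call, R skips objects that are not functions and keeps searching. The sketch below masks mean with a number (a deliberately contrived clash), shows that a call to mean() still reaches the base function, and uses the :: operator to name the package explicitly. The variable names on the left-hand side are ours.

```r
## a non-function object with the same name as a base function
mean <- 5

value_found <- mean                     # plain lookup finds the numeric object
call_result <- mean(c(1, 2, 3))         # in a call, non-functions are skipped
explicit <- base::mean(c(1, 2, 3))      # :: states which package to use

rm(mean)                                # remove the masking object again

value_found
call_result
explicit
```

Using pkg::fun() is a simple way to avoid ambiguity when the same function name appears in more than one attached package.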

Functions for Functions

In addition to the usual R code and functions that you may use within a function you write, some special functions exist. These are only, or primarily, used within other functions. Even if you are an experienced R user, these may be unfamiliar if you have not previously written functions.

The match.arg() function is useful for performing partial (fuzzy) matching of an argument against a set of valid options. It also has the benefit that if an argument does not match any of the valid options (that is, it is invalid), it throws an error. The following code shows two functions that are similar, except one uses the match.arg() function. The examples demonstrate fuzzy matching and what happens when an invalid argument is used:

f1 <- function(type = c("first", "second")) {                
  type                
}                
f2 <- function(type = c("first", "second")) {                
  type <- match.arg(type)                
  type                
}                

f1("fi")                
[1] "fi"
f2("fi")                
[1] "first"

f1("test")                
[1] "test"
f2("test")                
Error in match.arg(type) : 'arg' should be one of "first", "second"

Argument matching is also useful for ensuring that when a text string is passed, only a valid option is used, and that if it is not, an informative error is thrown. The following code expands on the function we built without match.arg() to calculate a mean if type = "first", and a standard deviation if type = "second". However, when an invalid option is passed to the type argument, we get the relatively cryptic error message that x is not found:

f1b <- function(type = c("first", "second")) {                
  if (type == "first") {                
    x <- mean(1:5)                
  } else if (type == "second") {                
    x <- sd(1:5)                
  }                
  return(x)                
}                

f1b("test")                
Error in f1b("test"): object 'x' not found
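For comparison, a sketch of the same calculation with match.arg() added is shown next. The name f2b() is ours, chosen to parallel f1b(). With match.arg(), an omitted argument falls back to the first option, a partial string is completed, and an invalid option fails immediately with an error raised by match.arg() rather than by a missing object further down.

```r
f2b <- function(type = c("first", "second")) {
  type <- match.arg(type)   # validate and complete the argument
  if (type == "first") {
    mean(1:5)
  } else {
    sd(1:5)
  }
}

r_default <- f2b()        # no argument: the first option is used
r_partial <- f2b("sec")   # partial match to "second"

## capture the error instead of stopping the script
r_invalid <- tryCatch(f2b("test"), error = function(e) e)

r_default
r_partial
conditionMessage(r_invalid)
```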

Another function specific to function arguments is missing(). It returns a logical value indicating whether a specific argument is missing from the function call. This is helpful when writing functions that may be used in different ways. The following is an example of a function that calculates Cohen’s d effect size, which for a single group is defined as the mean of a variable divided by the standard deviation. Cohen’s d can also be calculated for repeated measures, such as a group measured before and after an intervention. This is done by calculating the difference of the two variables, and then proceeding as before. We use the missing() function to determine whether the user passes a single variable, x, or two variables, x and y, so that our function can elegantly handle calculating Cohen’s d for both one sample and repeated-measures data:

cohend <- function(x, y) {                
  if (!missing(y)) {                
    x <- y - x                
  }                

  mean(x) / sd(x)                
}                

cohend(x = c(0.61, 0.99, 1.47, 1.52, 0.45,                
             3.34, 1.05, -1.47, 1.3, 0.33),                
       y = c(-0.69, 1.6, 0.44, 1, 0.88,                
             1.17, 2.4, 1.21, 0.87, 2.15))                
[1] 0.09522249
cohend(x = c(0.61, 0.99, 1.47, 1.52, 0.45,                
             3.34, 1.05, -1.47, 1.3, 0.33))                
[1] 0.796495
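A point that is easy to overlook is that missing() reports whether the caller supplied an argument, not whether the argument has a value. The following sketch uses a hypothetical function, check_missing(), to show that an argument left at its default still counts as missing.

```r
check_missing <- function(x, y = 2) {
  ## report, for each formal, whether the caller left it out
  c(x = missing(x), y = missing(y))
}

only_x <- check_missing(1)      # y is not supplied, so it is missing
both <- check_missing(1, 2)     # y is supplied, even though it equals the default

only_x
both
```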

Also related to determining characteristics of the function call is the function match.call(). Whereas missing() determines whether a specific argument is missing, and match.arg() determines whether an argument matches one of the valid options, match.call() captures the entire function call. This might be easier to demonstrate than to explain. The following little function calculates the coefficient of variation, the sample standard deviation divided by the sample mean. It also captures and returns the function call by using match.call():

cv <- function(x, na.rm = FALSE) {                
  fcall <- match.call()                

  est <- sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)                

  return(list(CV = est, Call = fcall))                
}                

cv(1:5)                
$CV
[1] 0.5270463

$Call
cv(x = 1:5)

cv(1:8, na.rm = TRUE)                
$CV
[1] 0.5443311

$Call
cv(x = 1:8, na.rm = TRUE)

In the output from those two examples, match.call() captures exactly the call to the function, though it adds explicit argument names. This can sometimes be useful for keeping a record of exactly what the call was that created particular output. Perhaps the most common place where this is used is in the output from regression models in R. For example, the following code shows a linear model in which the output echoes the function call, which is done using match.call():

lm(mpg ~ hp, data = mtcars)

Call:
lm(formula = mpg ~ hp, data = mtcars)

Coefficients:
(Intercept)           hp  
   30.09886     -0.06823
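A stored call is an ordinary R object, so it can also be modified and evaluated again. The sketch below is one way this might be used, in the spirit of how update() refits a model. The function cv_demo() is a copy of the chapter’s cv() under a different name so that the example is self-contained, and the data are made up.

```r
cv_demo <- function(x, na.rm = FALSE) {
  fcall <- match.call()
  est <- sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
  list(CV = est, Call = fcall)
}

res <- cv_demo(c(1, 2, 3, NA))   # NA in the data, so the estimate is NA
res$CV

new_call <- res$Call
new_call$na.rm <- TRUE           # change one argument of the stored call
res2 <- eval(new_call)           # run the modified call
res2$CV
res2$Call
```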

The return() function is typically used at the end of functions to return a specific object, as you have already seen in some of the previous examples, though it was not explicitly discussed. However, return() can also be used to return values from any point within a function, thus ending execution of the function. The following function has an if statement that, if true, results in early termination of the function. Notice that the final result, if not true, is not even wrapped in return(). R returns the last object in a function by default, so an explicit call to return() is not strictly necessary:

f <- function(x) {                
  if (x < 4) return("I'm done!")                

  paste(x, "- Fin!")                
}                

f(10)                
[1] "10 - Fin!"
f(3)                
[1] "I'm done!"

Even though you can use return() earlier in a function, this is discouraged, because it can be surprising to users and anyone else reading or debugging code. The same effect as in the preceding code can be accomplished by using flow control:

f <- function(x) {                
  if (x < 4) {                
    "I'm done!"                
  } else {                
    paste(x, "- Fin!")                
  }                
}                

f(10)                
[1] "10 - Fin!"
f(3)                
[1] "I'm done!"

In addition to not using return() midway in functions, some argue that an explicit call to return() should not be used at the end of functions either, as it is unnecessary. This remains a point of preference, as it can help draw attention to exactly what is returned at the end of a function.

Although not exclusively used in functions, the invisible() function is often used with the object returned by a function. Earlier we made a function that calculates the coefficient of variation and returns the function call. We can modify the function to print the coefficient of variation and invisibly return the rest, as shown in the following code. The use of invisible() means that even though the function returns the same object it did before, that object is not shown. The invisible() function is perhaps most often used in functions that are designed to create attractive output, such as calls to summary() or plotting functions. The function’s primary purpose is to show a summary or graph, but in case anyone wants or needs to edit the object, the actual object is invisibly returned and thus can be captured and saved for later use:

cv <- function(x, na.rm = FALSE) {                
  fcall <- match.call()                

  est <- sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)                

  print(est)                
  return(invisible(list(CV = est, Call = fcall)))                
}                

cv(1:8, na.rm = TRUE)                
[1] 0.5443311

res <- cv(1:8, na.rm = TRUE)                
[1] 0.5443311
res$Call                
cv(x = 1:8, na.rm = TRUE)
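Two small tools are handy for checking invisible results. Wrapping a call in parentheses forces the value to print, and withVisible() reports both the value and its visibility flag. The function quiet_sq() in this sketch is a hypothetical example of ours.

```r
quiet_sq <- function(x) invisible(x^2)

quiet_sq(3)       # nothing is printed at the console
(quiet_sq(3))     # parentheses force the value to print

vis <- withVisible(quiet_sq(3))
vis$value         # the returned object
vis$visible       # whether it would have been printed
```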

The on.exit() function can be used to guarantee that a certain set of commands is executed when the function exits or completes. Expressions in on.exit() do execute, even if the function has an error or does not properly complete as expected. An example is shown in the second use case of the little function in the following code. An error causes the function to terminate, and once that happens, the expression in on.exit() is executed. Using on.exit() is particularly valuable when a function modifies any values outside itself. For example, sometimes a plotting function modifies the default plot parameters and returns them to whatever their original state was on completion. Using on.exit() ensures that even if something goes wrong and the function fails or has an error, the user still has all of the original settings:

f <- function(x) {                
  on.exit(print("Game over"))                
  x + 5                
}                

f(3)                
[1] "Game over"
[1] 8

f("a")                
Error in x + 5 (from #3) : non-numeric argument to binary operator
[1] "Game over"
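The restore-on-exit pattern described above can be sketched as follows. To avoid opening a graphics device, the example changes a global option with options() rather than a plot parameter with par(); both return the previous settings, so the pattern is the same. The function print_rounded() and its fail argument are hypothetical, added only to trigger an error on demand.

```r
print_rounded <- function(x, fail = FALSE) {
  old <- options(digits = 3)          # change a global setting, keep the old value
  on.exit(options(old), add = TRUE)   # restore it however the function exits
  if (fail) stop("something went wrong")
  print(x)
  invisible(x)
}

before <- getOption("digits")
print_rounded(pi)                                    # normal exit
try(print_rounded(pi, fail = TRUE), silent = TRUE)   # exit through an error
restored <- identical(getOption("digits"), before)
restored
```

The add = TRUE argument appends the expression to any exit handlers registered earlier in the same function instead of replacing them.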

The last set of function-specific functions we cover is related to giving the software or user a signal. In decreasing order of severity, they are stop(), warning(), and message(). The next example is slightly more realistic and calculates the mean of a variable, either on its original scale or on a transformed scale, and then back-transforms the mean. The log scale can be relatively more resistant to outliers. If a log transformation is used, the example code checks whether the variable has any negative values, for which the log is undefined, and if so signals a full error by using the stop() function. The argument passed to stop() is the error message to display. A similar process is followed for warning(), again with the message to be displayed in the warning. Warnings are different from errors. An error, via stop(), causes the function to stop being evaluated and terminate. A warning, via warning(), lets the function continue its evaluation; by default, the warning itself is reported at the end, once the call has finished. Finally, message() can be used to send a message or signal to the user, but without indicating a real or likely problem, as a warning does. Although not demonstrated, a related convenience function is stopifnot(), which allows a logical expression to be passed and issues an error if the expression does not evaluate to a true value. Although this option has the benefit of saving some code, a disadvantage is that, in the version of R used here (3.3.1), you cannot write a custom error message to indicate exactly what went wrong. R 4.0.0 and later accept a custom message given as the name of the argument.

One of the reasons it is helpful to use warnings or messages, rather than just calling print() or cat() to have the function print a message, is that warnings and messages can be (optionally) suppressed by using the aptly named functions suppressWarnings() and suppressMessages(). All of these functions are demonstrated in the code immediately following:

f <- function(x, trans = c("identity", "log")) {                
  trans <- match.arg(trans)                

  if (trans == "log") {                
    if (any(x < 0)) stop("Log is not defined for negative values")                
    if (any(x < 1e-16)) warning("Some x values close or equal to zero, results may be unstable")                

    x <- log(x)                
    message("x successfully log transformed")                

    exp(mean(x))                
  } else {                
    mean(x)                
  }                
}                

f(c(1, 2, 100))                
[1] 34.33333

f(c(1, 2, 100), trans = "log")                
x successfully log transformed
[1] 5.848035

suppressMessages(f(c(1, 2, 100), trans = "log"))                
[1] 5.848035

f(c(0, 1, 2, 100), trans = "log")                
x successfully log transformed
[1] 0
Warning message:
In f(c(0, 1, 2, 100), trans = "log") :
  Some x values close or equal to zero, results may be unstable

suppressWarnings(f(c(0, 1, 2, 100), trans = "log"))                
x successfully log transformed
[1] 0

f(c(-1, 1, 2, 100), trans = "log")                
Error in f(c(-1, 1, 2, 100), trans = "log") (from chapter04.R!79246pw#5) :
  Log is not defined for negative values
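Because stopifnot() is described but not shown, here is a minimal sketch of how it might be used for the same kind of input check. The function safe_log_mean() is hypothetical, and for simplicity it rejects zero as well as negative values. The error is captured with tryCatch() so that the automatically generated message can be inspected.

```r
safe_log_mean <- function(x) {
  ## each expression must evaluate to TRUE, otherwise an error is raised
  stopifnot(is.numeric(x), all(x > 0))
  exp(mean(log(x)))
}

ok <- safe_log_mean(c(1, 2, 100))
ok

## the message is built from the failing expression, not written by us
bad <- tryCatch(safe_log_mean(c(-1, 1, 2, 100)),
                error = function(e) conditionMessage(e))
bad
```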

Debugging

The functions demonstrated so far have all worked or have had purposeful errors that are obvious to spot. Sometimes the process of finding the error, or debugging the code and functions written, takes longer. Fortunately, there are some tools to help the process and some practices to narrow the issues.

Although debugging can apply to any code, not just functions, functions can be particularly tricky to debug without additional tools, because normally all of their code executes without interruption or any chance to see what is happening along the way. In this section, we write a function that uses a formula to calculate the means of one variable by levels of another variable, using tapply(), which you examined in Chapter 3. The function then plots the raw data, with dots for the means. The code follows, and the result is shown in Figure 4-1:

Figure 4-1. Scatter plot with data in unfilled points, and means in large blue points, showing the incorrect results

meanPlot <- function(formula, d) {                
  v <- all.vars(formula)                
  m <- tapply(d[, v[1]], d[, v[2]],                
              FUN = mean, na.rm = TRUE)                

  plot(formula, data = d, type = "p")                
  points(x = unique(d[, v[2]]), y = m,                
         col = "blue", pch = 16, cex = 2)                
}                

meanPlot(mpg ~ cyl, d = mtcars)

Something does not look correct about the means. The mean for the points when cyl = 8 looks okay, but the other two seem to fall at extremes. The debug() function allows us to debug a function and step through the lines as they are executed. To use it, we first call debug() on the function we want to debug:

debug(meanPlot)

Then when we use the original function, it triggers R to enter debugging. At the first step, R tells us the function call we are debugging in, and where:

meanPlot(mpg ~ cyl, d = mtcars)
debugging in: meanPlot(mpg ~ cyl, d = mtcars)
debug at c:/Temp/chapter04.R!79246wk#1: {
    v <- all.vars(formula)
    m <- tapply(d[, v[1]], d[, v[2]], mean, na.rm = TRUE)
    plot(formula, data = d, type = "p")
    points(x = unique(d[, v[2]]), y = m, col = "blue", pch = 16,
        cex = 2)
}
Browse[2]>              
debug at #2: v <- all.vars(formula)
Browse[2]>              
debug at #3: m <- tapply(d[, v[1]], d[, v[2]], mean, na.rm = TRUE)
Browse[2]>              
debug at #5: plot(formula, data = d, type = "p")
Browse[2]> m              
       4        6        8
26.66364 19.74286 15.10000
Browse[2]> unique(d[, v[2]])              
[1] 6 4 8
Browse[2]> Q

Now that we know where the problem occurs, we can fix the function by first sorting the data, so that tapply() and unique() give the results in the same order. The code is shown here, and the result is in Figure 4-2:

Figure 4-2. Scatter plot with data in unfilled points, and means in large blue points, showing the corrected results

meanPlot <- function(formula, d) {                
  v <- all.vars(formula)                
  d <- d[order(d[, v[2]]), ] ## sorting first                
  m <- tapply(d[, v[1]], d[, v[2]],                
              FUN = mean, na.rm = TRUE)                

  plot(formula, data = d, type = "p")                
  points(x = unique(d[, v[2]]), y = m,                
         col = "blue", pch = 16, cex = 2)                
}                

meanPlot(mpg ~ cyl, d = mtcars)
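An alternative to sorting, offered here only as a sketch and not as the authors’ fix, is to take the x positions from the names that tapply() attaches to its result, so that positions and means cannot fall out of step. This assumes the grouping variable is numeric, as cyl is. The variable x_pos is ours.

```r
m <- tapply(mtcars$mpg, mtcars$cyl, FUN = mean, na.rm = TRUE)

grp_sorted <- names(m)               # group labels, in sorted order
grp_seen <- unique(mtcars$cyl)       # groups in order of first appearance

x_pos <- as.numeric(names(m))        # positions guaranteed to line up with m

grp_sorted
grp_seen
data.frame(cyl = x_pos, mean_mpg = as.vector(m))
```

Inside meanPlot(), x_pos could then be passed to points() in place of unique(d[, v[2]]).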

If you are debugging your function and do not want to step through each line of code, the browser() function can be inserted into the function code. Then, when the function reaches that point, a browser is invoked, and you can examine the current state of variables in the function’s local environment. In the following example code, we add a call to browser() in the function, examine the current objects available by using ls(), examine the contents of the object, v, and again type a capital Q to quit debugging:

meanPlot <- function(formula, d) {                
  v <- all.vars(formula)                
  d <- d[order(d[, v[2]]), ] ## sorting first                
  m <- tapply(d[, v[1]], d[, v[2]],                
              FUN = mean, na.rm = TRUE)                

  browser()                

  plot(formula, data = d, type = "p")                
  points(x = unique(d[, v[2]]), y = m,                
         col = "blue", pch = 16, cex = 2)                
}                

meanPlot(mpg ~ cyl, d = mtcars)
Called from: meanPlot(mpg ~ cyl, d = mtcars)
Browse[1]>                
debug at #9: plot(formula, data = d, type = "p")
Browse[2]> ls()                
[1] "d"       "formula" "m"       "v"      
Browse[2]> v                
[1] "mpg" "cyl"
Browse[2]> Q
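One housekeeping detail: a function flagged with debug() stays flagged, and enters the browser on every call, until the flag is removed with undebug() or the function is redefined. The following sketch shows how the flag can be checked and cleared by using a hypothetical function, h(). The function is never called while flagged, so the code runs without opening the browser.

```r
h <- function(x) x + 1

debug(h)
flag_on <- isdebugged(h)    # the function is now flagged for debugging

undebug(h)
flag_off <- isdebugged(h)   # the flag has been removed

flag_on
flag_off
h(1)                        # runs normally again
```

When only a single debugging session is wanted, debugonce() sets a flag that is cleared automatically after the next call.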

A useful function for debugging when working interactively is traceback(). For example, if you call a function such as lm() and then get an error message, it can sometimes be difficult to know exactly where or why that mistake occurred. The error is often not even directly from the function you called, as the functions you frequently use may in turn call many other functions internally. The following example shows how this is done by using traceback() immediately after the code that resulted in error. The output shows the call stack tracing the path from the initial call to the final code that generated the error. This can be useful information if you need to look at the code to determine why the error occurred:

lm(mpg ~ jack, data = mtcars)
Error in eval(expr, envir, enclos) : object 'jack' not found

traceback()                
7: eval(expr, envir, enclos)
6: eval(predvars, data, env)
5: model.frame.default(formula = mpg ~ jack, data = mtcars, drop.unused.levels = TRUE)
4: stats::model.frame(formula = mpg ~ jack, data = mtcars, drop.unused.levels = TRUE)
3: eval(expr, envir, enclos)
2: eval(mf, parent.frame())
1: lm(mpg ~ jack, data = mtcars)

Finally, sometimes code bugs are not in your code, but in other code you are using, such as from another R package. Although it is rare to find bugs in recommended R packages, it is more frequent in the thousands of other R packages. Also, sometimes problems are not a bug per se, but a difference in how you want to use a function vs. how the original writer envisioned its use. Although the source code for all R packages on CRAN is publicly available for download and editing, it can be a hassle to download an entire package’s source code, edit, and reinstall, just to see if doing something slightly different in one function fixes the problem.

Consider the following challenge. Suppose you are using data that sometimes includes infinity for some reason, but you want to include only finite cases. The following code shows an example using the wtd.quantile() function from the Hmisc package:

wtd.quantile(c(1, 2, 3, Inf, NA),              
          weights = c(.6, .9, .4, .2, .6))              
  0%  25%  50%  75% 100%              
 NaN  Inf  Inf  Inf  Inf

If we look at the code for wtd.quantile(), by typing it into the R console without parentheses, we can see that another function does the main calculations, wtd.table(), as shown here:

wtd.quantile              
function (x, weights = NULL, probs = c(0, 0.25, 0.5, 0.75, 1),
    type = c("quantile", "(i-1)/(n-1)", "i/(n+1)", "i/n"), normwt = FALSE,
    na.rm = TRUE)
{
    if (!length(weights))
        return(quantile(x, probs = probs, na.rm = na.rm))
    type <- match.arg(type)
    if (any(probs < 0 | probs > 1))
        stop("Probabilities must be between 0 and 1 inclusive")
    nams <- paste(format(round(probs * 100, if (length(probs) >
        1) 2 - log10(diff(range(probs))) else 2)), "%", sep = "")
    if (type == "quantile") {
        w <- wtd.table(x, weights, na.rm = na.rm, normwt = normwt,
            type = "list")
        x <- w$x
        wts <- w$sum.of.weights
        n <- sum(wts)
        order <- 1 + (n - 1) * probs
        low <- pmax(floor(order), 1)
        high <- pmin(low + 1, n)
        order <- order%%1
        allq <- approx(cumsum(wts), x, xout = c(low, high), method = "constant",
            f = 1, rule = 2)$y
        k <- length(probs)
        quantiles <- (1 - order) * allq[1:k] + order * allq[-(1:k)]
        names(quantiles) <- nams
        return(quantiles)
    }
    w <- wtd.Ecdf(x, weights, na.rm = na.rm, type = type, normwt = normwt)
    structure(approx(w$ecdf, w$x, xout = probs, rule = 2)$y,
        names = nams)
}
<environment: namespace:Hmisc>

Using the same approach, we can examine the wtd.table() function. When doing so, we see that although it can automatically remove missing values, it has no check or way to remove nonfinite values. It may seem easier just to change your data, but sometimes data is generated automatically and passed on, so that it is simpler to change a function than it is to modify the data. We can readily copy and paste the code for wtd.table() into our R editor and revise it, as shown here:

revised.wtd.table <- function (x, weights = NULL, type = c("list", "table"), normwt = FALSE,              
    na.rm = TRUE)              
{              
    type <- match.arg(type)              
    if (!length(weights))              
        weights <- rep(1, length(x))              
    isdate <- testDateTime(x)              
    ax <- attributes(x)              
    ax$names <- NULL              
    if (is.character(x))              
        x <- as.factor(x)              
    lev <- levels(x)              
    x <- unclass(x)              
    if (na.rm) {
        ## keep only cases where neither x nor its weight is missing or nonfinite
        s <- !is.na(x + weights) & is.finite(x + weights)
        x <- x[s, drop = FALSE]
        weights <- weights[s]
    }
    n <- length(x)              
    if (normwt)              
        weights <- weights * length(x)/sum(weights)              
    i <- order(x)              
    x <- x[i]              
    weights <- weights[i]              
    if (anyDuplicated(x)) {              
        weights <- tapply(weights, x, sum)              
        if (length(lev)) {              
            levused <- lev[sort(unique(x))]              
            if ((length(weights) > length(levused)) && any(is.na(weights)))              
                weights <- weights[!is.na(weights)]              
            if (length(weights) != length(levused))              
                stop("program logic error")              
            names(weights) <- levused              
        }              
        if (!length(names(weights)))              
            stop("program logic error")              
        if (type == "table")              
            return(weights)              
        x <- all.is.numeric(names(weights), "vector")              
        if (isdate)              
            attributes(x) <- c(attributes(x), ax)              
        names(weights) <- NULL              
        return(list(x = x, sum.of.weights = weights))              
    }              
    xx <- x              
    if (isdate)              
        attributes(xx) <- c(attributes(xx), ax)              
    if (type == "list")              
        list(x = if (length(lev)) lev[x] else xx, sum.of.weights = weights)              
    else {              
        names(weights) <- if (length(lev))              
            lev[x]              
        else xx              
        weights              
    }              
}
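
The filtering step in the na.rm branch can be tried on its own. The following is a minimal sketch that uses only base R and the example values passed to wtd.table() in this section; it is not part of Hmisc:

```r
x <- c(1, 2, 3, Inf, NA)
weights <- c(.6, .9, .4, .2, .6)

## adding x and weights propagates NA and Inf from either vector
x + weights

## TRUE only where both the value and its weight are usable
s <- !is.na(x + weights) & is.finite(x + weights)
s

## the values and weights that survive the filter
x[s]
weights[s]
```

Because is.finite() already returns FALSE for NA, the !is.na() part of the condition is redundant, although it does no harm.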

Unlike the original wtd.table() function, the revised function works exactly as we want:

wtd.table(c(1, 2, 3, Inf, NA),                
          weights = c(.6, .9, .4, .2, .6))                
$x
[1]   1   2   3 Inf

$sum.of.weights
[1] 0.6 0.9 0.4 0.2

revised.wtd.table(c(1, 2, 3, Inf, NA),                
          weights = c(.6, .9, .4, .2, .6))                
$x
[1] 1 2 3

$sum.of.weights
[1] 0.6 0.9 0.4

However, the challenge is that the wtd.quantile() function and other Hmisc functions that use wtd.table() still do not work as we hope, because they do not use our revised function. Assigning our function to the name wtd.table() in the global environment is not sufficient, because scoping rules mean that Hmisc functions access the wtd.table() function first from the Hmisc environment, not from our global environment. It is like having a file of the same name on your computer, but in two different folders. For all the Hmisc functions that utilize wtd.table() to use our revised function, we need not only to name it correctly, but also to put it in the correct place. We can assign an object to a specific namespace by using the assignInNamespace() function:

assignInNamespace(x = "wtd.table",                
                  value = revised.wtd.table,                
                  ns = "Hmisc")                

wtd.quantile(c(1, 2, 3, Inf, NA),                
          weights = c(.6, .9, .4, .2, .6))                
   0%   25%   50%   75%  100%
2.000 2.225 2.450 2.675 2.900

wtd.Ecdf(c(1, 2, 3, Inf, NA),                
          weights = c(.6, .9, .4, .2, .6))                
$x
[1] 1 1 2 3

$ecdf
[1] 0.0000000 0.3157895 0.7894737 1.0000000
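
The numbers printed above can be checked by hand with base R. The sketch below follows the type = "quantile" branch of the wtd.quantile() source shown earlier, starting from the three finite values and weights that survive the revised filter. The names ord and frac are our own, chosen to avoid reusing the name order as the source does:

```r
x     <- c(1, 2, 3)
wts   <- c(.6, .9, .4)
probs <- c(0, .25, .5, .75, 1)

n    <- sum(wts)                  # total weight
ord  <- 1 + (n - 1) * probs       # position of each quantile
low  <- pmax(floor(ord), 1)
high <- pmin(low + 1, n)
frac <- ord %% 1                  # fractional part used for interpolation

## look up the values at the low and high positions
allq <- approx(cumsum(wts), x, xout = c(low, high),
               method = "constant", f = 1, rule = 2)$y
k <- length(probs)
q <- (1 - frac) * allq[1:k] + frac * allq[-(1:k)]
q

## cumulative share of the total weight at each value of x
cumsum(wts) / sum(wts)
```

The vector q matches the wtd.quantile() result, and the cumulative shares match the nonzero values in the $ecdf component returned by wtd.Ecdf().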

Although the assignInNamespace() function produces no output, we can see that afterward wtd.quantile() and other functions that depend on wtd.table(), such as wtd.Ecdf(), now work as we hoped. One caveat about assignInNamespace() is that you can overwrite only objects that already exist in the namespace; you cannot use it to create new ones. You also are not allowed to use assignInNamespace() in any package you want to submit to CRAN. Even though it is sometimes the convenient approach, it would become confusing if people did this as a general rule: the definition of functions in package A would then depend on whether you had loaded package B, because package B could overwrite (not just mask) the function definition in package A. Note that any changes you make in this way are temporary and vanish when you restart R.

However, this can be a helpful technique for debugging an existing package, as it is a relatively easy way to make sure that the issue encountered using wtd.quantile() was indeed “fixed” by the suggested change to wtd.table(). At this point, whether it is truly a bug or just a desirable feature, you could e-mail the package maintainer to suggest the change, confident that the suggested code works. To see the current maintainer, type maintainer("Hmisc"), substituting the name of whatever package you are interested in.
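
The difference between masking a function and overwriting it can be seen with base R alone. The following sketch relies on the fact that sd() lives in the stats namespace and calls var() internally; the replacement var() and the names masked_result and sd_result are purely illustrative:

```r
## a global function named var() masks stats::var() for code we type ourselves
var <- function(x, ...) "masked"
masked_result <- var(c(1, 2, 3))
masked_result

## sd() is defined in the stats namespace, so it finds stats::var() first
## and never sees our global var()
environmentName(environment(sd))
sd_result <- sd(c(1, 2, 3))
sd_result

## remove the mask so that var() refers to stats::var() again
rm(var)
var(c(1, 2, 3))
```

Masking changes only what our own code finds. Making sd() itself use a different var() would require replacing the object inside the stats namespace, which is what assignInNamespace() does for wtd.table() in Hmisc.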

Summary

We covered a lot of functions and a lot about functions in this chapter! In case you need to refresh your memory about any specific functions, Table 4-1 lists the key functions introduced in this chapter and provides a brief description of each. Of course, you can look up more in the official help files.

Table 4-1. Key Functions Described in This Chapter

Function	What It Does
formals()	Allows you to see the formal arguments of a function you build.
args()	Shows default values and names for a created function’s arguments.
body()	Shows the body of a function (the code between { and } ).
environment()	Shows the environment your function lives in (often the global environment, so far).
search()	Shows the search order for functions. Remember, people may have already used your function name!
parent.env()	Takes an environment and returns its enclosing (parent) environment, one level up.
match.arg()	Matches an argument against a set of allowed values, allowing partial matching.
missing()	Tests whether a value was passed for an argument; returns TRUE if it was not. Note that this will be FALSE after the argument has been reassigned, for example with the result of match.arg().
match.call()	Captures the entire function call; often used in regression functions.
return()	Returns a value from a function. Not required and possibly contentious; if used, it should be the last part of a new function.
invisible()	Returns a value without printing it.
on.exit()	Executes its argument when the function exits, whether the function succeeded or failed.
stop(), stopifnot(), warning(), message()	These signal conditions of varying severity, from a full stop with an error, to a milder warning, to a mostly polite message.
suppressWarnings(), suppressMessages()	These two do precisely what they say. Once you’re familiar with a function, warnings or messages may be tedious or safely ignored.
debug()	Use this to start debugging a function that is not working properly.
browser()	This goes into a function body, right before the part you want to debug.
traceback()	Provides a list of expressions leading to the source of an error.
assignInNamespace()	Allows for the temporary replacement of a function with a locally crafted function.
maintainer()	Called on a package, shows a contact name and e-mail to go to for troubleshooting.
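
Several of the functions in Table 4-1 can be seen working together in one small example. The function center() below is hypothetical and written only for illustration; it is a sketch, not code from Hmisc or from earlier in the chapter:

```r
center <- function(x, type = c("mean", "median"), verbose = FALSE) {
  ## check missing() before type is reassigned by the next line
  no_type <- missing(type)
  ## partial matching: "med" resolves to "median"
  type <- match.arg(type)
  if (!is.numeric(x))
    stop("x must be numeric")
  if (no_type && verbose)
    message("type not supplied; using ", type)
  out <- switch(type, mean = mean(x), median = median(x))
  ## return the value without printing it at the console
  invisible(out)
}

## inspect the function
names(formals(center))
args(center)
body(center)

## call it; the result prints only when we ask for it
res <- center(c(1, 2, 10), type = "med")
res
```

With no type supplied, match.arg() falls back to the first allowed value, so center(c(1, 2, 10)) computes the mean.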

Although the formal names of various parts of a function are not necessarily critical, learning how to use and write functions efficiently may be one of the best investments in learning R you ever make. Writing functions provides a way out of writing repetitive code. There are no strict rules, but if you find yourself doing the same task often, there is a good chance it is worth writing a function to do that. It might be a big task involving many pieces, or it might be a small task. For example, in psychology, it is common to report values at “high” and “low” values of a continuous variable, which are often defined as mean +/– 1 standard deviation. This is easy to do in R by using the mean() and sd() functions. If you do it a lot, it may be worth writing a short, one-line function so that rather than type mean(x) + sd(x), you just type msd(x), or whatever you call your function. If you do not feel comfortable playing with functions and writing your own, it would be a good idea to get some more practice before moving on to Chapters 5 and 6, where we assume you are comfortable with functions.
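
The one-line helper described above might look like the following sketch. The name msd() comes from the text; the na.rm argument and the companion function low_high() are illustrative additions of our own:

```r
## mean plus one standard deviation (the "high" value)
msd <- function(x, na.rm = FALSE) mean(x, na.rm = na.rm) + sd(x, na.rm = na.rm)

## hypothetical companion returning both the "low" and "high" values
low_high <- function(x, na.rm = FALSE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  c(low = m - s, high = m + s)
}

msd(c(1, 2, 3))
low_high(c(1, 2, 3))
```

For the values 1, 2, and 3, the mean is 2 and the standard deviation is 1, so msd() returns 3 and low_high() returns 1 and 3.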
