CHAPTER 5

Functional Programming

The longer one programs, the easier it becomes to think like a programmer. You learn that the best way to solve a problem is to solve it once in such a way that the adjustments you need to make when the problem changes slightly are very small ones. It is better to use variables and even other functions in your code so that you can change a single value once rather than many times. This is the essence of the pragmatic programmer who writes with purpose. Programmers who come to R from other languages such as C++ or Python tend to think in loops. You are probably convinced by now that R’s vectorization allows us to avoid loops in many situations. As you saw in Chapter 4, looping is possible when it is needed. Efficient code allows us to automate as many tasks as we can so that we don’t repeat ourselves, and to avoid looping as much as possible.

Soon after I started using R, I decided I needed to be more systematic in my learning, and I bought a couple of books on R. The R documentation itself was adequate, but it was often confusing and difficult to read on a computer screen. What I soon found was that R functions are also objects in their own right. As a general rule, we can say that every function has three components. These are the body of the function (the code inside the function), the formals (or arguments) that control how the function is called, and the environment, which you might think of as the location indicator of the function’s variables.

One of the nicest things about R is its transparency. You can view almost any function simply by typing its name without parentheses. Here is the mad() function from the base version of R. You see that its environment is the stats package that is part of the base R implementation. When no environment is printed, this means the function was created in the global environment, that is, the current R session:

> mad
function (x, center = median(x), constant = 1.4826, na.rm = FALSE,
    low = FALSE, high = FALSE)
{
    if (na.rm)
        x <- x[!is.na(x)]
    n <- length(x)
    constant * if ((low || high) && n%%2 == 0) {
        if (low && high)
            stop("'low' and 'high' cannot be both TRUE")
        n2 <- n%/%2 + as.integer(high)
        sort(abs(x - center), partial = n2)[n2]
    }
    else median(abs(x - center))
}
<bytecode: 0x00000000077b8458>
<environment: namespace:stats>

Functions usually have names, but we can also use anonymous, or unnamed, functions. We might use an anonymous function when there’s no advantage to giving the function a name. For example, we could define a function to calculate the coefficient of variation, which is the ratio of the standard deviation to the mean. Say that we have no particular recurring use for this statistic but need it for the displacement, gross horsepower, and rear axle ratio of the cars in our mtcars data. We could write an anonymous function as follows:

> sapply(mtcars[, 3:5], function(x) sd(x)/mean(x))
     disp        hp      drat
0.5371779 0.4674077 0.1486638

Just as all R functions do, an anonymous function has formals, a body, and an environment:

> formals(function(x) sd(x)/mean(x))
$x
> body(function(x) sd(x)/mean(x))
sd(x)/mean(x)
> environment(function(x) sd(x)/mean(x))
<environment: R_GlobalEnv>
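The same three accessors work on named functions. As a small sketch, the calls below pull apart the mad() function shown earlier; the variable names are chosen only for illustration.

```r
# Inspect the three components of a named function, here stats::mad
mad_formals <- formals(stats::mad)      # pairlist of arguments and their defaults
arg_names   <- names(mad_formals)       # just the argument names
default_k   <- mad_formals$constant     # default value of the 'constant' argument
mad_env     <- environmentName(environment(stats::mad))

print(arg_names)
print(default_k)
print(mad_env)

# The body is itself an R object that can be stored and examined
mad_body <- body(stats::mad)
class(mad_body)
```

Because formals, body, and environment are ordinary R objects, they can be stored in variables and inspected like any other data.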

5.1 Scoping Rules

In R, scoping describes how R “looks up” the value of a symbol. If a name is not defined inside a function, R will look one level up, and keep doing so all the way up to the global environment. The same is true of functions. R will keep looking up from the current level until it gets to the global environment, and then begin looking in any packages loaded in the current workspace until it reaches the empty environment. Once the empty environment is reached, if R cannot find the value for a given symbol, an error is produced. This kind of scoping is one of the ways in which R is different from the original S language. R uses what is known as static or lexical scoping. A free variable is one that is neither a formal argument nor a local variable (that is, one assigned within the function body). Lexical scoping means that the value of a free variable is searched for first in the environment in which the function was defined and then in that environment’s parent, and so on up the chain.
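A minimal sketch may make the lookup rule concrete. The function names addY and caller are hypothetical; the point is that the free variable y is resolved where addY() was defined, not where it is called.

```r
y <- 10                  # defined at the top level

addY <- function(x) {
  x + y                  # y is a free variable: not an argument, not a local
}

caller <- function() {
  y <- 1000              # local to caller(); addY() never sees this value
  addY(1)
}

addY(1)                  # uses the y defined alongside addY()
caller()                 # same result, because lookup follows the defining environment
```

Under dynamic scoping the second call would have picked up the y inside caller(); with lexical scoping both calls give the same answer.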

When R needs the value bound to a symbol, it searches through the series of environments, as described earlier. The search list can be found by use of the search() function. The global environment (that is, the user’s workspace) is always the first element of the search list (although a different environment may be first when the searching is done within a function within a package, rather than by the user). The base package is always the last element of the search list, as shown below. The order of the packages in the search list is important because users are able to configure which packages are loaded at startup. When you load a new package with the library() or require() functions, that package is placed in the second position in the search list, and everything else moves down the list:

> search()
[1] ".GlobalEnv"        "tools:rstudio"     "package:stats"     "package:graphics"  "package:grDevices" "package:utils"
[7] "package:datasets"  "package:methods"   "Autoloads"         "package:base"
> library(swirl)
> search()
[1] ".GlobalEnv"        "package:swirl"     "tools:rstudio"     "package:stats"     "package:graphics"  "package:grDevices"
[7] "package:utils"     "package:datasets"  "package:methods"   "Autoloads"         "package:base"

Typically, we define functions within the global environment, so the values of free variables are located in the user’s workspace. But in R, it is also possible to define a function within the body of another function, and to create functions that themselves create additional functions. The lexical scoping provided by R makes statistical computing easier. We call functions written by other functions “closures.” If you will, they enclose, or encapsulate the environment of the parent function and can access its variables. This gives us the ability to have a parent level controlling operation, and a child function in which the actual work is done. Here, for example, is a function called take.root that defines another function, root. By assigning different values to n, we can take different roots, such as a square root or a cube root:

> take.root <- function(n) {
+   root <- function(x) {
+     x^(1/n)
+   }
+   root
+ }
> square.root <- take.root(2)
> cube.root <- take.root(3)
> square.root(81)
[1] 9
> cube.root(27)
[1] 3
> ls(environment(square.root))
[1] "n"    "root"
> get("n", environment(square.root))
[1] 2
> ls(environment(cube.root))
[1] "n"    "root"
> get("n", environment(cube.root))
[1] 3

5.2 Reserved Names and Syntactically Correct Names

The following words are reserved in R: if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, and NA_character_. In R, syntactically valid names consist of letters, numbers, and the dot or underline characters. Names must start with a letter or with a dot, which cannot be followed immediately by a number. Dots in function names serve several different purposes. For example, dots or underlines can provide visual separation, as in data.frame or is.na. Another use of the dot is to identify the methods of generic functions, which are named by joining the generic and the class with a dot. Finally, we can “hide” internal functions or objects by beginning the name with a dot. This is only a partial obscurity, because we can ask for all names to be shown, as the following code shows. See that our object .y as well as the .getSymbols and .Random.seed objects are also “in” the R workspace but visible only when we ask to see everything:

> x <- 10
> .y <- 20
> ls()
[1] "x"
> ls(all.names = TRUE)
[1] ".getSymbols"  ".Random.seed"  ".y"              "x"
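One way to check the naming rules described above is to lean on base R’s make.names(), which repairs invalid names. The helper isSyntactic below is a hypothetical convenience, not a base R function: it treats a name as valid when make.names() leaves it unchanged.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# returns it unchanged (make.names() alters invalid or reserved names)
isSyntactic <- function(x) x == make.names(x)

candidates <- c("data.frame", "my_var", ".hidden", "2fast", ".2way", "if", "my var")
valid <- isSyntactic(candidates)
names(valid) <- candidates
print(valid)
```

The first three candidates pass. The others fail because they start with a number, start with a dot followed by a number, are a reserved word, or contain a space.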

5.3 Functions and Arguments

We create functions and store them as R objects. We must tell R we are creating a function by using the function() directive. In R, a function is a “first class object,” meaning that it can be passed as an argument to another function, and functions can be nested (defined inside another function, as discussed earlier). Functions have arguments (also called parameters), and these can potentially have default values. Some functions do not have arguments at all, as in the BMI function we used in Chapter 3.

R matches the arguments in functions either by position, or by name. In particular, function arguments are matched in the following order:

  1. check for an exact match for a named argument
  2. check for a partial match
  3. check for a positional match
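The matching order above can be sketched with a small function. The name showArgs and its arguments are hypothetical and exist only to make the matching visible.

```r
# Hypothetical function used only to show how arguments are matched
showArgs <- function(alpha, beta, gamma) {
  c(alpha = alpha, beta = beta, gamma = gamma)
}

byPosition <- showArgs(1, 2, 3)            # purely positional
byName     <- showArgs(gamma = 3, 1, 2)    # gamma matched exactly; 1 and 2 fill alpha and beta
byPartial  <- showArgs(ga = 3, be = 2, 1)  # partial names matched next; 1 is left for alpha

identical(byPosition, byName)
identical(byPosition, byPartial)
```

All three calls produce the same result. Partial matching is convenient at the console, but spelling argument names out in full is safer in code you intend to keep.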

R also uses “lazy” evaluation, which means that an argument is evaluated only when it is needed. For example, we can create a function with two arguments, one of which is never used. The following example illustrates. The first call produces no error, because the 10 is matched to x positionally and y is never evaluated. Note that supplying a value for y makes no difference, either, as it is not needed by the function. Our function does not set a default for x, so there are some broken ways to call the function, such as omitting x altogether. Note that in the last call to our function, we explicitly name x in the “wrong” position; R matches x by name first, and the remaining 20 is then matched to y positionally. These examples, of course, are not meant to encourage bad programming practice but simply to illustrate the way R works. We hope to produce effective and efficient functions rather than ones that capitalize on R’s quirks:

> myFun <- function(x, y) {
+   print(x^2)
+ }
> myFun(10)
[1] 100
> myFun(10, 20)
[1] 100
> myFun(,10)
Error in print(x^2) : argument "x" is missing, with no default
> myFun(20,x=10)
[1] 100
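Lazy evaluation can be made more visible by passing an argument that would raise an error if it were ever evaluated. This is only a sketch; the function names ignoreY and useY are hypothetical.

```r
# y is never used, so the stop() call passed as y never runs
ignoreY <- function(x, y) {
  x^2
}
ignoreY(10, stop("y was evaluated"))

# Here y is used, so evaluating it triggers the error,
# which tryCatch() turns into a message we can inspect
useY <- function(x, y) {
  x^2 + y
}
result <- tryCatch(useY(10, stop("y was evaluated")),
                   error = function(e) conditionMessage(e))
result
```

The first call returns normally because the unused argument is never evaluated; the second call fails only at the moment y is needed.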

5.4 Some Example Functions

In the following sections, I will show you a couple of examples of functions that I have written just for fun. The first is a function like the BMI function that we used earlier that queries for user input using the readline function, making the function interactive. As with the BMI function, it does not require arguments. The second is one that requires arguments, and we will examine in more detail how the arguments are evaluated.

5.4.1 Guess the Number

Here’s a problem similar to one used in many programming classes and books. The computer “thinks” of a number, and the user guesses until he or she either gets the number right or runs out of tries. Although R may not be the best language for writing such a function, it is possible, and the function puts to work many of the things we have talked about. We will use the uniform distribution to pick a number between 1 and 100, and then let the user determine how many guesses he or she wants. The function has no arguments, instead querying the user for a new guess if the number is either too high or too low. If the person guesses the number, R reports that fact and tells the user how many tries it took. If the person does not guess the number, R tells the user he or she is out of tries and then reveals the number it was “thinking” about. We use the while loop rather than the for loop, because in this case we are not iterating through a vector, per se. It would of course be possible to rewrite this with a for loop based on the number of attempts. Note that the break statements halt the execution of the while loop when the person either guesses the number correctly or runs out of turns:

guessIt <- function() {
  cat("I am thinking of a number between 1 and 100", "\n")
  # pick the secret number from the uniform distribution, rounded to an integer
  computerPicks <- as.integer(round(runif(1, 1, 100), 0))
  attempts <- as.integer(readline("How many guesses do you want? "))
  count <- 0   # start at zero so the first guess counts as try number 1
  while (count < attempts) {
    count <- count + 1
    userGuess <- as.integer(readline("Enter your guess: "))
    # last allowed guess and still wrong: reveal the number and stop
    if (count == attempts && userGuess != computerPicks) {
      cat("Sorry, out of tries. My number was ", computerPicks, "\n")
      break
    }
    # correct guess: report the number of tries and stop
    if (userGuess == computerPicks) {
      cat("You got it in ", count, "tries.", "\n")
      break
    }
    # otherwise give a hint and go around the loop again
    if (userGuess < computerPicks) {
      cat("Your guess is too low.", "\n")
    }
    if (userGuess > computerPicks) {
      cat("Your guess is too high.", "\n")
    }
  }
}

Here’s one of my attempts at guessing the correct number. I used the “splitting the difference” strategy, but there’s a bit of luck involved as well. You can adjust the function in various ways to make it more interesting, for example by letting the user pick the lower and upper bounds. I set the counter to zero initially so that when R increments it, I get the correct number for the number of tries:

> guessIt()
I am thinking of a number between 1 and 100
How many guesses do you want? 7
Enter your guess: 50
Your guess is too high.
Enter your guess: 25
Your guess is too high.
Enter your guess: 13
Your guess is too low.
Enter your guess: 17
Your guess is too low.
Enter your guess: 22
You got it in 5 tries.
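As mentioned earlier, the same logic can be written with a for loop. The sketch below is a hypothetical, non-interactive variant: instead of calling readline(), it takes the target number and a vector of guesses as arguments (both names are assumptions), which makes the logic easy to test.

```r
# Hypothetical for-loop variant: the secret number and the guesses are
# passed in as arguments rather than read from the keyboard
guessItFor <- function(target, guesses) {
  for (i in seq_along(guesses)) {
    if (guesses[i] == target) {
      return(paste("You got it in", i, "tries."))
    }
    hint <- if (guesses[i] < target) "too low" else "too high"
    cat("Guess", guesses[i], "is", hint, "\n")
  }
  # the loop finished without a correct guess
  paste("Sorry, out of tries. My number was", target)
}

guessItFor(22, c(50, 25, 13, 17, 22))
guessItFor(22, c(50, 25))
```

The first call replays the session shown above; the second runs out of guesses before finding the number.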

5.4.2 A Function with Arguments

Many students learn the general quadratic formula in their algebra classes. Compared to other approaches to solving quadratic equations, the general formula has the advantage that it always works. As a reminder, here is the general quadratic formula:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The discriminant of a quadratic equation is the expression under the radical. If the discriminant is positive, the equation will have two real roots. If the discriminant is zero, the equation will have one (repeated) real root, and if the discriminant is negative, the equation will have two complex roots. Assume that we are interested only in real roots. Let’s write a function to find the real root(s) of a quadratic equation. We will then test the function with different coefficients for a, b, and c to make sure that it works correctly:

> # function for finding the real root(s) of a quadratic equation
> quadratic <- function(a, b, c) {
+   discrim <- b^2 - 4 * a * c
+   cat("The discriminant is: ", discrim, "\n")
+   if (discrim < 0) {
+     cat("There are no real roots. ", "\n")
+   } else {
+     root1 <- (-b + sqrt(discrim)) / (2 * a)
+     root2 <- (-b - sqrt(discrim)) / (2 * a)
+     cat("root1: ", root1, "\n")
+     cat("root2: ", root2, "\n")
+   }
+ }
> quadratic(2, -1, -8)
The discriminant is: 65
root1:  2.265564
root2:  -1.765564
> quadratic(1, -2, 1)
The discriminant is: 0
root1:  1
root2:  1
> quadratic(3, 2, 1)
The discriminant is: -8
There are no real roots.
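Printing the roots is fine for interactive use, but returning them lets other code work with the values. The following is a sketch of such a variant; the name quadraticRoots is hypothetical, and the cross-check uses base R’s polyroot(), which expects coefficients in increasing order.

```r
# Hypothetical variant that returns the real roots instead of printing them
quadraticRoots <- function(a, b, c) {
  discrim <- b^2 - 4 * a * c
  if (discrim < 0) {
    return(numeric(0))               # no real roots: return an empty vector
  }
  root1 <- (-b + sqrt(discrim)) / (2 * a)
  root2 <- (-b - sqrt(discrim)) / (2 * a)
  # R still finds the function c() here, even though an argument is named c
  c(root1, root2)
}

quadraticRoots(2, -1, -8)
quadraticRoots(3, 2, 1)

# Cross-check: polyroot() takes the coefficients as (c, b, a) and
# returns complex values, so keep the real parts
sort(Re(polyroot(c(-8, -1, 2))))
```

Returning an empty vector when the discriminant is negative is one design choice among several; returning NA or raising an error would also be reasonable.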

5.5 Classes and Methods

R supports various classes and methods. In particular, there are now three object-oriented systems that work in R: S3 classes, S4 classes, and the newer Reference Classes, called refclasses (informally referred to as R5). There is also R6, an R package that implements simplified reference classes which do not depend on S4 classes and the methods package. In this book, we will focus on some examples of S3 classes and methods. S4 classes and refclasses are more formal and typically only make sense in the context of larger programming projects, such as when developing an R package.

5.5.1 S3 Class and Method Example

To create an S3 class, we first form a list, and then we set the class attribute by using the class() or the attr() function. Say we are building a list of the donors to one of our favorite charities. Our list will include the person’s name, gender, and the amount last donated.

First, our S3 class is as shown. We create the list, set the class attribute, and show the results by typing the object name:

> info <- list(name = "Jon", gender = "male", donation = 100)
> class(info) <- "member"
> attributes(info)
$names
[1]  "name"    "gender"    "donation"

$class
[1]  "member"

> print(info)
$name
[1] "Jon"

$gender
[1] "male"

$donation
[1] 100

attr(,"class")
[1] "member"
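The class attribute can also be set with attr(), as noted above. The sketch below wraps that step in a small constructor; the function name newMember and the sample donor are hypothetical and used only for illustration.

```r
# Hypothetical constructor; the name newMember and the sample data are
# illustrative only
newMember <- function(name, gender, donation) {
  obj <- list(name = name, gender = gender, donation = donation)
  attr(obj, "class") <- "member"   # same effect as class(obj) <- "member"
  obj
}

donor <- newMember("Ann", "female", 250)
class(donor)
inherits(donor, "member")
```

A constructor like this keeps every object of the class built the same way, which becomes useful once methods start relying on particular list elements being present.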

When we have R print the object, it shows each of the elements and its data, and also reports the attributes, here the class we defined. It is not very pretty, because R does not have any special methods defined for the print() function to deal with an object of class member. However, using S3 classes and methods, it is easy to create a specific method for a generic function like print(). The general pattern for naming a method for a generic function in S3 is generic.class(). In this case, our generic function is print() and our class is member, so we name the method print.member() and define it. This gives us prettier results than before:

> print.member <- function(person) {
+   cat("Name: ", person$name, "\n")
+   cat("Gender: ", person$gender, "\n")
+   cat("Donation: ", person$donation, "\n")
+ }
> print(info)
Name:   Jon
Gender:   male
Donation:   100

5.5.2 S3 Methods for Existing Classes

We can also write new S3 methods for existing classes. For example, again using the built-in mtcars data, suppose we conducted an independent samples t-test comparing the miles per gallon for cars with manual versus automatic transmissions. We’ll do more with t-tests later; for now, focus on the creation of the new method below:

> results <- t.test(mpg ~ am, data = mtcars)

> results

        Welch Two Sample t-test

data:  mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group 0 mean in group 1
       17.14737        24.39231

R gives us nice text output, but what if we wanted to plot the results? Calling plot() on the t-test object does not give useful results.

> plot(results)
Error in xy.coords(x, y, xlabel, ylabel, log) :
  'x' is a list, but does not have components 'x' and 'y'

plot() does not work because no method is defined for this kind of object. To confirm this, first check the class of the t-test output, and then see whether any plot() methods are defined for that class by using the methods() function:

> class(results)
[1] "htest"
> methods(plot)
 [1] plot.acf*            plot.correspondence* plot.data.frame*
 [4] plot.decomposed.ts*  plot.default         plot.dendrogram*
 [7] plot.density*        plot.ecdf            plot.factor*
[10] plot.formula*        plot.function        plot.ggplot*
[13] plot.gtable*         plot.hclust*         plot.histogram*
[16] plot.HoltWinters*    plot.isoreg*         plot.lda*
[19] plot.lm*             plot.mca*            plot.medpolish*
[22] plot.mlm*            plot.ppr*            plot.prcomp*
[25] plot.princomp*       plot.profile*        plot.profile.nls*
[28] plot.ridgelm*        plot.spec*           plot.stepfun
[31] plot.stl*            plot.table*          plot.ts
[34] plot.tskernel*       plot.TukeyHSD*

   Non-visible functions are asterisked

Since there is no plot.htest() function, no method is defined for class htest. Let’s define a simple plot method for a t-test now. The plot method is designed to take an object, the results from a t-test, and has a second argument to control how many digits the p-value should be rounded to. We set a default value of 4, so that it will be rounded to four decimals if another value is not explicitly specified. Now we can again call plot() on our t-test results object, and this time we get a nice figure, shown in Figure 5-1. Notice that even though we named our function plot.htest(), we only have to call plot(). Method dispatch means that R looks for a specific version of plot() that matches the class of the first argument, in our case, class htest. Using methods can be an easy way to write functions that extend the functionality already available in R and in the numerous R packages, to fit your individual needs and workflow.

> plot.htest <- function(object, digits.to.round = 4) {
+
+   rounded.pvalue <- round(object$p.value, digits.to.round)
+
+   barplot(object$estimate,
+           ylim = c(0, max(object$estimate) * 1.1),
+           main = paste(object$method, "of", object$data.name),
+           sub = paste("p =", rounded.pvalue))
+ }
>
> plot( results )

Figure 5-1. S3 method plot rewrite graph
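Method dispatch is not limited to the generics that ship with R. As a sketch, the code below defines a new generic with UseMethod() along with two methods; the generic name describe and its methods are hypothetical and exist only to show how R chooses which function to run.

```r
# Hypothetical generic and methods, defined only to illustrate dispatch
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("An object of class", class(x)[1])
}

# Method chosen for objects whose class attribute is "member"
describe.member <- function(x, ...) {
  paste(x$name, "last donated", x$donation)
}

donor <- structure(list(name = "Jon", gender = "male", donation = 100),
                   class = "member")

describe(donor)   # dispatches to describe.member()
describe(1:3)     # no describe.integer() exists, so describe.default() runs
```

The generic itself contains no logic beyond the call to UseMethod(); all of the work happens in whichever method matches the class of the first argument.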
