In many cases, the performance of R code can be greatly improved by simple restructuring; this doesn't change the output of the program, just the way the program is written. Restructurings of this type are often referred to as code refactoring. The refactorings that really make a difference performance-wise usually have to do with either improved allocation of memory or vectorization.
Refer all the way back to Chapter 5, Using Data to Reason About the World. Remember when we created a mock population of women's heights in the US, and we repeatedly took 10,000 samples of 40 from it to demonstrate the sampling distribution of the sample means? In a code comment, I mentioned in passing that the snippet numeric(10000) created an empty vector of 10,000 elements, but I never explained why we did that. Why didn't we just create a vector containing a single element, and continually tack each new sample mean onto the end of it, as follows:
set.seed(1)
all.us.women <- rnorm(10000, mean=65, sd=3.5)

means.of.our.samples.bad <- c(1)
# I'm increasing the number of samples
# to 30,000 to prove a point
for(i in 1:30000){
  a.sample <- sample(all.us.women, 40)
  means.of.our.samples.bad[i] <- mean(a.sample)
}
It turns out that R stores vectors in contiguous addresses in your computer's memory. This means that every time a new sample mean gets tacked onto the end of means.of.our.samples.bad, R has to make sure that the next memory block is free. If it is not, R has to find a contiguous section of memory that can fit all the elements, copy the vector over (element by element), and free the memory in the original location. In contrast, when we created an empty vector of the appropriate number of elements, R only had to find a memory location with the requisite number of free contiguous addresses once.
Let's see just what kind of difference this makes in practice. We will use the system.time function to measure the execution time of both approaches:
means.of.our.samples.bad <- c(1)
system.time(
  for(i in 1:30000){
    a.sample <- sample(all.us.women, 40)
    means.of.our.samples.bad[i] <- mean(a.sample)
  }
)

means.of.our.samples.good <- numeric(30000)
system.time(
  for(i in 1:30000){
    a.sample <- sample(all.us.women, 40)
    means.of.our.samples.good[i] <- mean(a.sample)
  }
)
-------------------------------------
   user  system elapsed
  2.024   0.431   2.465

   user  system elapsed
  0.678   0.004   0.684
Although a saving of less than two seconds of elapsed time may not seem like a big deal, (a) it adds up, and (b) the difference becomes more and more dramatic as the number of elements in the vector increases.
By the way, this preallocation business applies to matrices, too.
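For instance, if we wanted to record both the mean and the standard deviation of each sample, we could allocate the whole result matrix up front and assign into its rows, rather than growing it with rbind inside the loop. The following is just a sketch reusing all.us.women from above; the choice of two columns (mean and standard deviation) is purely illustrative:

# preallocate a 30,000-by-2 matrix, then fill it in row by row
sample.stats <- matrix(0, nrow=30000, ncol=2)
for(i in 1:30000){
  a.sample <- sample(all.us.women, 40)
  sample.stats[i, ] <- c(mean(a.sample), sd(a.sample))
}
# growing the matrix with rbind() inside the loop would force R to
# copy the whole matrix over and over again as it grows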
Were you wondering why R is so adamant about keeping the elements of vectors in adjoining memory locations? Well, if R didn't, then traversing a vector (like when you apply a function to each element) would require hunting around the memory space for the right elements in different locations. Having the elements all in a row gives us an enormous advantage, performance-wise.
To fully exploit this vector representation, it helps to use vectorized functions, which were first introduced in Chapter 1, RefresheR. These vectorized functions call optimized, blazingly fast C code to operate on whole vectors instead of the comparatively slower R code. For example, let's say we wanted to square each height in the all.us.women vector. One way would be to use a for loop to square each element, as follows:
system.time(
  for(i in 1:length(all.us.women))
    all.us.women[i] ^ 2
)
--------------------------
   user  system elapsed
  0.003   0.000   0.003
Okay, not bad at all. Now, what if we applied a lambda squaring function to each element using sapply?
system.time(
  sapply(all.us.women, function(x) x^2)
)
-----------------------
   user  system elapsed
  0.006   0.000   0.006
Okay, that's worse. But we can use a function that's like sapply, one which allows us to specify the type of the return value in exchange for faster processing:
system.time(
  vapply(all.us.women, function(x) x^2, numeric(1))
)
-------------------------
   user  system elapsed
  0.006   0.000   0.005
Still not great. Finally, what if we just square the entire vector?
system.time(
  all.us.women ^ 2
)
----------------------
   user  system elapsed
      0       0       0
This was so fast that system.time didn't have the resolution to detect any processing time at all. Further, this way of writing the squaring functionality was by far the easiest to read.
The moral of the story is to use vectorized operations whenever you can. All of core R's arithmetic operators and basic math functions (+, -, ^, sqrt, log, and so on) are of this type. Additionally, using the rowSums and colSums functions on matrices is faster than apply(A_MATRIX, 1, sum) and apply(A_MATRIX, 2, sum), respectively, for much the same reason.
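You can verify this yourself by timing the two approaches on a reasonably large matrix; the matrix dimensions here are arbitrary, and the exact timings will vary from machine to machine:

# a 1,000,000-by-10 matrix of standard normal draws
A_MATRIX <- matrix(rnorm(1e7), ncol=10)

system.time(apply(A_MATRIX, 1, sum))   # loops over the rows in R
system.time(rowSums(A_MATRIX))         # hands the work to optimized C code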
Speaking of matrices, before we move on, you should know that certain matrix operations are blazingly fast in R, because the routines are implemented in compiled C and/or Fortran code. If you don't believe me, try writing and testing the performance of OLS regression without using matrix multiplication.
If you have the linear algebra know-how, and have the option to rewrite a computation that you need to perform using matrix operations, you should definitely try it out.
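As a sketch of what that might look like, the OLS coefficient estimates can be computed directly from the normal equations, (X'X)b = X'y, using a handful of matrix operations. The response y below is made up purely for illustration, with all.us.women serving as the predictor:

# toy design matrix: an intercept column plus the heights
X <- cbind(1, all.us.women)
# a fabricated response, just so we have something to regress on
y <- 2 + 0.5*all.us.women + rnorm(10000)

# OLS estimates via matrix operations: solve (X'X) b = X'y
betas <- solve(crossprod(X), crossprod(X, y))
betas

Compare this to an element-by-element implementation with explicit loops, and the speed difference becomes obvious; the matrix version also happens to be far closer to how the math is written on paper.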