What You Will Learn In This Chapter:
Because R is a programming language, you have great flexibility in the approach you can take to running it. When you first begin to use R you will probably type commands directly from the keyboard. Later, as you become more confident, you will likely use snippets of commands stored in other areas, like a text file. The next step is to create simple functions that carry out something useful; you can call up these functions time and time again, and can save a lot of typing and effort. As your confidence and ability grow you will move on to creating larger scripts, that is, sets of R commands stored in a file that you can execute at any time.
Scripts can be especially useful because they enable you to prepare complex or repetitive tasks, which you can bring into operation at any time. Indeed, R is built along these lines, and you can think of the program as a bundle of scripts; by making your own you are simply increasing the usefulness of R and bending it to meet your own specific requirements.
Programming R is a wide subject in its own right. This chapter introduces you to the basic ideas so that you can set off on your own journey of discovery. Of course, you have gained a lot of experience at using R up to this point, so the step up to creating your own programs is only a small one.
R is very flexible, and because it accepts plain text to drive the commands, you can store useful snippets to use at a later date. You can copy and paste the text from a word processor (or other program) into R and either run the command “as is” or edit the command before you press Enter.
As you learn how to use R it is a good idea to keep a plain text file as a “notepad” (on Windows computers the Notepad.exe program is a good choice for this task). You can use this text file to keep notes and examples of R commands. However, a plain list of commands without explanation is not helpful. You can, of course, add explanatory notes in some way as you go along; the following example might be part of your notes file:
Code to work out means of columns in a data frame:
----------------------------------
apply(data, 2, mean, na.rm = TRUE)
----------------------------------
data = name of data frame
2 = columns (1 for rows)
mean = the mean command (can use others)
na.rm = TRUE = remove NA items if appropriate
To use this you can simply copy the text to the clipboard and paste it into the R console window; in the example here you want the line of text that is between the dashed lines. You can edit the name of the data and add a name for the result as you like. This is a good way to build up a library of commands that you can use and become familiar with.
When you copy command lines from R, you inevitably copy the > character that forms the “command entry point” as well. This book has used this approach so that you can see which lines were typed from the keyboard and which lines are results (that is, generated by R itself). So, if you copy commands into a text file it is a good idea to edit out the > characters at the beginning of command lines; keeping them in will give errors as you can see in the following example:
> > apply(data, 2, mean, na.rm = TRUE)
Error: unexpected '>' in ">"
As you look through R help entries, you will see lines of explanation in the examples (not always very clear, perhaps) that are associated with the # character, also known as the hash or pound character. R essentially ignores anything that follows a # character so the best use of this character is to keep notes. For instance, in your text file you can use the # character to create annotations that help you remember what is going on. The following example shows some commands that you might use to make yourself a basic bar chart with error bars (using standard error) and uses the # character to organize the information:
# Barplot with se error bars.
# make copy of data called dat.
mn=apply(dat,2,mean,na.rm = TRUE) # set the mean values.
stdev=apply(dat,2,sd,na.rm = TRUE) # make the std deviation.
tot=apply(dat,2,sum,na.rm = TRUE) # get the sum for each column.
n=tot/mn # work out the no. observations as sum/mean (length does not accept na.rm=T).
se=stdev/sqrt(n) # calculate std err.
mx=round(max(mn+se)+0.5,0) # largest value to set y-axis.
bp=barplot(mn, ylim = c(0,mx)) # make plot and set y-axis to max value.
arrows(bp,mn+se,bp,mn-se,length=0.1,angle=90,code=3) # add error bars.
# If y-axis still too short change mx value to a larger one.
# END
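The notes above derive the number of observations from the column sums and means. As a hedged alternative, the following sketch counts the non-missing values directly, which also works when a column mean happens to be zero. The data frame dat used here is a small made-up example, not data from this book:

```r
# Made-up data with one missing value (an assumption for illustration)
dat <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

n <- colSums(!is.na(dat))                  # count the non-NA items in each column
stdev <- apply(dat, 2, sd, na.rm = TRUE)   # std deviation, ignoring NA items
se <- stdev / sqrt(n)                      # standard error for each column

n
se
```

Here !is.na(dat) gives a table of TRUE/FALSE values and colSums() adds up the TRUE items, so no division is needed to recover the counts.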
When you use R, you’ll realize that there are a lot of commands that you can use! In spite of this, on some occasions it would be useful to have others, especially to carry out some tasks that you might require reasonably often. You can use the function() command to create new commands that you can then store and use again later. The general form of the command is like so:
function(args) expr
Inside the parentheses you type the arguments that you require for the function to work; after the parentheses you type the expression you require using the arguments you have provided.
The following example shows a simple one-line function that you can create yourself:
> log2 = function(x) log(x, base = 2)
Here you create an object called log2; the function has only one simple argument, which you call x. After the function() part you type the actual expression you want to evaluate; in this case you use the log() command using base 2. When you use your function, you type its name and give appropriate instructions inside the parentheses. In this case you require numeric input, and you see the result of the function when you type a value into the new command:
> log2(64)
[1] 6
> log2(seq(2,8,2))
[1] 1.000000 2.000000 2.584963 3.000000
> log2(c(2,4,8,16))
[1] 1 2 3 4
The object you created as part of your new function resides in the computer memory and can be listed like other objects. You can also save your function along with the workspace when you quit R or as part of a save.image() command. Your function object is bundled and encoded along with the other objects, but this is no problem because you can retrieve the function object at any time and edit it as you require. When you use save() to save individual R objects, they require a filename with a file extension. Data items usually have an .Rdata extension and functions that you create usually get a simple .R file extension. This enables you to differentiate between the two types of objects because .Rdata items are encoded and can be opened only by R, whereas .R items are usually plain text and can be read, and perhaps edited, by other programs.
When you create a function with the function() command, you give the various arguments as part of the command; you can also specify default values using = and the default value. The following example uses a fairly simple mathematical equation to determine the flow of water in a stream (the Manning equation); you have three arguments and one of them has a default:
manning = function(radius, gradient, coef=0.1125) (radius^(2/3)*gradient^0.5/coef)
The three arguments are radius, gradient, and coef (the Manning coefficient). You set the coef argument to have a default value of 0.1125. If you do not specify a coef when you run the function, this default value is used:
> manning(radius = 1, gradient = 1/500)
[1] 0.3975232
You can use abbreviations when you run your new command as long as they are unambiguous (here the arguments have completely different names):
> manning(gra = 1/500, ra = 1)
[1] 0.3975232
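To see what "unambiguous" means in practice, consider this minimal sketch. The function flow() and its argument names are hypothetical, chosen only because both names begin with the same letters:

```r
# Hypothetical function: both argument names begin with "gr"
flow <- function(gradient, grain) gradient * grain

# These abbreviations each match only one argument, so the call works
ok <- flow(grad = 2, grai = 3)
ok

# "gr" could mean either argument, so R stops with an error;
# tryCatch() captures the error message instead of halting
res <- tryCatch(flow(gr = 2, 3), error = function(e) conditionMessage(e))
res
```

Typing the names in full is the safer habit in scripts that you intend to keep, because an abbreviation that works today can become ambiguous if you later add another argument.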
You can even omit the names completely, but in that case you must specify the values for the arguments in exactly the correct order:
> manning(1, 1/500)
[1] 0.3975232
Of course, you can override the default values for any of the set arguments like so:
> manning(radius = 1, gradient = 1/500, coef = c(0.08, 0.11, 0.2))
[1] 0.5590170 0.4065578 0.2236068
Here you give three values for your coef argument and obtain three results.
A one-line script is very useful, but most of the time you will need longer and potentially more complex functions, which require several lines of commands. In that case you need a way to stop R from evaluating the function before you have finished typing it. One option would be to type the commands into a text editor and then copy and paste them into R. Another solution is to use curly brackets ({}). You use these brackets to create subsections of commands so you can use them to define the lines that form your function. The following example shows a simple function that determines the running median of a numeric vector:
> cummedian = function(x) {
+ tmp = seq_along(x)
+ for(i in 1:length(tmp)) tmp[i] = median(x[1:i])
+ print(tmp)
+ }
The first line starts the function by assigning it a name and listing the arguments; here there is only one argument, x. Rather than use any expressions at this point, you simply type a { and press the Enter key.
Now R is expecting something after the { and you see the insertion symbol change to a + rather than the usual > character. R will keep expecting something and allow you to enter multiple lines until you enter a closing }.
Notice that the penultimate line is print(tmp); this displays your result to the screen (you could also have used return(tmp) to get the same output). You look at methods of displaying results shortly when you look at longer and more complex scripts.
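The following sketch shows one way the return() alternative might look. The name cummedian2 is hypothetical, and the for() loop has been swapped for vapply(), which is a different technique from the one used above rather than a correction of it:

```r
# A variant of the running median that returns its result
cummedian2 <- function(x) {
  # vapply() applies median() to x[1], x[1:2], x[1:3] and so on,
  # and always gives back a numeric vector
  tmp <- vapply(seq_along(x), function(i) median(x[1:i]), numeric(1))
  return(tmp)
}

res <- cummedian2(c(1, 3, 2, 8))   # nothing is displayed during assignment
res                                # display the stored result
```

Because seq_along(x) produces an empty sequence for an empty vector, this version simply gives back an empty numeric vector in that case.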
If you type the name of your newly created function without parentheses, you see the lines of code that make up the function, like so:
> cummedian
function(x) {
tmp = seq_along(x)
for(i in 1:length(tmp)) tmp[i] = median(x[1:i])
print(tmp)
}
You can also use the args() command to view the required arguments for your function():
> args(manning)
function (radius, gradient, coef = 0.1125)
NULL
> args(cummedian)
function (x)
NULL
If you create a simple function from the command line and have created a name for it, the object will appear along with other objects when you use the ls() command. You can save your customized functions along with all the data by saving the workspace when you use quit() to quit the program. You can also save one or several function objects using the save() command like so:
> save(manning, cummedian, file = 'My Functions.R')
In this example you save two custom functions to a file called My Functions.R; note that you give the file an .R extension to differentiate the file from data. The filename must be in quotes. However, when you use the save() command, R converts the object into a special binary form and you no longer have a plain text file!
If you have used save() to keep your customized function object on disk, you must use load() to get it back again like so:
> load('My Functions.R')
Ideally you would save your function as a plain text script so that you can edit it. You can make your function objects save to disk as plain text by using the dump() command like so:
> dump(c('cummedian', 'manning'), file = 'My Functions.R')
In this example you use c() to create a character vector of the names of the objects that you want to dump to disk; note that the names must be in quotes. You might also have created a separate character vector of names, or you could use an ls() command to make your list:
> dump(ls(pattern = 'cummedian|manning'), file = 'My Functions.R')
If you use dump() to save your function objects, they appear as plain text and you could open and edit them with a text editor.
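The difference between the two approaches can be sketched with a round trip through temporary files. The function area() and the file names below are made up for illustration; tempfile() is used so that nothing is written to your working directory:

```r
# A small made-up function to save and restore
area <- function(r) pi * r^2

txt_file <- tempfile(fileext = '.R')      # will hold plain text
bin_file <- tempfile(fileext = '.RData')  # will hold binary data

dump('area', file = txt_file)   # text version: name given in quotes
save(area, file = bin_file)     # binary version: object given directly

readLines(txt_file)             # the dumped file is readable R code

rm(area)
source(txt_file)                # runs the text file, re-creating area
area(1)

rm(area)
load(bin_file)                  # restores the binary copy
area(2)
```

The text file can be opened and changed in any editor before you source() it again, whereas the binary file can only be read back with load().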
In most cases it is not practical to make complicated functions from the command line of R itself; it is better and easier to use a text editor. In Windows and Macintosh versions of R, editors are built-in to R and open when you make or open a script from the File menu. In Linux you must use a separate editor of your choice from the OS. If you use a text editor, you can call up the resulting plain text file from R using the source() command:
source(file.choose())
In this version of the command, you get to choose your file from a browser-like window. This option is not available in Linux OS; you must type the filename explicitly. For example:
> source(file = 'My Functions.R')
In addition to the usual commands that you have seen before, a few extra ones are especially useful for use with your customized functions and scripts.
When you create a custom function you may use several arguments and create new variables as part of any calculations. In the following example, which you saw earlier, you create a new variable called tmp:
> cummedian
function(x) {
tmp = seq_along(x) # a temp variable
for(i in 1:length(tmp)) tmp[i] = median(x[1:i])
print(tmp) # the result
}
This variable exists only while the function is being evaluated. It does not remain afterwards, as the following example shows:
> cummedian(mf$BOD)
[1] 200.0 190.0 180.0 157.5 135.0 127.5 120.0 127.5 135.0 151.5 158.0 151.5 145.0
[14] 145.0 145.0 151.5 158.0 157.5 157.0 157.5 158.0 157.5 157.0 151.0 145.0
> tmp
Error: object 'tmp' not found
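As part of the function, therefore, you must present the result before the end of the series of commands; you can use the print() command to do this, which is what the function in the previous example uses. You can also assign the result of your function to a “container” object. Because print() displays its argument as well as returning it, the version of cummedian() shown here still prints the values during the assignment; if the function ended with return(tmp) instead, the display would be suppressed and the result saved only to the result object, like so: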
> tmp = cummedian(mf$BOD)
> tmp
[1] 200.0 190.0 180.0 157.5 135.0 127.5 120.0 127.5 135.0 151.5 158.0 151.5 145.0
[14] 145.0 145.0 151.5 158.0 157.5 157.0 157.5 158.0 157.5 157.0 151.0 145.0
In this case you call your result object tmp, which, although it has the same name as the temporary variable, is in fact different!
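A minimal sketch makes the distinction concrete. The function show_scope() is hypothetical; it creates its own tmp without touching the tmp that exists at the command line:

```r
tmp <- "defined at the command line"

show_scope <- function() {
  tmp <- "defined inside the function"   # local copy, discarded on exit
  tmp
}

show_scope()   # gives the value of the local variable
tmp            # the object at the command line is unchanged
```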
You may want to produce text output as part of your script; for example, to embellish the result and make it clearer for the user. Often you will create summary statistics as part of your custom functions, and you can create text to set out the results in various forms to present them to the user more clearly. At other times you may want to pause and wait for an input from the user.
You can produce text on the screen to present results, or to remind the user of what was done. A simple way to do this is to use the cat() command, which enables you to present text on the screen. Your text must be in quotes, or be an object that is a character object. See the following example:
> msg = 'My work is far from done.'
> cat(msg)
My work is far from done.
> cat('Any text to be used must be in quotes')
Any text to be used must be in quotes
If you want to create new lines, you add \n to your command like so:
> cat('This is line 1\nThis is line 2\nThis is line 3')
This is line 1
This is line 2
This is line 3
You can have several parts to your cat() command, separated by commas. For example:
> cat('Am I done?\n', msg, '\n')
Am I done?
My work is far from done.
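By default cat() places a space between the items you give it. If you need closer control of the spacing, the sep = instruction can help, as in this brief sketch (the object n and the messages are made up for illustration):

```r
n <- 3

cat('Processed', n, 'files\n')               # default: items separated by a space
cat('Processed ', n, ' files\n', sep = '')   # sep = '' means you supply the spaces
cat('mean', 'sd', 'se', sep = '\t')          # tab-separated items
cat('\n')                                    # finish the line

# capture.output() stores what cat() would have displayed
joined <- capture.output(cat('a', 'b', sep = ''))
joined
```

The default separator also explains why a line that follows '\n' in a multi-part cat() command starts with a space.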
In the following simple script you create a data frame using some simple numeric data:
## Test script
dat1 = c(1,2,4,6,7,8)
dat2 = c(4,5,8,7,6,5)
dat3 = data.frame(dat1, dat2)
rm(dat1, dat2)
msg = 'My work is done.'
cat('\nOur result data is dat3:\n')
print(dat3)
cat('\n', msg, '\n')
## END
The preceding script works in the following manner:
> source('test script.R')
Our result data is dat3:
dat1 dat2
1 1 4
2 2 5
3 4 8
4 6 7
5 7 6
6 8 5
My work is done.
>
Now take a look at the following script; in it you create a customized function; this creates cumulative results for a numeric vector. Previously you used a similar script to create a running or cumulative median; here you can specify any mathematical function, although you set the default to the median in this instance:
## Cumulative functions
## Mark Gardener 2011
cum.fun = function(x, fun = median, ...) {
tmp = seq_along(x)
for(i in 1:length(tmp)) tmp[i] = fun(x[1:i], ...)
cat('\n', deparse(substitute(fun)),'of', deparse(substitute(x)),'\n')
print(tmp)
}
## END
In this preceding example you require two arguments, x and fun; x is the vector of numeric values and fun is the mathematical function you want to apply. You also include an ellipsis (...), which is a way of saying “allow other instructions that might be relevant.” You might, for example, want to add na.rm = TRUE as an instruction to take care of any NA items.
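The way the ellipsis passes instructions along can be seen in a stripped-down sketch. The function apply_fun() and the vector vals are hypothetical and exist only to show the forwarding:

```r
# Whatever is supplied in place of ... is handed on to fun()
apply_fun <- function(x, fun = median, ...) {
  fun(x, ...)
}

vals <- c(4, 8, NA, 6)

apply_fun(vals)                             # the NA item spoils the result
apply_fun(vals, na.rm = TRUE)               # na.rm is passed on to median()
apply_fun(vals, fun = mean, na.rm = TRUE)   # and equally to mean()
```

The function itself never mentions na.rm; it only needs to forward the extra instruction to whichever command does understand it.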
When you get the result it would be helpful to have a reminder of which function you requested when you typed the command. It would also be helpful to take the name of an object and display it as text so you can have a reminder of the data name that was used. This can be tricky though; you cannot include the data or function name here as x or fun because R will try to coerce the contents of these items as objects rather than as text (R assumes that you want to display the object itself and gives you the contents rather than the name). Additionally, you cannot put the names in quotes because they will become “fixed” and you simply get what was in the quotes. What you actually do, as you can see, is use deparse(substitute()). This looks at what you typed in the command as arguments and converts these arguments to text objects.
The result of the deparse(substitute()) command in the preceding script is as follows:
> cum.fun(mf$BOD, mean)
mean of mf$BOD
[1] 200.0000 190.0000 171.6667 158.7500 149.0000 144.1667 137.1429 141.0000
[9] 145.3333 150.3000 151.0000 150.5000 149.6923 149.3571 150.4000 152.6875
[17] 154.8824 155.0000 151.5789 155.7500 157.8571 153.1818 150.3043 148.0833
[25] 145.9600
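If you want to experiment with deparse(substitute()) on its own, a small sketch such as the following may help. The names name_of, report, and scores are made up for illustration:

```r
# Gives back, as text, whatever was typed for the argument
name_of <- function(x) deparse(substitute(x))

scores <- c(3, 5, 7)
name_of(scores)   # "scores"

# Use the captured names to label a result
report <- function(x, fun = mean) {
  cat(deparse(substitute(fun)), 'of', deparse(substitute(x)), '=', fun(x), '\n')
}

report(scores)
report(scores, fun = max)
```

Note that substitute() needs to be used before the function alters the argument; once x has been given a new value inside the function, the original text of the call can no longer be recovered from it.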
There are times when you will want to pause the running of a script: this may be to give the user time to see an intermediate result (for example, a graphic) before moving on, or to provide options for the user to select: Pressing one key performs one operation and another key does something else.
You can use the readline() command to accept a line of input from the user; the script waits until the user presses Enter. As part of the command, you can include a message to be displayed on the screen. For example:
> readline(prompt = 'Press <enter> to continue:')
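The text that follows the prompt = instruction is displayed and the script pauses until the user presses Enter. Anything typed before Enter is returned as a character string, which you can ignore if you simply want a pause. Note that readline() waits only in an interactive session; in a non-interactive session it returns an empty string immediately.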
You can give the user options by setting an object using the readline() command. The following example could be included in a larger script:
yorn <- readline(prompt = "Do you want to carry on? (Y or N) :")
if (yorn == 'Y' || yorn =='y'){
cat('Thank goodness')
}
If the user types Y (uppercase or lowercase) and presses Enter, the message is displayed. If the user types anything else, nothing happens.
You can also create user prompts that provide multiple options. Each option must have its own code within a pair of curly brackets as in the following example:
# Explicit options
mopts = function(){
yorn <- readline(prompt = "Do you want to carry on? (Y or N) : ")
if(yorn == 'Y' || yorn == 'y') {
cat('Thank goodness')
}
if(yorn == 'N' || yorn =='n') {
cat('Oh dear')
}
}
## END
The preceding code creates a new function called mopts. When this function is run the user is presented with the text prompt. If she types one of the specified options then the appropriate message is displayed. In this case there are two options, and if the user types something other than these two options then nothing happens. You can create a catch-all option by using the else command as the following example shows:
# Single positive option
sopt = function(){
yorn <- readline(prompt = "Do you want to carry on? (Y or N) : ")
if(yorn == 'Y' || yorn == 'y') cat('Thank goodness') else cat('Oh dear')
}
## END
The preceding code creates a new function called sopt. When this function is run the user sees the same prompt as in the mopts code. If the user types Y (in either case) and presses Enter, the “Thank goodness” message is displayed. Any other response produces the alternative “Oh dear” message.
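As your list of options grows, it can help to separate the decision from the prompt. The following is a minimal sketch under that assumption; interpret_answer() is a hypothetical helper, and the accepted responses are only examples:

```r
# Turn whatever the user typed into one of three outcomes
interpret_answer <- function(answer) {
  answer <- toupper(trimws(answer))   # ignore case and stray spaces
  if (answer %in% c('Y', 'YES')) {
    'continue'
  } else if (answer %in% c('N', 'NO')) {
    'stop'
  } else {
    'unknown'
  }
}

# Only prompt when someone is there to answer
if (interactive()) {
  yorn <- readline(prompt = 'Do you want to carry on? (Y or N) : ')
  cat('Decision:', interpret_answer(yorn), '\n')
}

interpret_answer(' y ')
interpret_answer('maybe')
```

Keeping the decision in its own function means you can check it with typed-in values, without having to sit at the prompt each time.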
In the following activity you create a new customized function, which you then save to disk for future use. The script creates a bar chart of mean values and adds standard error bars. The data need to be in column format with one column for the numerical data (the response data) and one column for the grouping (predictor) variable.
## Bar Plot with Error Bars
## Mark Gardener 2011
barplot.eb = function(y, x, data, ...)
{ # start function code
# Parameters (data frame must be stacked)
# y = y variable
# x = x variable
# data = data.name
attach(data) # start by attaching data to read variables
mean = tapply(y, x, mean) # get mean values
sdev = tapply(y, x, sd) # get std. dev.
len = tapply(y, x, length) # get no. observations
se = sdev/sqrt(len) # determine std. err.
detach(data) # detach data file for tidiness
mat = rbind(mean, se, len) # make matrix of values
upper = round(max(mat[1,]+mat[2,]+0.5),0) # the upper limit to fit e-bars on y-axis
bp = barplot(mean, ylim = c(0, upper), beside = T, ...)
# make barplot, fix y-limit to fit largest error bar
segments(bp, mean+se, bp, mean-se) # error bars up/down
segments(bp-0.1, mean+se, bp+0.1, mean+se) # top hats
segments(bp-0.1, mean-se, bp+0.1, mean-se) # bottom hats
cat('\nSummary stats for ', deparse(substitute(data)), '\n') # summary message
print(mat) # show the data summary
cat('\n') # newline for tidiness
} # end function code
## END
> source(file.choose())
> barplot.eb(count, site, data = bfs)
Summary stats for bfs
Arable Grass Heath
mean 10.000000 6.833333 9.000000
se 1.424001 1.481366 0.755929
len 9.000000 12.000000 8.000000
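The summary part of the function can also be written without attach() and detach(), by naming the columns as text and extracting them from the data frame. This is a hedged sketch of that alternative, not the approach used in the activity; group_stats() and the demo data are made up for illustration:

```r
# Summary statistics for a stacked data frame, columns given by name
group_stats <- function(data, y, x) {
  vals <- data[[y]]                 # response column
  grp <- data[[x]]                  # grouping column
  mn <- tapply(vals, grp, mean)     # mean for each group
  len <- tapply(vals, grp, length)  # no. observations for each group
  se <- tapply(vals, grp, sd) / sqrt(len)   # std. err. for each group
  rbind(mean = mn, se = se, len = len)      # matrix of values
}

# Made-up data in stacked (column) format
demo <- data.frame(count = c(2, 4, 6, 1, 3),
                   site = c('A', 'A', 'A', 'B', 'B'))

stats <- group_stats(demo, 'count', 'site')
stats
```

The resulting matrix has the same layout as the summary displayed by barplot.eb(), so it could be passed to barplot() and segments() in the same way.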
There is, of course, a lot more to programming in R and many additional commands that you could employ. However, what you have seen here will take you a long way. By understanding more about how R works you will be able to see more and more how to customize it to carry out those tasks that are important to you.
Exercises
You can find answers to these exercises in Appendix A.
What You Learned In This Chapter
Topic | Key Points |
Copy and Paste | You can copy and paste text from another application into R. This enables you to create help files and snippets of code for future use. |
Customized functions: function(args) expr | Create customized functions using the function() command; args = arguments to pass to expr (the actual function). Args may provide default values. |
Multiple lines of text: { various commands }; function(args) { various commands } | Curly brackets can be used to separate subroutines. This allows multiple lines to be entered into the console. |
Annotations: # comment | Anything following the # is ignored and so it can be used for comments. |
Function arguments/instructions: args(function_name) | The args() command returns the arguments/instructions required by the named function. |
Looking at function code: function_name | Supplying the function name without () and instructions displays the text of the script. |
Read text files as scripts: source('filename'); source(file.choose()) | Reads a text file and executes the lines of text as R commands. |
Saving to disk: save(object, file = 'filename') | Saves a binary version of an object (including a function) to disk. |
Loading from disk: load('filename') | Loads a binary object from disk. |
Save objects as text: dump('names_list', file = 'filename') | Attempts to write a text version of an object to disk. |
Text messages on screen: cat('text1', 'text2'); cat(chr_object); '\n'; '\'' | Produces a message in the console; requires plain text strings, explicitly or from character objects. Items may be separated by commas. A newline is produced using “\n”. A quote character (single or double) is produced by preceding it with a backslash (\). |
Displaying results: print(object) | Prints the named object to the console (that is, the screen). |
Wait for user input: readline(prompt = "text") | Pauses and waits for input from the user. A message can be displayed using the prompt = instruction. |
Convert user input to text: deparse(substitute(x)) | Takes a named object and converts its name to text, which can then be displayed via cat(). |