Introduction

R is rapidly becoming the de facto standard for statistical analysis among professionals, and is used in every conceivable discipline from science and medicine to business and engineering. R is more than just a computer program; it is a statistical programming environment and language. R is free and open source and is, therefore, available to everyone with a computer.

R is a language with its own vocabulary and grammar. To make R work for you, you communicate with the computer using the language of R and tell it what to do. You accomplish this by typing commands directly into the program. This means that you need to know some of the words of the language and how to put them together to make a “sentence” that R understands. This book aims to help with this task by providing a “dictionary” of words that R understands.

The help system built into R is extensive, but it is arranged by command name; this makes it hard to use unless you know some command names to start with. That’s where this book comes in handy; the command names (the vocabulary of R) are arranged by topic, so you can look up the kind of task that you require and find the correct R command for your needs.

I like to think of this book as a cross between a dictionary, a thesaurus, and a glossary, with a fair sprinkling of practical examples. Even though some may consider me an “R expert” at this point, I am still learning and still forgetting! I often have to refer to notes to remind me how to carry out a task in R. That is why I wrote this book—to help novice users learn more easily, and to provide more experienced users with a reference work they can delve into time and time again. I also learned a great deal more about R by writing about it, and I hope that you will find it an essential companion in your day-to-day conversations with R.

Who This Book Is For

This book is for anyone who needs to analyze data, whatever their discipline or line of work. Whether you are in science, business, medicine, or engineering, you will have data to analyze and results to present. R is powerful, flexible, and completely cross-platform, so you can share data and results with anyone, whatever computer they use. R is also backed by a huge project team, so being free does not mean being inferior!

Whether you are a student or an experienced programmer, this book is meant to be an essential reference. If you are completely new to R, this book will enable you to learn more quickly by providing an easy-to-use “dictionary.” You may also consider reading my previous book, Beginning R: The Statistical Programming Language, which provides a different learning environment by taking you from simple tasks to more complex ones in a linear fashion.

If you are already familiar with R, this book will serve as a reference work that you can call upon time and time again. It is easy to forget the name of a command or its exact syntax. In addition to jogging your memory, the examples in the book will help put the commands into context.

What This Book Covers

Each command listed in this book has an explanation of what the command does and how to use it—the “grammar,” if you will. Related commands are also listed, as in a thesaurus, so if the word you are looking at is not quite what you need, you are likely to see the correct one nearby.

I can’t pretend that this reference book covers every command in the R language, but it covers a lot (more than 400). I’ve also not covered some of the more obscure parameters (formally called “arguments” in R) for some of the commands. I called this book “Essential” because I believe it covers the essentials. I also hope that you will find it essential in your day-to-day use of R.

One of the weaknesses of the R help system is that some of the examples are hard to follow, so each command listed in this book is accompanied by various examples. These show you the command “in action” and hopefully help you to gain a better understanding of how the command works. The examples are written in R code and set out as if you had typed them into R yourself. And unlike the built-in help system in R, you get to see the results, too!

How This Book Is Structured

This book is not a conventional textbook; it is intended as a reference work that you can delve into at any point.

This book is organized in a topic-led, logical manner so that you can look for the kind of task that you want to carry out in R and find the command you need to carry out that task as easily as possible, even if you do not know the name of the command. The book is split into four grand themes:

  • Theme 1: “Data”
  • Theme 2: “Math and Statistics”
  • Theme 3: “Graphics”
  • Theme 4: “Utilities”

These are hopefully self-explanatory, with the exception perhaps of “Utilities”; this covers the commands that did not fit easily into one of the other themes, particularly those relating to the programming side of R.

You can use the table of contents to find your way to the topic that matches the task you want to undertake. If the command you need is not where you first look, there is a good chance that the command you did find will have a link to the appropriate topic or command (some commands have entries on more than one topic).

The index is also a helpful tool because it contains an alphabetical list of all the commands, so you can always find a specific command by its name there.

The following is a brief description of each of the four main themes:

Theme 1: Data—This theme is concerned with aspects of dealing with data. In particular:
  • Data types—Different kinds of data and converting one kind of data into another kind.
  • Creating data—Commands for making data items from the keyboard.
  • Importing data—Getting data from sources on disk.
  • Saving data—How to save your work.
  • Viewing data—Seeing what data you have in R.
  • Summarizing data—Ways of summarizing data objects. Some of these commands also appear in Theme 2, “Math and Statistics.”
  • Distribution of data—Looking at different data distributions and the commands associated with them, including random numbers.
Theme 2: Math and Statistics—This theme covers the commands that deal with math and statistical routines:
  • Mathematical operations—Various kinds of math, including complex numbers, matrix math, and trigonometry.
  • Summary statistics—Summarizing data; some of these commands are also in Theme 1, “Data.”
  • Differences tests—Statistical tests for differences in samples.
  • Correlations and associations—Including covariance and goodness of fit tests.
  • Analysis of variance and linear modeling—Many of the commands associated with ANOVA and linear modeling can be pressed into service for other analyses.
  • Miscellaneous Tests—Non-linear modeling, cluster analysis, time series, and ordination.
Theme 3: Graphics—This theme covers the graphical aspects of the R language:
  • Making graphs—How to create a wide variety of basic graphs.
  • Adding to graphs—How to add various components to graphs, such as titles, additional points, and shapes.
  • Graphical parameters—How to embellish and alter the appearance of graphs, including how to create multiple graphs in one window.
Theme 4: Utilities—This theme covers topics that do not fit easily into the other themes:
  • Installing R—Notes on installing R and additional packages of R commands.
  • Using R—Accessing the help system, history of previously typed commands, managing packages, and more.
  • Programming—Commands that are used mostly in the production of custom functions and scripts. You can think of these as the “tools” of the programming language.

Each of the topics is also split into subtopics to help you navigate your way to the command(s) you need. Each command has an entry that is split into the following sections:

  • Command Name—Name of the command and a brief description of what it does.
  • Common Usage—Illustrates how the command looks with commonly used options. Use this section as a memory-jogger; if you need fine details you can look in the “Command Parameters” section.
  • Related Commands—A list of related commands along with the page numbers or a link to their entries so you can easily cross-reference.
  • Command Parameters—Details of commonly used parameters for the command along with an explanation of what they do.
  • Examples—Examples of the command in action. The section is set out in code style as if you had typed the commands from the keyboard yourself. You also see the resulting output that R produces (including graphical output).

Some commands are relevant to more than one theme or section; those commands have a cross-reference, an entry in each applicable place, or both.

What You Need to Use This Book

R is cross-platform technology and so whatever computer you use, you should be able to run the program. R is a huge, open-source project and is changing all the time. However, the basic commands have altered little, and you should find this book relevant for whatever version you are using. I wrote this book using Mac R version 2.12.1, Windows R version 2.14.2, and Linux R version 2.14.1.

Having said that, if your version of R dates from before about 2009, I recommend getting a newer version.
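If you are not sure which version you have, you can ask R directly. The short sketch below is an illustration rather than an entry from the book. The "2.9.0" threshold is my own assumed stand-in for "about 2009", not an official cut-off.

```r
## Show the full version string of the R you are running
R.version.string

## getRversion() returns a version object that can be compared with a string.
## "2.9.0" is an assumed stand-in for "about 2009".
is_recent = getRversion() >= "2.9.0"
if (!is_recent) {
  message("This version of R is quite old; consider upgrading.")
}
is_recent
```

Comparing with getRversion() is safer than comparing version strings as plain text. A text comparison would wrongly rank "2.10.0" below "2.9.0".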

Conventions

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.

R Code

The commands you need to type into R and the output you get from R are shown in a monospace font. Each line typed by the user begins with the > symbol, which mimics the R prompt, like so:

> help()

Lines that begin with something other than the > symbol represent the output from R (but look out for typed lines that are long and spread over more than one line). In the following example the first line was typed by the user and the second line is the result:

> data1
 [1] 3 5 7 5 3 2 6 8 5 6 9
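The example does not show how data1 was created. One way to make an identical object, assumed here purely for illustration, is with the c() command. The sketch is written as a script, so the > prompts are left out. It also shows what the bracketed number in the output means: [1] is the position of the first value on that line. When a long vector wraps onto several lines, each line begins with the index of its first value.

```r
## Assumed: data1 made by combining the values with c()
data1 = c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
print(data1)    # same output as shown above

## A longer vector wraps; each output line starts with the index of its first value
## (where the lines break depends on the width of your console)
print(1:30)
```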

Annotations

The hash symbol (#) is used as an annotation character in R (see the following example). Anything that follows it on the same line is ignored by R. The examples used throughout this book contain plenty of annotations to guide you through the code and help you understand each line.

## Some lines begin with hash symbols; that entire line is ignored by R.
## This allows you to see the commands in action with blow-by-blow notes.
> help(help) # This line has an annotation after the command
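One detail is worth knowing: a # inside quotation marks is part of the text, not the start of an annotation. The illustrative sketch below shows both cases. The object names total and label are hypothetical.

```r
## A whole-line annotation: R ignores this line entirely
total = 2 + 3   # an annotation after a command; the sum is still calculated

## Inside quotes, # is an ordinary character, so nothing is ignored here
label = "Item #1"
nchar(label)    # 7 characters, including the #
```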

Operational Assignment

R uses two forms of “assignment.” The original form (the form preferred by many programmers) uses a kind of arrow like so: <-. This is used to indicate an assignment that runs from right to left. For example:

> x <- 23

This assigns the value 23 to a variable named x. An alternative form of assignment uses the equals sign (=), as in mathematics:

> x = 23

In most cases the two are equivalent and which you use is entirely up to you. Most of the help examples found in R and on the Internet use the arrow (<-). Throughout this book I have tended to use the = operator (because that is what I am used to), unless <- is the only way to make the command work.
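The main place where the two forms differ is inside the parentheses of a command. There, = is read as naming one of the command's parameters, whereas <- still performs an assignment. The sketch below uses median() to illustrate this. The object names x, y, and z are illustrative only.

```r
## Outside a command, both forms create an object
x <- 23
y = 23
identical(x, y)     # TRUE

## Inside parentheses, = only names a parameter: median() has a parameter
## called x, and this does not change the x created above
median(x = 1:9)     # returns 5
x                   # still 23

## Inside parentheses, <- creates z AND passes its value to median()
median(z <- 1:9)    # returns 5
z                   # 1 2 3 4 5 6 7 8 9
```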

Command Parameters

Most R commands accept various parameters; you can think of them as additional instructions that make the command work in various ways. Some parameters have default values that are used if you do not explicitly supply an alternative. Parameters are also “order specific”: you can give the value you want a parameter to take without naming it, as long as the values are supplied in the correct order. An example should clarify this; the rnorm command generates random numbers from the normal distribution. The full command looks like this:

rnorm(n, mean = 0, sd = 1)

You supply n, the number of random values you want; mean, the mean of the values; and sd, the standard deviation. Both the mean and sd parameters have defaults, which are used if you do not specify them explicitly. You can run this command by typing any of the following:

> rnorm(n = 10, mean = 0, sd = 1)
> rnorm(10, 0, 1)
> rnorm(10)

These are all equivalent: each draws ten values at random from a normal distribution with a mean of zero and a standard deviation of one (the actual values differ from run to run, because they are random). The first line shows the full version of the command. The second line gives values for all the parameters, but unnamed. The third line gives only one value; this is taken as n, and the other parameters take their default values.

This is useful for programming and using R because it means you can avoid a lot of typing. However, if you are trying to learn R it can be confusing because you might not remember what all the parameters are.
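Because rnorm() draws random values, the three forms of the command shown earlier give different numbers each time you run them. To check that they really are equivalent, you can fix the random number generator's starting point with set.seed() before each call. In this sketch the seed value 1 is an arbitrary choice, and the object names a, b, d, and e are illustrative.

```r
set.seed(1)
a = rnorm(n = 10, mean = 0, sd = 1)   # all parameters named

set.seed(1)
b = rnorm(10, 0, 1)                   # the same values, matched by position

set.seed(1)
d = rnorm(10)                         # only n given; mean and sd use defaults

identical(a, b)   # TRUE
identical(b, d)   # TRUE

## Named parameters can be given in any order
set.seed(1)
e = rnorm(sd = 1, n = 10, mean = 0)
identical(a, e)   # TRUE
```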

Some commands also accept parameter names in abbreviated form; others do not. In this book I have tried to use the full names of parameters in the examples; I hope that this will help clarify matters.
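In general, R accepts an abbreviated parameter name as long as the abbreviation is unambiguous. The exception is any parameter that comes after the ... placeholder in a command's definition; those must be typed in full. The paste() command, whose sep parameter follows ..., shows the difference in this illustrative sketch.

```r
## rnorm(n, mean = 0, sd = 1): "m" can only mean "mean", so it is accepted
set.seed(1)
a = rnorm(5, m = 10)
set.seed(1)
b = rnorm(5, mean = 10)
identical(a, b)                 # TRUE

## paste(..., sep = " "): sep comes after ..., so it must be named in full
paste("a", "b", sep = "-")      # "a-b"
paste("a", "b", se = "-")       # "a b -" (se is treated as one more value to paste)
```

Note the second paste() call: the misspelled name does not cause an error. It silently changes the result, which is one more reason to type parameter names in full.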

Cross-References

You can find many cross-references in this book in addition to the commands listed in the “Related Commands” section of each command’s entry. These cross-references look like this:

The magnifying glass icon indicates a cross-reference.

Cross-references are used in the following instances:

  • Relevant commands in the same section or a different section.
  • Relevant sections in the same theme or in a different theme.
  • An instance in which the command in question appears in another theme or section.
  • An instance in which the command in question has related information in another theme.

Data Downloads

If you come across a command that has an example you would like to try on your own, you can follow along by manually typing the example into your own version of R. Some of these examples use sample data that is available for download at http://www.wiley.com/go/EssentialRReference. All examples that require the data are accompanied by a download icon and a note giving the name of the file, so you know it is available for download and can easily locate it in the download file. The download notes look like this:

The download icon indicates an example that uses data you need to download.

Once at the site, simply locate the book’s title and click the Download Code link on the book’s detail page to obtain all the example data for the book.

There is only one file to download, called Essential.RData. This one file contains the example data sets you need for the whole book. It contains only a few data sets, because I have tried to keep the data simple and short enough for you to type directly. Once you have the file on your computer you can load it into R by one of several methods:

  • For Windows or Mac you can drag the Essential.RData file icon onto the R program icon; this opens R if it is not already running and loads the data. If R is already open, the data is appended to anything you already have in R; otherwise, only the data in the file is loaded.
  • If you have Windows or Macintosh, you can also load the file using menu commands or a command typed into R:
      • For Windows use File > Load Workspace, or type the following command in R:
        > load(file.choose())
      • For Mac use Workspace > Load Workspace File, or type the following command in R (same as in Windows):
        > load(file.choose())
  • If you have Linux, you can use the load() command, but you must specify the filename (in quotes) exactly. For example:
    > load("Essential.RData")

When you type the filename, as in the Linux example, the Essential.RData file must be in your working directory; if it is not, you must include its location as part of the filename.
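You can see which folder R treats as the working directory with getwd() and change it with setwd(). Essential.RData itself is not reproduced here. Instead, the sketch below writes a small stand-in file with save() into R's temporary folder and then loads it back by giving its full location. The file name Demo.RData and the objects demo1 and demo2 are hypothetical. load() also returns, invisibly, the names of the objects it has loaded, which is a handy way to see what a file contains.

```r
## The folder R searches when you give a filename without a location
getwd()

## Hypothetical stand-in for Essential.RData, written to a temporary folder
demo1 = c(3, 5, 7, 5, 3)
demo2 = letters[1:4]
path = file.path(tempdir(), "Demo.RData")
save(demo1, demo2, file = path)

## Remove the objects, then load them back using the full location
rm(demo1, demo2)
file.exists(path)       # TRUE
loaded = load(path)     # load() invisibly returns the names of the loaded objects
loaded                  # "demo1" "demo2"
demo1
```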
