
This book is about data analysis and the programming language called R. R is rapidly becoming the de facto standard among professionals and is used in every conceivable discipline, from science and medicine to business and engineering.

R is more than just a computer program; it is a statistical programming environment and language. R is free and open source and is therefore available to everyone with a computer. It is very powerful and flexible, but it is also unlike most of the computer programs you are likely used to. You have to type commands directly into the program to make it work for you. Because of this, and its complexity, R can be hard to get a grip on.

This book delves into the language of R and makes it accessible using simple data examples to explore its power and versatility. In learning how to “speak R,” you will unlock its potential and gain better insights into tackling even the most complex of data analysis tasks.

Who This Book Is For

This book is for anyone who needs to analyze data, whatever their discipline or line of work. Whether you are in science, business, medicine, or engineering, you will have data to analyze and results to present. R is powerful, flexible, and completely cross-platform, which means you can share data and results with anyone. R is backed by a huge project team, so being free does not mean being inferior!

If you are completely new to R, this book shows you how to obtain it and start becoming familiar with it. There is no assumption that you know anything about the program to begin with. If you are already familiar with R, you will find this book a useful reference that you can call upon time and time again; the first chapter is largely concerned with installing R, so you may want to skip to Chapter 2.

This book is not about statistical analyses, so some familiarity with basic analytical methods is helpful (but not obligatory). The book deals with the means to make R work for you; this means learning the language of R rather than learning statistics. Once you are familiar with R you will be empowered to use it to undertake a huge variety of analytical tasks, more than can be conveniently packaged into a single book. R also produces presentation-quality graphics, and this book leads you through the complexities of producing them.

What This Book Covers

R is a computer program and statistical programming language/environment. It allows a wide range of analytical methods to be used and produces presentation-quality graphics. This book covers the language of R, and leads you toward a better understanding of how to get R to do the things you need. There is less emphasis on the actual statistical tests; indeed, R is so flexible that the list of tests it can perform is far too large to be covered in an introductory book such as this. Rather, the aim is to become familiar with the language of R and to carry out some of the more commonly used statistical methods. In this way, you can strike out on your own and explore the full potential of R for yourself.

So, the focus is on the operation of R itself. Along the way you learn how to carry out a range of commonly used statistical methods, including analysis of variance (ANOVA) and linear regression, which are widely used in many fields and, therefore, important to know. You also learn a range of ways to produce a wide variety of graphics that should suit your needs.

This book works with most of the recent versions of R. The R program does change from time to time as new versions are released. However, most of the commands you will need to know have not changed, and even older (in computer terms) versions will work quite happily.

How This Book Is Structured

The book is progressive in character, and later chapters tend to build on skills you learned earlier. Therefore, if you are a beginner, you will probably find it most useful to start at the beginning and work your way through in order. If you are a more seasoned user, you may want to use selected chapters as reference material, to refresh your skills.

No approach to learning R is universally adequate, but I have tried to provide the most logical path possible. For example, learning to produce graphics is very important, but unless you know what kinds of analyses you are likely to need to represent, making these graphs might seem a bit prosaic. Therefore, the main graphics chapter appears after some of the chapters on analysis.

In general terms, the book begins with notes on how to get and install R, and how to access the help system. Next you are introduced to the basics of data—how to get data into R, for example. After this you find out how to manipulate data, carry out some basic statistical analyses, and begin to tackle graphics. Later you learn some more advanced analytical methods and return to graphics. Finally, you look at ways to use R to create your own programs.

Each chapter begins with an overview of the topics you will learn. The text contains many examples and is written in a “copy me” style. Throughout the text, all the concepts are illustrated with simple examples. You can download the data from the companion website and follow along as you read (details on this are discussed shortly). The book contains a variety of activities that you are urged to follow; each is designed to help you with an important topic. The chapters all end with a series of exercises that help you to consolidate your learning (the solutions are in the appendix). Finally, the chapters end with a brief summary of what you learned and a table illustrating the topics and some key points, which are useful as reference material. Following is a brief description of each chapter.

Chapter 1: Introducing R: What It Is and How to Get It—In this chapter you see how to get R and install it on your computer. You also learn how to access the built-in help system and find out about additional packages of useful analytical routines that you can add to R.
Chapter 2: Starting Out: Becoming Familiar with R—This chapter builds some familiarity with working with R, beginning with some simple math and culminating in importing and making data objects that you can work with (and saving data to disk for later use).
Chapter 3: Starting Out: Working With Objects—This chapter deals with manipulating the data that you have created or imported. These are important tasks that underpin many of the later exercises. The skills you learn here will be put to use over and over again.
Chapter 4: Data: Descriptive Statistics and Tabulation—This chapter is all about summarizing data. Here you learn about basic summary methods, including cumulative statistics. You also learn about cross-tabulation and how to create summary tables.
Chapter 5: Data: Distribution—In this chapter you look at visualizing data using graphical methods—for example, histograms—as well as mathematical ones. This chapter also includes some notes about random numbers and different types of distribution (for example, normal and Poisson).
Chapter 6: Simple Hypothesis Testing—In this chapter you learn how to carry out some basic statistical methods such as the t-test, correlation, and tests of association. Learning how to do these is helpful for when you have to carry out more complex analyses and also illustrates a range of techniques for using R.
Chapter 7: Introduction to Graphical Analysis—In this chapter you learn how to produce a range of graphs including bar charts, scatter plots, and pie charts. This is a “first look” at making graphs, but you return to this subject in Chapter 11, where you learn how to turn your graphs from merely adequate to stunning.
Chapter 8: Formula Notation and Complex Statistics—As your analyses become more complex, you need a more complex way to tell R what you want to do. This chapter is concerned with an important element of R: how to define complex situations. The chapter has two main parts. The first part shows how the formula notation can be used with simple situations. The second part, which takes up the rest of the chapter, uses an important analytical method, analysis of variance (ANOVA), as an illustration. This is an important chapter because the ability to define complex analytical situations is something you will inevitably require at some point.
Chapter 9: Manipulating Data and Extracting Components—This chapter builds on the previous one. Now that you have seen how to define more complex analytical situations, you learn how to make and rearrange your data so that it can be analyzed more easily. This also builds on knowledge gained in Chapter 3. In many cases, when you have carried out an analysis you will need to extract data for certain groups; this chapter also deals with that, giving you more tools that you will need to carry out complex analyses easily.
Chapter 10: Regression (Linear Modeling)—This chapter is all about regression. It builds on earlier chapters and covers various aspects of this important analytical method. You learn how to carry out basic regression, as well as complex model building and curvilinear regression. It is also important because it illustrates some useful aspects of R (for example, how to dissect results). The later parts of the chapter deal with graphical aspects of regression, such as how to add lines of best fit and confidence intervals.
Chapter 11: More About Graphs—This chapter builds on the earlier chapter on graphics (Chapter 7) and on the previous chapter on regression. It shows you how to produce more customized graphs from your data. For example, you learn how to add text to plots and axes, and how to make superscript and subscript text and mathematical symbols. You learn how to add legends to plots, and how to add error bars to bar charts or scatter plots. Finally, you learn how to export graphs to disk as high-quality graphics files, suitable for publication.
Chapter 12: Writing Your Own Scripts: Beginning to Program—In this chapter you learn how to start producing customized functions and simple scripts that can automate your workflow, and make complex and repetitive tasks a lot easier.

What You Need to Use This Book

The only things you need to use this book are a computer and enthusiasm! The R program works on all the major operating systems, so you can use Windows, Macintosh, or Linux (any version). R even works quite adequately on ancient (in computer terms) computers, so you do not need anything particularly high-spec. An Internet connection is required at some point because you need to get R from the R-project website. However, it is perfectly possible to download the installation files onto a separate computer and transfer them to your working machine.

If you already have a version of R, it is not necessary to get the latest version. R is continually changing and improving, but the older versions of R will most likely work with this book because the basic command set has changed relatively little. Having said that, I suggest you update your version of R if it was released before 2009.


Conventions

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.

The commands you need to type into R and the output you get from R are shown in a monospace font. Each example that shows lines that are typed by the user begins with the > symbol, which mimics the R prompt, like so:

> help()

Lines that begin with something other than the > symbol represent the output from R (but look out for typed lines that are long and spread over more than one line), so in the following example the first line was typed by the user and the second line is the result:

> data1
 [1] 3 5 7 5 3 2 6 8 5 6 9
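
As a minimal sketch of this convention, the following lines could be used to make an object like data1 and display it. The text does not say how data1 was created, so the use of the c() command here is an assumption for illustration only. The > prompt is left out so that the lines can be typed or pasted as they are, and the output you should expect is shown as comments.

```r
# Assumed for illustration: build data1 by combining the values
# shown in the example output above.
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)

# Typing the name of an object displays its contents.
data1
# [1] 3 5 7 5 3 2 6 8 5 6 9

# The [1] at the start of the output is the position of the first
# value on that line; length() reports how many values there are.
length(data1)
# [1] 11
```

If you try this yourself, the lines you type appear after the > prompt in the R console, and the lines shown as comments appear without it.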

Try It Out

The Try It Out is an exercise you should work through, following the text in the book.

1. It usually consists of a set of steps.
2. Each step has a number.
3. Follow the steps through with your copy of the data.

How It Works

After each Try It Out, the code you’ve typed is explained in detail.

Boxes with a warning icon like this one hold important, not-to-be-forgotten information that is directly relevant to the surrounding text.

The pencil icon indicates notes, tips, hints, tricks, and asides to the current discussion.

As for styles in the text:

  • We highlight new terms and important words when we introduce them.
  • We show keyboard strokes like this: Ctrl+A.
  • We show filenames, URLs, and code within the text like so:
  • We present code in two different ways:
    We use a monofont type with no highlighting for most code examples.
    We use bold to emphasize code that’s particularly important in the present context.

