This book is about data analysis and the programming language called R. R is rapidly becoming the de facto standard for data analysis among professionals, and is used in every conceivable discipline from science and medicine to business and engineering.
R is more than just a computer program; it is a statistical programming environment and language. R is free and open source and is therefore available to everyone with a computer. It is very powerful and flexible, but it is also unlike most of the computer programs you are likely used to. You have to type commands directly into the program to make it work for you. Because of this, and its complexity, R can be hard to get a grip on.
This book delves into the language of R and makes it accessible using simple data examples to explore its power and versatility. In learning how to “speak R,” you will unlock its potential and gain better insights into tackling even the most complex of data analysis tasks.
This book is for anyone who needs to analyze any data, whatever their discipline or line of work. Whether you are in science, business, medicine, or engineering, you will have data to analyze and results to present. R is powerful and flexible and completely cross-platform. This means you can share data and results with anyone. R is backed by a huge project team, so being free does not mean being inferior!
If you are completely new to R, this book will show you how to obtain it and start to become familiar with it. There is no assumption that you know anything about the program to begin with. If you are already familiar with R, you will find this book a useful reference that you can call upon time and time again; the first chapter is largely concerned with installing R, so you may want to skip to Chapter 2.
This book is not about statistical analyses, so some familiarity with basic analytical methods is helpful (but not obligatory). The book deals with the means to make R work for you; this means learning the language of R rather than learning statistics. Once you are familiar with R you will be empowered to use it to undertake a huge variety of analytical tasks, more than can be conveniently packaged into a single book. R also produces presentation-quality graphics, and this book leads you through the complexities of producing them.
R is a computer program and statistical programming language/environment. It allows a wide range of analytical methods to be used and produces presentation-quality graphics. This book covers the language of R, and leads you toward a better understanding of how to get R to do the things you need. There is less emphasis on the actual statistical tests; indeed, R is so flexible that the list of tests it can perform is far too large to be covered in an introductory book such as this. Rather, the aim is to become familiar with the language of R and to carry out some of the more commonly used statistical methods. In this way, you can strike out on your own and explore the full potential of R for yourself.
So, the focus is on the operation of R itself. Along the way you learn how to carry out a range of commonly used statistical methods, including analysis of variance (ANOVA) and linear regression, which are widely used in many fields and, therefore, important to know. You also learn a range of ways to produce a wide variety of graphics that should suit your needs.
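To give a flavor of the two methods named above, here is a minimal sketch using the mtcars and PlantGrowth datasets that ship with R. These are not the book's example data, and the chosen variables are purely illustrative. The lm() function fits a linear regression and aov() fits an analysis of variance; both take a formula of the form response ~ predictor.

```r
# Linear regression: model fuel economy (mpg) as a function of car weight (wt)
# using the built-in mtcars dataset (illustrative choice, not the book's data)
fit_lm <- lm(mpg ~ wt, data = mtcars)
print(coef(fit_lm))   # intercept and slope of the fitted line

# One-way ANOVA: does plant weight differ among the treatment groups?
# PlantGrowth is another dataset bundled with R
fit_aov <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit_aov))   # ANOVA table with F statistic and p-value
```

Both functions return model objects that other commands (summary(), coef(), plot()) can work with, which is typical of how R organizes analyses.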
This book covers most recent versions of R. The R program does change from time to time as new versions are released. However, most of the commands you will need to know have not changed, and even older (in computer terms) versions will work quite happily.
The book has a general progressive character, and later chapters tend to build on skills you learned earlier. Therefore if you are a beginner, you will probably find it most useful to start at the beginning and work your way through in a progressive manner. If you are a more seasoned user, you may want to use selected chapters as reference material, to refresh your skills.
No approach to learning R is universally adequate, but I have tried to provide the most logical path possible. For example, learning to produce graphics is very important, but unless you know what kinds of analyses you are likely to need to represent, making these graphs might seem a bit prosaic. Therefore, the main graphics chapter appears after some of the chapters on analysis.
In general terms, the book begins with notes on how to get and install R, and how to access the help system. Next you are introduced to the basics of data—how to get data into R, for example. After this you find out how to manipulate data, carry out some basic statistical analyses, and begin to tackle graphics. Later you learn some more advanced analytical methods and return to graphics. Finally, you look at ways to use R to create your own programs.
Each chapter begins with an overview of the topics you will learn. The text contains many examples and is written in a “copy me” style. Throughout the text, all the concepts are illustrated with simple examples. You can download the data from the companion website and follow along as you read (details on this are discussed shortly). The book contains a variety of activities that you are urged to follow; each is designed to help you with an important topic. The chapters all end with a series of exercises that help you to consolidate your learning (the solutions are in the appendix). Finally, the chapters end with a brief summary of what you learned and a table illustrating the topics and some key points, which are useful as reference material. Following is a brief description of each chapter.
The only things you need to use this book are a computer and enthusiasm! The R program works on any operating system, so you can use Windows, Macintosh, or Linux (any version). R even works quite adequately on ancient (in computer terms) computers, so you do not need anything particularly high-spec. An Internet connection is required at some point because you need to get R from the R-project website. However, it is perfectly possible to download the installation files onto a separate computer and transfer them to your working machine.
If you already have a version of R, it is not necessary to get the latest version. R is continually changing and improving, but older versions of R will most likely work with this book because the basic command set has changed relatively little. Having said that, I suggest you update your version of R if it dates from before 2009.
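If you are unsure how old your copy of R is, you can ask R itself. The following is a small sketch that reads the built-in R.version.string and R.version$year values; the 2009 cutoff simply mirrors the suggestion above.

```r
# Print the full version string of the running R
cat(R.version.string, "\n")

# R.version$year holds the release year of this build as a character string
release_year <- as.integer(R.version$year)

# Apply the suggestion from the text: consider updating if R predates 2009
if (release_year < 2009) {
  message("This R release is from ", release_year, "; consider updating.")
} else {
  message("This R release is from ", release_year, ".")
}
```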
To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.
The commands you need to type into R and the output you get from R are shown in a monospace font. Each example that shows lines typed by the user begins with the > symbol, which mimics the R prompt, like so:
> help()
Lines that begin with something other than the > symbol represent the output from R (but look out for typed lines that are long and spread over more than one line), so in the following example the first line was typed by the user and the second line is the result:
> data1
[1] 3 5 7 5 3 2 6 8 5 6 9
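The example above assumes that data1 already exists. A minimal sketch of how such an object could be created is shown below; the values are copied from the printed output, and data2 is a hypothetical second object used only to show a command spread over two lines.

```r
# Create the example vector (values taken from the output shown above)
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)

# Typing the object name at the > prompt prints it; the [1] is an index
# showing that the first value on that line is element number 1
print(data1)   # [1] 3 5 7 5 3 2 6 8 5 6 9

# A long command can be split across lines; in the console R shows a +
# continuation prompt until the command is complete (data2 is hypothetical)
data2 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9,
           1, 4, 2)
print(length(data2))   # [1] 14
```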
As for styles in the text:
We use a monofont type with no highlighting for most code examples.
We use bold to emphasize code that’s particularly important in the present context.
As you work through the examples in this book, you may choose either to type in all the data and code manually or to use the source code and data object files that accompany the book. All of the data and source code used in this book is available for download at http://www.wrox.com. You will find the data sets that you need for each example activity are accompanied by a download icon and note indicating the name of the data file so you know it’s available for download and can easily locate it in the download file. Once at the site, simply locate the book’s title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book.
There will only be one file to download and it is called Beginning.RData. This one file contains all the example datasets and scripts you need for the whole book. Once you have the file on your computer you can load it into R by one of several methods:
> load(file.choose())
> load("Beginning.RData")
For the last command to work, the Beginning.RData file must be in your working directory; if it is not, you must specify its location (the folder path) as part of the filename.
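The working-directory rule can be seen in action with a small stand-in file. The sketch below does not use the real Wrox download: it saves a hypothetical copy of data1 into a temporary folder under the name Beginning.RData, then loads it back first by full path and then by bare filename after changing the working directory.

```r
# Build a small stand-in for Beginning.RData in a temporary folder
demo_dir <- tempdir()
demo_file <- file.path(demo_dir, "Beginning.RData")
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
save(data1, file = demo_file)
rm(data1)   # remove it so we can see load() restore it

# Approach 1: give the full path as part of the filename
restored <- load(demo_file)
print(restored)   # load() invisibly returns the names of restored objects

# Approach 2: make the folder the working directory, then use the bare name
old_wd <- setwd(demo_dir)   # setwd() returns the previous directory
load("Beginning.RData")
setwd(old_wd)               # put the original working directory back
```

Use getwd() at any time to check which folder R is currently treating as the working directory.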
Alternatively, you can go to the main Wrox code download page at http://www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.
We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.
To find the errata page for this book, go to http://www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page you can view all errata that have been submitted for this book and posted by Wrox editors.
If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.
For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and to interact with other readers and technology users. The forums offer a subscription feature that e-mails you when new posts are made on topics of your choosing. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.
At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:
Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.
For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as to many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.