Introduction

This book is about data analysis and the programming language called R. R is rapidly becoming the de facto standard among professionals, and is used in disciplines ranging from science and medicine to business and engineering.

R is more than just a computer program; it is a statistical programming environment and language. R is free and open source and is therefore available to everyone with a computer. It is very powerful and flexible, but it is also unlike most of the computer programs you are likely used to. You have to type commands directly into the program to make it work for you. Because of this, and its complexity, R can be hard to get a grip on.

This book delves into the language of R and makes it accessible using simple data examples to explore its power and versatility. In learning how to “speak R,” you will unlock its potential and gain better insights into tackling even the most complex of data analysis tasks.

Who This Book Is For

This book is for anyone who needs to analyze any data, whatever their discipline or line of work. Whether you are in science, business, medicine, or engineering, you will have data to analyze and results to present. R is powerful and flexible and completely cross-platform. This means you can share data and results with anyone. R is backed by a huge project team, so being free does not mean being inferior!

If you are completely new to R, this book will enable you to get it and start to become familiar with it. There is no assumption that you know anything about the program to begin with. If you are already familiar with R, you will find this book a useful reference that you can call upon time and time again; the first chapter is largely concerned with installing R, so you may want to skip to Chapter 2.

This book is not about statistical analyses, so some familiarity with basic analytical methods is helpful (but not obligatory). The book deals with the means to make R work for you; this means learning the language of R rather than learning statistics. Once you are familiar with R you will be empowered to use it to undertake a huge variety of analytical tasks, more than can be conveniently packaged into a single book. R also produces presentation-quality graphics and this book leads you through the complexities of that.

What This Book Covers

R is a computer program and statistical programming language/environment. It allows a wide range of analytical methods to be used and produces presentation-quality graphics. This book covers the language of R, and leads you toward a better understanding of how to get R to do the things you need. There is less emphasis on the actual statistical tests; indeed, R is so flexible that the list of tests it can perform is far too large to be covered in an introductory book such as this. Rather, the aim is to become familiar with the language of R and to carry out some of the more commonly used statistical methods. In this way, you can strike out on your own and explore the full potential of R for yourself.

So, the focus is on the operation of R itself. Along the way you learn how to carry out a range of commonly used statistical methods, including analysis of variance (ANOVA) and linear regression, which are widely used in many fields and, therefore, important to know. You also learn a range of ways to produce a wide variety of graphics that should suit your needs.

This book covers most recent versions of R. The R program does change from time to time as new versions are released. However, most of the commands you will need to know have not changed, and even older (in computer terms) versions will work quite happily.

How This Book Is Structured

The book is progressive, and later chapters tend to build on skills you learned earlier. Therefore, if you are a beginner, you will probably find it most useful to start at the beginning and work your way through in order. If you are a more seasoned user, you may want to use selected chapters as reference material to refresh your skills.

No approach to learning R is universally adequate, but I have tried to provide the most logical path possible. For example, learning to produce graphics is very important, but unless you know what kinds of analyses you are likely to need to represent, making these graphs might seem a bit prosaic. Therefore, the main graphics chapter appears after some of the chapters on analysis.

In general terms, the book begins with notes on how to get and install R, and how to access the help system. Next you are introduced to the basics of data—how to get data into R, for example. After this you find out how to manipulate data, carry out some basic statistical analyses, and begin to tackle graphics. Later you learn some more advanced analytical methods and return to graphics. Finally, you look at ways to use R to create your own programs.

Each chapter begins with an overview of the topics you will learn. The text contains many examples and is written in a “copy me” style. Throughout the text, all the concepts are illustrated with simple examples. You can download the data from the companion website and follow along as you read (details on this are discussed shortly). The book contains a variety of activities that you are urged to follow; each is designed to help you with an important topic. The chapters all end with a series of exercises that help you to consolidate your learning (the solutions are in the appendix). Finally, the chapters end with a brief summary of what you learned and a table illustrating the topics and some key points, which are useful as reference material. Following is a brief description of each chapter.

Chapter 1: Introducing R: What It Is and How to Get It—In this chapter you see how to get R and install it on your computer. You also learn how to access the built-in help system and find out about additional packages of useful analytical routines that you can add to R.
Chapter 2: Starting Out: Becoming Familiar with R—This chapter builds some familiarity with working with R, beginning with some simple math and culminating in importing and making data objects that you can work with (and saving data to disk for later use).
Chapter 3: Starting Out: Working With Objects—This chapter deals with manipulating the data that you have created or imported. These are important tasks that underpin many of the later exercises. The skills you learn here will be put to use over and over again.
Chapter 4: Data: Descriptive Statistics and Tabulation—This chapter is all about summarizing data. Here you learn about basic summary methods, including cumulative statistics. You also learn about cross-tabulation and how to create summary tables.
Chapter 5: Data: Distribution—In this chapter you look at visualizing data using graphical methods—for example, histograms—as well as mathematical ones. This chapter also includes some notes about random numbers and different types of distribution (for example, normal and Poisson).
Chapter 6: Simple Hypothesis Testing—In this chapter you learn how to carry out some basic statistical methods such as the t-test, correlation, and tests of association. Learning how to do these is helpful for when you have to carry out more complex analyses and also illustrates a range of techniques for using R.
Chapter 7: Introduction to Graphical Analysis—In this chapter you learn how to produce a range of graphs including bar charts, scatter plots, and pie charts. This is a “first look” at making graphs, but you return to this subject in Chapter 11, where you learn how to turn your graphs from merely adequate to stunning.
Chapter 8: Formula Notation and Complex Statistics—As your analyses become more complex, you need a more complex way to tell R what you want to do. This chapter is concerned with an important element of R: how to define complex situations. The chapter has two main parts. The first part shows how the formula notation can be used with simple situations. The second part, which takes up the rest of the chapter, uses an important analytical method, analysis of variance (ANOVA), as an illustration. This is an important chapter because the ability to define complex analytical situations is something you will inevitably require at some point.
Chapter 9: Manipulating Data and Extracting Components—This chapter builds on the previous one. Now that you have seen how to define more complex analytical situations, you learn how to make and rearrange your data so that it can be analyzed more easily. This also builds on knowledge gained in Chapter 3. In many cases, when you have carried out an analysis you will need to extract data for certain groups; this chapter also deals with that, giving you more tools that you will need to carry out complex analyses easily.
Chapter 10: Regression (Linear Modeling)—This chapter is all about regression. It builds on earlier chapters and covers various aspects of this important analytical method. You learn how to carry out basic regression, as well as complex model building and curvilinear regression. It is also important because it illustrates some useful aspects of R (for example, how to dissect results). The later parts of the chapter deal with graphical aspects of regression, such as how to add lines of best fit and confidence intervals.
Chapter 11: More About Graphs—This chapter builds on the earlier chapter on graphics (Chapter 7) and also on the previous chapter on regression. It shows you how to produce more customized graphs from your data. For example, you learn how to add text to plots and axes, and how to make superscript and subscript text and mathematical symbols. You learn how to add legends to plots, and how to add error bars to bar charts or scatter plots. Finally, you learn how to export graphs to disk as high-quality graphics files, suitable for publication.
Chapter 12: Writing Your Own Scripts: Beginning to Program—In this chapter you learn how to start producing customized functions and simple scripts that can automate your workflow, and make complex and repetitive tasks a lot easier.

What You Need to Use This Book

The only things you need to use this book are a computer and enthusiasm! The R program works on any operating system, so you can use Windows, Macintosh, or Linux (any version). R even works quite adequately on ancient (in computer terms) computers, so you do not need anything particularly hi-spec. An Internet connection is required at some point because you need to get R from the R-project website. However, it is perfectly possible to download the installation files onto a separate computer and transfer them to your working machine.

If you already have a version of R, it is not necessary to get the latest version. R is continually changing and improving, but the older versions of R will most likely work with this book because the basic command set has changed relatively little. Having said that, I suggest you update your version of R if it was released before 2009.
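
If you are unsure which version you have, R can report it for you. The following is a minimal sketch rather than a required step. The cutoff of 2.9.0 is an assumption chosen for illustration (that release dates from 2009), and the names cutoff and is_recent are made up for this example.

```r
# Print the full version string of the running R installation.
print(R.version.string)

# getRversion() returns a version object that can be compared with a string.
# The cutoff is an assumption for illustration: R 2.9.0 dates from 2009.
cutoff <- "2.9.0"
is_recent <- getRversion() >= cutoff

# Report the result in plain words.
if (is_recent) {
  message("This R installation is at or above version ", cutoff, ".")
} else {
  message("This R installation is older than version ", cutoff, "; consider updating.")
}
```

If the message suggests updating, the newer installation files can be obtained from the R-project website in the same way as a first installation.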

Conventions

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.

The commands you need to type into R and the output you get from R are shown in a monospace font. Each example that shows lines that are typed by the user begins with the > symbol, which mimics the R prompt like so:

> help()

Lines that begin with something other than the > symbol represent the output from R (but look out for typed lines that are long and spread over more than one line), so in the following example the first line was typed by the user and the second line is the result:

> data1
 [1] 3 5 7 5 3 2 6 8 5 6 9
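
As a small illustration of this convention, the following sketch creates a vector and then displays it. The name data1 and its values are copied from the output above; how the object was originally made is not stated here, so the use of c() is an assumption. The > prompt is left out so the lines can be pasted straight into R.

```r
# Build a numeric vector with c(); the values mirror the output shown above.
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)

# Display the contents. At the prompt, typing the name alone has the same effect.
print(data1)

# length() reports how many items the vector holds.
n_items <- length(data1)
print(n_items)
```

The bracketed number at the start of an output line is not part of the data; it gives the position of the first item shown on that line.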

Try It Out
The Try It Out is an exercise you should work through, following the text in the book.
1. They usually consist of a set of steps.
2. Each step has a number.
3. Follow the steps through with your copy of the data.

How It Works
After each Try It Out, the code you’ve typed is explained in detail.

WARNING
Boxes with a warning icon like this one hold important, not-to-be-forgotten information that is directly relevant to the surrounding text.

NOTE
The pencil icon indicates notes, tips, hints, tricks, and asides to the current discussion.

As for styles in the text:

  • We highlight new terms and important words when we introduce them.
  • We show keyboard strokes like this: Ctrl+A.
  • We show filenames, URLs, and code within the text like so: persistence.properties.
  • We present code in two different ways:
    We use a monofont type with no highlighting for most code examples.
    We use bold to emphasize code that’s particularly important in the present context.

Source Code

As you work through the examples in this book, you may choose either to type in all the data and code manually or to use the source code and data object files that accompany the book. All of the data and source code used in this book is available for download at http://www.wrox.com. You will find the data sets that you need for each example activity are accompanied by a download icon and note indicating the name of the data file so you know it’s available for download and can easily locate it in the download file. Once at the site, simply locate the book’s title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book.

There will only be one file to download and it is called Beginning.RData. This one file contains all the example datasets and scripts you need for the whole book. Once you have the file on your computer you can load it into R by one of several methods:

  • For Windows or Mac you can drag the Beginning.RData file icon onto the R program icon; this will open R if it is not already running and load the data. If R is already open, the data will be appended to anything you already have in R; otherwise only the data in the file will be loaded.
  • If you have Windows or Macintosh you can also load the file using menu commands or use a command typed into R:
    • For Windows use File ⇒ Load Workspace, or type the following command in R:
      > load(file.choose())
    • For Mac use Workspace ⇒ Load Workspace File, or type the following command in R (same as in Windows):
      > load(file.choose())
  • If you have Linux then you can use the load() command but must specify the filename (in quotes) exactly, for example:
    > load("Beginning.RData")

The Beginning.RData file must be in your default working directory; if it is not, you must specify the location as part of the filename.
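
The point about the working directory can be sketched as follows. Because the Beginning.RData file may not be on your computer yet, this example writes a small stand-in file to a temporary folder and loads that instead. The file name demo.RData and the object names demo_values, demo_path, and loaded_names are hypothetical and are not part of the book's download.

```r
# Show the default working directory, where load() looks for bare filenames.
print(getwd())

# Create a small stand-in data file in a temporary folder (hypothetical names).
demo_values <- c(2, 4, 6)
demo_path <- file.path(tempdir(), "demo.RData")
save(demo_values, file = demo_path)

# Remove the object so that loading it back is visible.
rm(demo_values)

# The file is not in the working directory, so its location is given in full.
loaded_names <- load(demo_path)

# load() returns the names of the objects it restored.
print(loaded_names)
print(demo_values)
```

With the book's file, the same idea applies: give load() the bare filename when the file sits in the working directory, or the full location otherwise. The setwd() command changes the working directory if you prefer to move to the folder that holds the file.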

NOTE
Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-1-118-16430-3.

Alternatively, you can go to the main Wrox code download page at http://www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.

Errata

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.

To find the errata page for this book, go to http://www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page you can view all errata that have been submitted for this book and posted by Wrox editors.

NOTE
A complete book list including links to each book’s errata is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.

p2p.wrox.com

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join as well as any optional information you wish to provide and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

NOTE
You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.

Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.
