The common steps to all R analyses

While retracing the development process behind our fire attack strategy, we encountered a series of key steps that are common to every analysis that you will conduct in R. Regardless of the exact situation or the statistical techniques used, certain things must be done to yield an organized and thorough R analysis. Each of these steps is detailed in the sections that follow.

Perhaps it goes without saying that, before beginning any R analysis, you must first launch R itself. Nevertheless, it is mentioned here for completeness and transparency.

Step 1: Set your working directory

Once R is launched, the first common step is to set your working directory. This can be done using the setwd(dir) function and subsequently verified using the getwd() command:

> #Step 1: set your working directory
> #set your working directory using setwd(dir)
> #replace the sample location with one that is relevant to you
> setwd("/Users/johnmquick/rBeginnersGuide/")
> #once set, you can verify your new working directory using getwd()
> getwd()
[1] "/Users/johnmquick/rBeginnersGuide"
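Because the path above is specific to the author's machine, the following is a portable sketch rather than the book's own code. It assumes R's temporary directory (tempdir()) as a stand-in for your project folder, checks that the folder exists before switching to it, and relies on the fact that setwd() invisibly returns the previous working directory, so you can switch back when you are done.

```r
# Hypothetical project folder: tempdir() stands in for a real path
# such as "/Users/yourname/rBeginnersGuide"
projectDir <- tempdir()

# Check that the folder exists before trying to switch to it
if (!dir.exists(projectDir)) {
  stop("Directory not found: ", projectDir)
}

# setwd() changes the working directory and invisibly returns the old one
previousDir <- setwd(projectDir)

# Confirm the change; normalizePath() resolves symbolic links so the
# comparison is not fooled by aliases such as /var vs /private/var
print(getwd())
stopifnot(identical(normalizePath(getwd()), normalizePath(projectDir)))

# Optional: return to the original directory when finished
setwd(previousDir)
```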

Comment your work

Note that commented lines, which are prefixed with the pound sign (#), preceded each of our functions in Step 1. R ignores everything from a # to the end of its line, so comments document your work without affecting it. It is vital that you comment all of the actions that you take within the R console. Doing so allows you to refer back to your work later and also makes your code accessible to others.

Note

This is an opportune time to point out that you can draft your code in places other than the R console. For example, R has a built-in editor that can be opened from the File | New Document/Script menu or by pressing Command + N or Ctrl + N. Other free editors can also be found online. The advantages of using an editor are that you can easily modify your code and see different types of code in different colors, which helps you to verify that it is properly constructed. Note, however, that your code runs only when it is sent to the R console, whether you paste it in or use the editor's own command for running code.
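Code drafted in an editor can also be run in one go with source(), which reads a saved script file and evaluates it in the console. In the sketch below, the script name myAnalysis.R and its contents are hypothetical; the script is written to a temporary folder only so that the example is self-contained. In practice, you would source a script that you saved from your editor. The echo = TRUE argument prints each command alongside its output, much as if you had typed it.

```r
# Write a small hypothetical script to a temporary file so the example is
# self-contained; normally this file would be saved from your editor
scriptFile <- file.path(tempdir(), "myAnalysis.R")
writeLines(c(
  "# compute a simple summary value",
  "scores <- c(12, 15, 9, 20)",
  "averageScore <- mean(scores)",
  "averageScore"
), scriptFile)

# Run every line of the script; echo = TRUE shows the commands,
# comments, and printed results in the console
source(scriptFile, echo = TRUE)
```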

Step 2: Import your data (or load an existing workspace)

After you set the working directory, it is time to pull your data into R. This can be achieved by creating a new variable in tandem with the read.csv(file) command:

> #Step 2: Import data (or load an existing workspace)
> #read a dataset from a csv file into R using read.csv(file) and save it into a new variable
> dataset <- read.csv("datafile.csv")
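Because datafile.csv is only a placeholder, the following self-contained sketch first writes a small hypothetical dataset to a CSV file in a temporary folder and then imports it with read.csv(). The column names unit and soldiers are invented for illustration. The str() and head() calls are a quick way to confirm that the columns and their types arrived as expected.

```r
# Create a small hypothetical dataset and save it as a CSV file
# so that the import step below has something to read
csvFile <- file.path(tempdir(), "datafile.csv")
sampleData <- data.frame(
  unit = c("A", "B", "C"),
  soldiers = c(1000, 1500, 800)
)
write.csv(sampleData, csvFile, row.names = FALSE)

# Import the CSV file into a new variable
dataset <- read.csv(csvFile)

# Quick checks that the import worked as expected
str(dataset)    # column names and types
head(dataset)   # first rows
```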

Alternatively, if you were continuing a prior data analysis rather than starting a new one, you would instead load a previously saved workspace using load(file). You can then verify the contents of your loaded workspace using the ls() command:

> #load an existing workspace using load(file)
> load("existingWorkspace.RData")
> #verify the contents of your workspace using ls()
> ls()
[1] "myVariable1" "myVariable2" "myVariable3"
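load() also invisibly returns the names of the objects that it restored, which offers a quick way to confirm what a workspace file contributed. The following self-contained sketch first creates a hypothetical existingWorkspace.RData in a temporary folder using save(), removes the objects to simulate a new session, and then restores them.

```r
# Create two hypothetical objects and save them to a workspace file
workspaceFile <- file.path(tempdir(), "existingWorkspace.RData")
myVariable1 <- 1:5
myVariable2 <- "example"
save(myVariable1, myVariable2, file = workspaceFile)

# Remove them to simulate starting a new session
rm(myVariable1, myVariable2)

# load() restores the objects and invisibly returns their names
restored <- load(workspaceFile)
print(restored)

# ls() lists everything now in the workspace
ls()
```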

Step 3: Explore your data

Regardless of the type or amount of data that you have, summary statistics should be generated to explore your data. Summary statistics provide you with a general overview of your data and can reveal overarching patterns, trends, and tendencies across a dataset. Summary statistics include calculations such as means, standard deviations, and ranges, amongst others:

> #Step 3: Explore your data
> #calculate a mean using mean(data)
> mean(myData)
[1] 1000
> #calculate a standard deviation using sd(data)
> sd(myData)
[1] 100
> #calculate a range (minimum and maximum) using range(data)
> range(myData)
[1]  500 2000
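Real data often contain missing values (NA), and in that case mean(), sd(), and range() return NA by default. The following sketch uses a hypothetical myData vector that includes one NA and shows the na.rm = TRUE argument, which drops missing values before the calculation.

```r
# Hypothetical data; replace with your own numeric vector
myData <- c(500, 900, 1000, 1100, 1500, NA)

# Without na.rm, a single NA makes the result NA
mean(myData)

# With na.rm = TRUE, missing values are dropped first
mean(myData, na.rm = TRUE)   # arithmetic mean: 1000
sd(myData, na.rm = TRUE)     # sample standard deviation
range(myData, na.rm = TRUE)  # minimum and maximum: 500 1500
```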

Also recall R's summary(object) function, which provides summary statistics along with additional vital information. It can be used with almost any object in R and will offer information specifically catered to that object:

> #generate a detailed summary for a given object using summary(object)
> summary(object)
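To illustrate how summary() adapts to the object it receives, the following sketch applies it to a numeric vector, a factor, and a data frame. The battles data frame and its columns are hypothetical and exist only for this example.

```r
# Hypothetical data frame with one numeric and one categorical column
battles <- data.frame(
  soldiers = c(800, 1000, 1200, 1500),
  outcome = factor(c("loss", "win", "win", "win"))
)

# For a numeric vector: minimum, quartiles, median, mean, and maximum
summary(battles$soldiers)

# For a factor: the count of each level
summary(battles$outcome)

# For a data frame: a column-by-column summary
summary(battles)
```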

Note

Note that there are often other ways to make an initial examination of your data in addition to using summary statistics. When appropriate, graphing your data is an excellent way to gain a visual perspective on what it has to say (data visualization is the primary topic of Chapter 8 and Chapter 9 of this book). Furthermore, before conducting an analysis, you will want to ensure that your data are consistent with the assumptions required by your statistical methods. This will prevent you from expending energy on inappropriate techniques and from drawing invalid conclusions.
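As one illustration, many techniques assume approximately normal data (ordinary regression, for example, assumes normally distributed errors for its inference). The following sketch pairs a quick histogram with the Shapiro-Wilk test from R's built-in stats package. The simulated data and the 0.05 cut-off are assumptions made for demonstration. Which checks are appropriate depends entirely on the method that you choose.

```r
# Simulated stand-in data; replace with your own variable
set.seed(42)
myData <- rnorm(100, mean = 1000, sd = 100)

# Visual check: a histogram shows the overall shape of the distribution
hist(myData, main = "Distribution of myData", xlab = "Value")

# Formal check: Shapiro-Wilk test of normality (needs 3 to 5000 values)
normalityTest <- shapiro.test(myData)
print(normalityTest)

# A small p-value (here, below an assumed 0.05 cut-off) suggests
# the data depart from normality
if (normalityTest$p.value < 0.05) {
  message("Normality assumption is questionable; consider other methods.")
} else {
  message("No strong evidence against normality.")
}
```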

Step 4: Conduct your analysis

Here is where your work will differ from project to project. Depending on the type of analysis that you are conducting, you will use a variety of different techniques. For example, in this book we have primarily used regression analysis. Regression is but one of an endless number of potential methods. The correct techniques to use will be determined by the circumstances surrounding your work.

> #Step 4: Conduct your analysis
> #The appropriate methods for this step will vary between analyses.
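Because this book primarily uses regression, a minimal example of fitting a linear model with lm() may serve as a starting template for this step. The data below are simulated purely for illustration, with a true intercept of 2 and a true slope of 3. With your own data, the formula and variables would of course differ.

```r
# Simulated stand-in data: y depends linearly on x plus random noise
set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, sd = 0.5)

# Fit a simple linear regression of y on x
model <- lm(y ~ x)

# Inspect coefficients, standard errors, and R-squared
summary(model)

# Extract just the estimated intercept and slope
coef(model)
```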

Step 5: Save your workspace and console files

At the conclusion of your analysis, you will always want to save your work. To have the option to revisit and manipulate your R objects from session to session, you will need to save your R workspace using the save.image(file) command, as follows:

> #Step 5: Save your workspace and console files
> #save your R workspace using save.image(file)
> #remember to include the .RData file extension
> save.image("myWorkspace.RData")
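Before closing R, you may want to confirm that the workspace file was written correctly. The following sketch saves the workspace to a temporary folder (an assumption that allows the example to run anywhere) and then loads the file into a separate, empty environment so that the current session is left untouched. The results object is hypothetical.

```r
# A hypothetical object worth keeping between sessions
results <- c(win = 3, loss = 1)

# Save the entire global workspace (all objects) to a file
workspaceFile <- file.path(tempdir(), "myWorkspace.RData")
save.image(file = workspaceFile)

# Verify the file by loading it into a separate, empty environment,
# which leaves the current workspace untouched
check <- new.env()
load(workspaceFile, envir = check)
ls(check)
```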

To save your R console text, which contains the log of every action that you took during a given session, the simplest approach is to copy and paste it into a text file. Once copied, the console text can be formatted to improve its readability. For instance, a text file containing the five common steps of every R analysis could take the following form:
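As an alternative to copying and pasting, R's sink() function can divert printed output to a text file as it is produced. Note that sink() records output only, not the commands that you type; to capture both, you could run a saved script with source(..., echo = TRUE) while the sink is active. The file name consoleLog.txt and the data below are hypothetical.

```r
# Hypothetical data used only to generate some output
myData <- c(500, 900, 1000, 1100, 1500)

logFile <- file.path(tempdir(), "consoleLog.txt")

# Start diverting output to the log file; split = TRUE keeps
# showing it in the console as well
sink(logFile, split = TRUE)
print(summary(myData))
print(sd(myData))
# Stop diverting output (always pair each sink(file) with sink())
sink()

# Read the log back to confirm what was recorded
readLines(logFile)
```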

> #There are five steps that are common to every data analysis conducted in R
> #Step 1: set your working directory
> #set your working directory using setwd(dir)
> #replace the sample location with one that is relevant to you
> setwd("/Users/johnmquick/rBeginnersGuide/")
> #once set, you can verify your new working directory using getwd()
> getwd()
[1] "/Users/johnmquick/rBeginnersGuide"
> #Step 2: Import data (or load an existing workspace)
> #read a dataset from a csv file into R using read.csv(file) and save it into a new variable
> dataset <- read.csv("datafile.csv")
> #OR
> #load an existing workspace using load(file)
> load("existingWorkspace.RData")
> #verify the contents of your workspace using ls()
> ls()
[1] "myVariable1" "myVariable2" "myVariable3"
> #Step 3: Explore your data
> #calculate a mean using mean(data)
> mean(myData)
[1] 1000
> #calculate a standard deviation using sd(data)
> sd(myData)
[1] 100
> #calculate a range (minimum and maximum) using range(data)
> range(myData)
[1]  500 2000
> #generate a detailed summary for a given object using summary(object)
> summary(object)
> #Step 4: Conduct your analysis
> #The appropriate methods for this step will vary between analyses.
> #Step 5: Save your workspace and console files
> #save your R workspace using save.image(file)
> #remember to include the .RData file extension
> save.image("myWorkspace.RData")
> #save your R console text by copying it and pasting it into a text file.

Note

See the rBeginnersGuide_CommonSteps.txt file that is provided with this book.

Pop quiz

  1. Which of the following is not a benefit of commenting your code?

    a. It makes your code readable and organized.

    b. It makes your code accessible to others.

    c. It makes it easier for you to return to and recall your past work.

    d. It makes the analysis process faster.

Have a go hero

Conduct a complete, end-to-end analysis using the strategy that you decided upon at the conclusion of Chapter 6. Be sure to employ each of the five common steps to all R analyses. Along the way, refer to the Retracing and Refining a Complete Analysis section of this chapter, as well as the previous chapters of this book. Once your analysis is complete, you should have the following items:

  • A workspace file containing all of the objects used in your analysis
  • A commented console text file detailing all of the actions that occurred during your analysis
  • A sound, viable battle strategy for the Shu army