Chapter 3. Managing Your Workspace

If the behavior of R objects is compared to game rules, then the workspace can be compared to the playground. To play the game well, you need to familiarize yourself not only with the rules, but also with the playground. In this chapter, I will introduce some basic but important skills for managing your workspace. These skills include:

  • Using the working directory
  • Inspecting the working environment
  • Modifying global options
  • Managing the library of packages

R's working directory

An R session always starts in a directory, whether it is launched from a terminal or in RStudio. The directory in which R is running is called the working directory of the R session. When you access other files on your hard drive, you can use either absolute paths (for example, D:\Workspaces\test-project\data\2015.csv), which work in most cases, or relative paths (for example, data\2015.csv), which work with the right working directory (in this case, D:\Workspaces\test-project).

Using paths relative to the working directory does not change which files are referred to, but it makes the paths shorter to write. It also makes your scripts more portable. Imagine you are writing some R scripts to produce graphics from a set of data files in a directory. If you write the directory as an absolute path, then anyone else who wants to run your script on their own computer has to modify the paths in your code to match the location of the data on their hard drive. However, if you write the directory as a relative path, then as long as the data is kept in the same relative location, the script will work without any modification.
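As a minimal sketch of how a relative path relates to the working directory, the following builds the relative path from the example above and joins it to the current working directory by hand. The file is not assumed to exist; only the path strings are constructed, and the variable names are illustrative.

```r
# Relative path taken from the example in the text. The file need not exist
# for this sketch, because only the path strings are built.
relative_path <- file.path("data", "2015.csv")

# R resolves a relative path against the current working directory,
# which is equivalent to joining the two by hand.
resolved_path <- file.path(getwd(), relative_path)

relative_path
resolved_path
```

Note that file.path() joins its arguments with /, which R accepts on every major operating system.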

In an R terminal, you can get the current working directory of the running R session using getwd(). By default, the R command starts a new R session from your user directory, and RStudio runs an R session in the background from your user documents directory.

Apart from the defaults, you can choose a directory and create an R project in RStudio. Every time you open that project, the working directory is set to the project's location, so files in the project directory are easy to access with relative paths, which in turn improves the portability of the project.

Creating an R project in RStudio

To create a new project, simply go to File | New Project or click the Project drop-down menu in the top-right corner of the main window and choose New Project. A window will appear, and you can create a new directory or choose an existing directory on your hard drive as the project directory:

[Figure: Creating an R project in RStudio]

Once you choose a local directory, the project will be created there. An R project is nothing but a .Rproj file that stores some settings. If you open such a project file in RStudio, the settings in it will be applied, and the working directory will be set to the directory in which the project file is located.
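To check from the console whether the current working directory is an RStudio project directory, one option is to look for a .Rproj file there. This is only a sketch: it assumes the project file sits directly in the working directory, and rproj_files is an illustrative name.

```r
# List any RStudio project files directly inside the working directory.
# The pattern matches file names that end in ".Rproj".
rproj_files <- list.files(getwd(), pattern = "\\.Rproj$")

# A character vector of length zero means no project file was found here.
rproj_files
```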

Another advantage of working on a project in RStudio is that auto-completion makes writing file paths much more efficient. While typing a string that contains an absolute or relative file path, press Tab and RStudio will list the files in that directory:

[Figure: File path auto-completion in RStudio]

Comparing absolute and relative paths

Since I'm writing this book with RMarkdown in RStudio, the working directory is the directory of my book project:

getwd()
## [1] "D:/Workspaces/learn-r-programming"

You may notice that the working directory shown above uses / instead of \. In Windows operating systems, \ is the default path separator, but in R strings this symbol is already used as the escape character for writing special characters. For example, when you create a character vector, you can use \n to represent a new line:

"Hello\nWorld"
## [1] "Hello\nWorld"

Special characters are preserved when the character vector is printed directly, because printing shows a representation of the string. However, if you pass the string to cat(), it is written to the console with the escape sequences translated to the characters they represent:

cat("Hello\nWorld")
## Hello
## World

The second word starts on a new line (\n), as expected. However, if \ is so special, how should we write \ itself? Just use \\:

cat("The string with '\\' is translated")
## The string with '\' is translated
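A quick way to convince yourself that an escape sequence is typed as two characters but stored as one is to count the characters. The sketch below uses only base R. The variable names are illustrative.

```r
# Each escape sequence is typed as two characters but stored as one.
newline_string <- "Hello\nWorld"
backslash_string <- "\\"

nchar(newline_string)    # 11: five letters, one newline, five letters
nchar(backslash_string)  # 1: a single backslash

# print() shows the escaped representation; writeLines() shows the raw text.
print(backslash_string)
writeLines(backslash_string)
```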

That is why we should use \\ or / in paths in Windows operating systems, since both are supported. In Unix-like operating systems, such as macOS and Linux, things are easier: always use /. If you are using Windows and mistakenly use a single \ to refer to a file, an error will occur:

filename <- "d:\data\test.csv"
## Error: '\d' is an unrecognized escape in character string starting ""d:\d"

Instead, you need to write it like this:

filename <- "d:\\data\\test.csv"

Fortunately, we can use / in Windows in most cases, which makes the same code runnable in nearly all popular operating systems using relative paths:

absolute_filename <- "d:/data/test.csv"
relative_filename <- "data/test.csv"
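To see that the escaped-backslash form and the forward-slash form describe the same path, one can convert one into the other. The following is a small sketch using base R's chartr(). The variable names are illustrative, and no file is accessed.

```r
# Two ways of writing the same Windows path in R source code.
escaped_path <- "d:\\data\\test.csv"
slash_path <- "d:/data/test.csv"

# chartr() replaces each backslash with a forward slash.
converted_path <- chartr("\\", "/", escaped_path)

# TRUE: after conversion the two strings are identical.
identical(converted_path, slash_path)
```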

Besides getting the working directory using getwd(), we can also set the working directory of the current R session using setwd(). However, this is rarely recommended, because it redirects every relative path in a script to another directory and can make everything go wrong.

Therefore, a good practice is to create an R project to start your work.
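If a piece of code really does need a different working directory for a moment, one defensive pattern is to restore the previous directory automatically when the code finishes. The sketch below is an illustration only: with_working_dir is a hypothetical helper name and not part of base R. It relies on setwd() returning the previous working directory and on on.exit() running even if an error occurs.

```r
# Hypothetical helper: evaluate `code` with `dir` as the working directory,
# then restore the previous working directory, even if `code` fails.
with_working_dir <- function(dir, code) {
  old_dir <- setwd(dir)               # setwd() returns the previous directory
  on.exit(setwd(old_dir), add = TRUE) # restore it when the function exits
  force(code)                         # lazy argument is evaluated here
}

before <- getwd()
inside <- with_working_dir(tempdir(), getwd())
after <- getwd()

# The working directory is unchanged once the helper returns.
identical(before, after)
```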

Managing project files

Once we create a project in RStudio, a .Rproj file is created in the project directory, which contains no other files at that point. Since R is used for statistical computing and data visualization, an R project mainly contains R scripts that do statistical computing (or other programming tasks), data files (such as .csv files), documents (such as Markdown files), and sometimes output graphics.

If different types of files are mixed up in the project directory, it becomes increasingly difficult to manage them, especially as input data accumulates or output data and graphics clutter the directory.

A recommended practice is to create subdirectories to contain different types of files resulting from different types of tasks.

For example, the following directory structure is flat, with all files together:

project/
- household.csv 
- population.csv 
- national-income.png 
- population-density.png
- utils.R 
- import-data.R 
- check-data.R 
- plot.R 
- README.md 
- NOTES.md

By contrast, the following directory structure is much cleaner and nicer to work with:

project/ 
- data/ 
  - household.csv 
  - population.csv 
- graphics/ 
  - national-income.png 
  - population-density.png
- R/ 
  - utils.R 
  - import-data.R 
  - check-data.R 
  - plot.R 
- README.md 
- NOTES.md

In the preceding directory structures, directories are represented in the form of directory/ and files in the form of file-name.ext. In most cases, the second structure is recommended because, as project needs and tasks become more complex, the first structure will end up in a mess while the second structure will remain tidy.
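The second structure can also be set up from R itself. The following is a minimal sketch that creates the three subdirectories under a throwaway location. Using tempdir() as the parent is an assumption made so that the example does not touch a real project. In practice, project_dir would be your own project directory.

```r
# Create the project skeleton in a temporary location (assumption: a real
# project would use its own directory instead of tempdir()).
project_dir <- file.path(tempdir(), "project")
sub_dirs <- c("data", "graphics", "R")

for (sub_dir in sub_dirs) {
  # recursive = TRUE also creates project_dir itself if it is missing.
  dir.create(file.path(project_dir, sub_dir),
             recursive = TRUE, showWarnings = FALSE)
}

# Show the subdirectories that now exist under the project directory.
list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
```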

Apart from the structure, it is common to write the project introduction in README.md and put additional notes in NOTES.md. Both are Markdown documents (.md), and it is worth becoming familiar with Markdown's extremely simple syntax. Read Daring Fireball: Markdown Syntax Documentation (https://daringfireball.net/projects/markdown/syntax) and GitHub Help: Markdown Basics (https://help.github.com/articles/markdown-basics/) for details. We will cover the topic of combining R and Markdown in Chapter 15, Boosting Productivity.

Now the working directory is ready. In the next section, you will learn various methods to inspect the working environment in an R session.
