Chapter 2. Working with Vectors and Time Series

In this chapter, we are going to cover the basic data structure in R: the vector. Understanding vectors is the foundation for all the subsequent chapters. You will learn how to perform efficient operations on numeric and logical vectors and how to create subsets. After this, you will learn how to write custom functions in order to expand and customize R's capabilities. Working with dates and time series and the use of graphical functions are introduced at the end of this chapter.

In this chapter, we'll cover the following topics:

  • Creating, saving, and examining the three main types of vectors
  • The principles of performing operations on vectors in R
  • Using functions that have more than one argument
  • Creating subsets of vectors
  • Dealing with missing values in vectors
  • Writing new functions
  • Working with dates
  • Displaying and saving graphical output

Vectors – the basic data structures in R

A vector is an ordered collection of values of the same type (or mode, in R terminology). As mentioned in the previous chapter, the three types of values that are useful for most purposes (including the topics of this book) are numeric, character, and logical. In this section, you are going to learn about several methods to create vectors, check the properties of interest for the given vectors, and perform operations involving pairs of vectors. You are also going to learn how to save the objects we create in the temporary computer memory via assignment.

Different types of vectors

Vectors are the most basic data structures in R since single elements (such as the number 10) are also represented in R by vectors (of length 1). As we have previously seen, when we enter a numeric value on the command line, it is printed on the screen. The number in square brackets to the left of the value is, in fact, the position of the leftmost element in the respective printed line. For example, the [1] part in the following output means the first (and only) printed element, 10, of the particular vector is at position 1:

> 10
[1] 10

Note

Entering a value on the command line is, in fact, a shortcut for the print function:

> print(10)
[1] 10

Vectors can be created from individual elements with the c function, which stands for combine. Let's take a look at the following examples:

> c(1,5,10,4)
[1]  1  5 10  4
> c("cat","dog","mouse","apple")
[1] "cat"   "dog"   "mouse" "apple"

Sequential numeric vectors can be easily created with the : operator. Such vectors have many uses in R. The : operator creates an ordered vector starting at the value to the left of the : symbol and ending at the value to the right of the : symbol, as follows:

> 7:20
[1]  7  8  9 10 11 12 13 14 15 16 17 18 19 20

When the value on the left is larger than the value on the right, the sequence is created in decreasing order:

> 33:24
[1] 33 32 31 30 29 28 27 26 25 24

A logical vector can also be created with the c function. Remember that TRUE and FALSE are special values and not characters. Therefore, these values should be typed without quotes:

> c(TRUE,FALSE,TRUE,TRUE)
[1]  TRUE FALSE  TRUE  TRUE

However, in practice, the creation of logical vectors is usually associated with employing a conditional operator on a vector rather than manually typing a sequence of logical values. We will elaborate on this later.

There are several functions that we can use to convert between vector types. The two most useful ones are as.numeric and as.character, which are used to convert a vector to a numeric or character vector, respectively. There are other functions to convert objects of a particular class into another, which we'll see in subsequent chapters. Let's take a look at the following examples:

> 33:24
[1] 33 32 31 30 29 28 27 26 25 24
> as.character(33:24)
[1] "33" "32" "31" "30" "29" "28" "27" "26" "25" "24"
> as.numeric(as.character(33:24))
[1] 33 32 31 30 29 28 27 26 25 24
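The round trip above works because every character value looks like a number. As a minimal sketch of what happens otherwise (the vector name mixed and its values are illustrative assumptions, not taken from the text), as.numeric returns the missing value NA for any element it cannot interpret and issues a warning:

```r
# Hypothetical character vector: only the first two elements look like numbers
mixed = c("10", "2.5", "cat")

# suppressWarnings() hides the "NAs introduced by coercion" warning
converted = suppressWarnings(as.numeric(mixed))

# is.na() marks the element that could not be converted
is.na(converted)
```

Missing values in vectors are dealt with later in this chapter.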

A factor is a special type of encoding for a vector, where the vector has a defined set of acceptable values or levels. Such an encoding is most common in statistical uses of R, for example, when defining categorical variables to identify treatments in an experiment. Using factors is not essential for the purposes of this book. However, encountering factors is inevitable when working with R (for example, in versions of R earlier than 4.0.0, character columns are encoded as factors by default when a table is read from a file), so at the very least, we need to be aware of this data structure.

The factor function can be used to convert a vector into a factor:

> factor(c("cat","dog","dog"))
[1] cat dog dog
Levels: cat dog
> factor(c("cat","dog","dog","mouse"))
[1] cat   dog   dog   mouse
Levels: cat dog mouse

As you can see, the acceptable levels of the resulting factor object (which are, by default, defined as a set of unique values that the vector has) are printed along with its values.
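To inspect the pieces of a factor separately, something like the following sketch can be used (the object name animals is an assumption made for illustration). The levels function returns the acceptable values, while as.integer exposes the integer codes that R stores for each element:

```r
# Illustrative factor (the name 'animals' is an assumption for this sketch)
animals = factor(c("cat", "dog", "dog", "mouse"))

# levels() returns the set of acceptable values as a character vector
animal_levels = levels(animals)

# as.integer() exposes the integer codes that point to those levels
animal_codes = as.integer(animals)

# as.character() converts the factor back to a plain character vector
animal_names = as.character(animals)
```

By default, the levels are sorted, so here cat is coded as 1, dog as 2, and mouse as 3.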

Using the assignment operator to save an object

So far we have used R by entering standalone expressions on the command line. As mentioned in the previous chapter, the returned objects are not saved anywhere this way. Therefore, we cannot perform sequential operations where each created object serves as an input for the next step(s). However, saving intermediate results is essential to automate processes.

Saving objects to the temporary memory is called assignment. By temporary, we mean that the objects are deleted when we shut down the computer (or quit R), as opposed to writing to a file on the hard drive, where the information will permanently remain unless it's deleted. Assignment is performed by an assignment expression, which is composed of the name we would like to give the new object, the assignment operator =, and the object we would like to save. For example, we can save the 1:10 sequential vector to an object named v as follows:

> v = 1:10

We can then access our newly created object using its name the same way we accessed predefined objects (such as pi) in the previous chapter:

> v
[1]  1  2  3  4  5  6  7  8  9 10

Note

There is another assignment operator in R, namely <-:

> v <- 1:10

Throughout this book, the = operator is used since it is easier to type.

Also, note the difference between the assignment operator = and the equality conditional operator == (see the previous chapter). The = operator is used for assignment:

> one = 1
> two = 2
> one = two
> one
[1] 2
> two
[1] 2

The == operator is used to compare:

> one = 1
> two = 2
> one == two
[1] FALSE

When assigning an object with a name that already exists in memory, the older object is deleted and replaced by the new one:

> x = 55
> x
[1] 55
> x = "Hello"
> x
[1] "Hello"

The ls function returns a character vector with the names of all the user-defined objects (in a given environment, with the default one being the global R environment). For example, so far we have assigned four objects in memory:

> ls()
[1] "one" "two" "v"   "x"

Removing objects from memory

We can remove objects from memory by using the rm function. Let's take a look at the following examples:

> rm("v")
> ls()
[1] "one" "two" "x"

Tip

It is sometimes useful to remove all objects from memory, for example, if we want to run a given code section without worrying that previously defined objects will interfere. This can be done by passing the whole list of objects currently in memory to the rm function as follows:

> rm(list = ls())
> ls()
character(0)

The character(0) output indicates an empty character vector.

Removing all objects can also be achieved through the menus, by navigating to Misc | Remove all objects (RGui) or Session | Clear workspace… (RStudio). The reason for writing the list=ls() part will become evident after reading the Using functions with several parameters section in this chapter.
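The list argument does not have to be the full output of ls. As a small sketch (the object names tmp_a, tmp_b, and keep_me are assumptions made for illustration), any character vector of names can be supplied to remove just those objects:

```r
# Illustrative objects (names are assumptions for this sketch)
tmp_a = 1
tmp_b = 2
keep_me = 3

# The 'list' argument takes a character vector of object names to remove
rm(list = c("tmp_a", "tmp_b"))

# exists() reports whether an object with the given name can be found
exists("tmp_a")    # FALSE, it was removed
exists("keep_me")  # TRUE, it was not in the list
```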

Summarizing vector properties

Many functions in R are intended to work with vectors. This section reviews some commonly used functions that summarize the properties of a vector.

For example, we may be interested in the mean, minimal, and maximal values of a given vector. To get these, we can use the mean, min, and max functions, respectively:

> v = 1:10
> mean(v)
[1] 5.5
> min(v)
[1] 1
> max(v)
[1] 10

We can also get both the minimal and maximal values at once with the range function:

> range(v)
[1]  1 10

The length function returns the number of elements a given vector has:

> v = c(4,2,3,9,1)
> length(v)
[1] 5

With logical vectors, we sometimes would like to know whether they contain at least one TRUE value or whether all of their values are TRUE. This can be achieved with the any and all functions, respectively:

> l = c(TRUE, FALSE, FALSE, TRUE)
> any(l)
[1] TRUE
> all(l)
[1] FALSE
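In practice, any and all are most often applied to the result of a comparison rather than to a manually typed logical vector. The following is a minimal sketch under that assumption (the vector temps and its values are made up for illustration; comparisons on whole vectors are discussed later in this chapter):

```r
# Illustrative numeric vector (name and values are assumptions)
temps = c(12, 15, 9, 21, 18)

# Is at least one value above 20? (21 is, so this is TRUE)
any_hot = any(temps > 20)

# Are all values above 10? (9 is not, so this is FALSE)
all_mild = all(temps > 10)
```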

If we would like to know how many TRUE values a vector contains, we can rely on the automatic conversion from logical to numeric values that takes place when an arithmetic function is applied to a logical vector:

> l = c(TRUE, FALSE, FALSE, TRUE)
> sum(l)
[1] 2

In this example, each TRUE value was first converted to 1 and each FALSE value to 0. Therefore, the vector c(TRUE,FALSE,FALSE,TRUE) became the vector c(1,0,0,1) and the sum of this vector's elements is 2.
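The same conversion applies to other arithmetic functions. As a brief sketch (the name flags is an assumption for illustration), the conversion can be made explicit with as.numeric, and mean then gives the proportion of TRUE values:

```r
# Illustrative logical vector (the name 'flags' is an assumption)
flags = c(TRUE, FALSE, FALSE, TRUE)

# Explicit version of the conversion that sum() performs internally
as_numbers = as.numeric(flags)   # 1 0 0 1

# Number of TRUE values
n_true = sum(flags)              # 2

# Proportion of TRUE values: the mean of 1, 0, 0, 1
share_true = mean(flags)         # 0.5
```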

The which function returns the positions of all the TRUE elements within a logical vector:

> which(l)
[1] 1 4

Here, a vector of length 2 was returned since there are two TRUE values in the vector l. The two values of this vector are 1 and 4 since the first TRUE value occupies the first position in the vector l, while the second TRUE value occupies the fourth position.
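A common use of which is to locate the elements that satisfy a condition. The sketch below assumes a made-up numeric vector named scores; the comparison produces a logical vector, and which reports where its TRUE values are:

```r
# Illustrative numeric vector (name and values are assumptions)
scores = c(4, 2, 3, 9, 1)

# Logical vector: TRUE FALSE TRUE TRUE FALSE
above = scores > 2

# Positions of the TRUE values: 1 3 4
positions = which(above)

# The number of positions equals the number of TRUE values
length(positions) == sum(above)
```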

Note

The related functions which.min and which.max return the position of the minimal or maximal element in a numeric vector:

> which.min(v)
[1] 5
> which.max(v)
[1] 4

We are going to see another example with which.min later in this book.

The last useful function we will mention is the unique function, which returns the unique elements of a vector; that is, it returns a set of elements the vector consists of without repetitions. Let's take a look at the following examples:

> v = c(5,6,2,2,3,0,-1,2,5,6)
> unique(v)
[1]  5  6  2  3  0 -1
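Combining unique with length gives the number of distinct values in a vector. A short sketch, using the same values as above under the assumed name obs:

```r
# Same values as in the example above, under an illustrative name
obs = c(5, 6, 2, 2, 3, 0, -1, 2, 5, 6)

# Number of distinct values
n_distinct = length(unique(obs))    # 6

# Number of elements that repeat an earlier value
n_repeats = length(obs) - n_distinct  # 4
```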

Element-by-element operations on vectors

As opposed to functions that treat the vector as a single entity (as seen in the previous section), some functions work on each element of the vector as if it were a separate entity and return a vector of the results (which, therefore, has the same number of elements as the input vector). In fact, all arithmetic and logical operators work this way, as do many functions such as sqrt. We have not had a chance to see this so far since we only applied them to vectors of length 1. Let's take a look at the following examples:

> 1:10 * 2
 [1]  2  4  6  8 10 12 14 16 18 20
> 1:10 - 10
 [1] -9 -8 -7 -6 -5 -4 -3 -2 -1  0
> sqrt(c(4,16,64))
[1] 2 4 8

In the first expression, we multiplied the vector (1, 2, ..., 10) by 2, which resulted in a vector of 10 elements where the first element is equal to 1*2, the second is equal to 2*2, the third is equal to 3*2, and so on, up to 10*2.
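One detail worth keeping in mind with expressions such as 1:10 - 10 is operator precedence: the : operator is evaluated before the arithmetic operators +, -, *, and /, so the sequence is created first and the arithmetic is then applied to each of its elements. The following sketch (the name n is an assumption for illustration) shows how parentheses change the result:

```r
# Illustrative upper limit (the name 'n' is an assumption)
n = 5

# The sequence 1:5 is created first, then 1 is subtracted from each element
shifted = 1:n - 1      # 0 1 2 3 4

# Parentheses force the subtraction first, giving the sequence 1:4
shorter = 1:(n - 1)    # 1 2 3 4
```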

Logical operators function in the same way, shown as follows:

> x = 1:5
> x
[1] 1 2 3 4 5
> x >= 3
[1] FALSE FALSE  TRUE  TRUE  TRUE

Here, each of the values 1, 2, 3, 4, and 5 was checked for being greater than or equal to 3, giving FALSE for 1 and 2 and TRUE for 3, 4, and 5.

If we want to check whether a given value from one vector is present in another, we can use the %in% operator. With %in%, we basically ask whether the value(s) of a vector on the left match any of the values of a vector on the right:

> 1 %in% 1:10
[1] TRUE
> 11 %in% 1:10
[1] FALSE

For these simple examples, we can do without the %in% operator (see the following examples). Its utility will become apparent towards the end of this chapter, when we want, for instance, to look for each element of a long vector A and check whether it has a match in a long vector B. Here are the alternatives to the preceding expressions:

> any(1:10 == 1)
[1] TRUE
> any(1:10 == 11)
[1] FALSE

In these two examples, we encompass the logical operation within the any function to check whether the resulting logical vector has at least one TRUE value.
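When the vector on the left has more than one element, %in% returns one logical value per element of that vector. A minimal sketch, with vector names (wanted and available) that are assumptions made for illustration:

```r
# Illustrative vectors (names and values are assumptions)
wanted = c(1, 11, 5)
available = 1:10

# One result per element of 'wanted': TRUE FALSE TRUE
found = wanted %in% available

# How many of the wanted values are available
n_found = sum(found)   # 2
```

Doing the same with == and any would require a separate expression for each element of the vector on the left.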

Now, let's move on to character vectors. When working with character values, the paste function can be useful to combine separate elements into a single character string. The sep parameter of this function determines which characters will be used to separate the values (a single space is the default). Let's take a look at the following example:

> paste("There are", "5", "books.")
[1] "There are 5 books."

The paste function also works with vectors that have more than one element:

> paste("Image", 1:5)
[1] "Image 1" "Image 2" "Image 3" "Image 4" "Image 5"

Note that the paste function automatically converts numeric values into character values, so numeric and character arguments can be freely mixed:

> x = 80
> paste("There are", x, "books.")
[1] "There are 80 books."

Note

The paste0 function does the same thing as paste, except that the values are joined with no separator between them, as if sep were set to an empty string:

> paste(1, 2, 3, sep = "")
[1] "123"
> paste0(1, 2, 3)
[1] "123"
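As a sketch of how these pieces combine (the names ids, file_names, and labels, as well as the file naming pattern, are assumptions made for illustration), a fixed prefix and suffix can be attached to each element of a numeric vector:

```r
# Illustrative sequence of identifiers
ids = 1:3

# No separator: "image_1.png" "image_2.png" "image_3.png"
file_names = paste0("image_", ids, ".png")

# Custom separator: "Image-1" "Image-2" "Image-3"
labels = paste("Image", ids, sep = "-")
```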

The recycling principle

In the previous chapter, we only used operators on two vectors of length 1. In this chapter, so far, we have used operations involving one vector of length 1 and another of length >1. What happens when we perform an operation involving two vectors of length >1?

If we have two vectors of exactly the same length, the operation is performed on each consecutive pair of elements taken from the two vectors, as follows:

> c(1,2,3) * c(10,20,30)
[1] 10 40 90

In this example, 1 is multiplied by 10, 2 is multiplied by 20, and 3 is multiplied by 30, and the three results are combined into a single vector of length 3.

When the lengths of the two vectors are unequal, the shorter vector is recycled before the operation is performed. In other words, values at the beginning of the shorter vector are attached to its end, sequentially and as many times as necessary, until the lengths of both vectors match. The simplest case, which we witnessed in the previous section, is the one that involves one vector of length 1 and another vector of length greater than 1. We can describe what happens in such a case as the recycling of the vector that has one element until it matches the length of the longer vector. For example, when executing the first of these two expressions, it is as if we are performing the second:

> 1:4 * 3
[1]  3  6  9 12
> 1:4 * c(3,3,3,3)
[1]  3  6  9 12

In the same way, in the following example, the vector c(3,5) is recycled until it is of length 4, to c(3,5,3,5). The result is c(1,2,3,4) multiplied by c(3,5,3,5):

> c(1,2,3,4) * c(3,5)
[1]  3 10  9 20
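Recycling can be written out explicitly with the rep function, which repeats a vector. The following is a sketch for illustration only (the names short, long, and recycled are assumptions), showing that the implicit and explicit versions give the same result:

```r
# Illustrative vectors (names are assumptions)
short = c(3, 5)
long = c(1, 2, 3, 4)

# Repeat the shorter vector twice: 3 5 3 5
recycled = rep(short, times = 2)

# The implicit and explicit versions give the same result: 3 10 9 20
implicit = long * short
explicit = long * recycled
identical(implicit, explicit)
```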

When the length of the longer vector is not a multiple of the length of the shorter vector, recycling is incomplete and we receive a warning message. Nevertheless, the operation is carried out. In the next example, the vector c(1,10,100) is of length 3, while the vector 1:5 is of length 5. The vector c(1,10,100) is recycled to c(1,10,100,1,10), which is the same length as the vector c(1,2,3,4,5), as follows:

> 1:5 * c(1,10,100)
[1]   1  20 300   4  50
Warning message:
In 1:5 * c(1, 10, 100) :
  longer object length is not a multiple of shorter object length
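If incomplete recycling is really what we want, one way to make the intent explicit (and avoid the warning) is to build the recycled vector ourselves. This is a sketch under that assumption, using the length.out argument of rep; the names short, long, and padded are made up for illustration:

```r
# Illustrative vectors (names are assumptions)
short = c(1, 10, 100)
long = 1:5

# Repeat 'short' until it has as many elements as 'long': 1 10 100 1 10
padded = rep(short, length.out = length(long))

# Both vectors now have length 5, so no warning is issued: 1 20 300 4 50
product = long * padded

# A nonzero remainder is what triggers the warning in the implicit version
leftover = length(long) %% length(short)   # 2
```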