Creating subsets of vectors

Creating subsets of data is one of the fundamental operations in data analysis. In this section, we will cover the two basic ways to create subsets of a vector. The first way involves numeric vectors, which specify the requested indices to be included in the subset. The second way involves using logical vectors, which specify for each element whether we would like to keep it or not.

Subsetting with numeric vectors of indices

Subsetting using numeric vectors of indices is done using the square brackets operator [, by providing the vector of indices within the square brackets. For example, we can select a single element of a vector by putting the value of the required index within brackets, as follows:

> x = c(5,6,1,2,3,7)
> x[3]
[1] 1
> x[1]
[1] 5
> x[6]
[1] 7

If we would like to find out, for example, the value of the last element in a given vector, we can use the length function, which returns the length of the vector (that is, the index of the vector's last element), as follows:

> x[length(x)]
[1] 7
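
As a small sketch of the same idea, the length can be stored in a variable and used to reach both the last element and the one before it. The names n, last, and second_last are chosen here only for illustration:

```r
x = c(5, 6, 1, 2, 3, 7)
n = length(x)            # n is 6, the index of the last element
last = x[n]              # 7
second_last = x[n - 1]   # 3
```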

We can also assign new values to a subset of a vector, as follows:

> x = 1:3
> x
[1] 1 2 3
> x[2] = 300
> x
[1]   1 300   3

We can create a subset that is more than one element long, when the length of our vector of indices is larger than 1:

> x = c(43,85,10)
> x[1:2]
[1] 43 85
> x[c(3,1)]
[1] 10 43

As seen in the last expression, the indices vector, which we placed in the square brackets, does not need to be composed of consecutive values, nor do its values need to have an increasing order. For example, we can reverse the order of values in a vector by using a vector of indices going from the position of the last element down to 1:

> x = 33:24
> x
 [1] 33 32 31 30 29 28 27 26 25 24
> x[length(x):1]
 [1] 24 25 26 27 28 29 30 31 32 33
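
For comparison, base R also provides the rev function, which is not covered in this section. A minimal sketch, assuming a non-empty vector, checks that it gives the same result as the descending vector of indices (the name reversed is illustrative):

```r
x = 33:24
reversed = x[length(x):1]     # indices 10, 9, ..., 1
identical(reversed, rev(x))   # TRUE
```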

The vector of indices can also include repeated values, as follows:

> x = c(43,85,10)
> x[rep(3,4)]
[1] 10 10 10 10

In this example, the rep(3,4) expression creates the vector c(3,3,3,3). The latter then results in the creation of a subset (which is longer than the original vector), where the third element of the vector is repeated four times.
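
Repeated indices do not have to come from rep. The following sketch, with the illustrative names idx and y, repeats only some of the positions and again produces a subset longer than the original vector:

```r
x = c(43, 85, 10)
idx = c(1, 1, 2, 3, 3)   # positions 1 and 3 each appear twice
y = x[idx]               # 43 43 85 10 10
length(y) > length(x)    # TRUE
```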

The recycling rule also applies to assignment into subsets:

> x = 1:10
> x[3:8] = c(15,16)
> x
 [1]  1  2 15 16 15 16 15 16  9 10

In this example, the values 15 and 16 were alternated until the six-element subset of the vector x was filled.
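
The most common case of recycling during assignment is probably a single value assigned to several positions at once. A minimal sketch, with positions chosen arbitrarily for illustration:

```r
x = 1:10
x[c(2, 4, 6)] = 0   # the single value 0 is recycled three times
x                   # 1 0 3 0 5 0 7 0 9 10
```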

Subsetting with logical vectors

Another way to create a subset of a vector is to supply a logical vector within the [ operator. The logical vector indicates which elements to keep: the subset consists of the elements whose positions match the positions of the TRUE values in the logical vector. It is frequently useful to create the logical vector by applying a conditional operator to the same vector we wish to subset. Let's take a look at the following example:

> x = seq(85, 100, 2)
> x
[1] 85 87 89 91 93 95 97 99
> x > 90
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> x[x > 90]
[1] 91 93 95 97 99

Here, we created the logical vector x>90, which, like the vector x, has eight elements (since the comparison is carried out element by element, as we saw previously). Each value in this vector is either TRUE or FALSE, depending on whether x has a value larger than 90 at the respective position. When we subset x using the logical vector x>90, we get a vector containing the five values of x whose positions match the positions of the TRUE values in x>90, that is, the positions where the values of x are greater than 90.
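
To make the link between the two subsetting methods concrete, the logical vector can be stored in a variable first. The sketch below uses the base R functions sum and which, which are not introduced in this section, and the illustrative name keep:

```r
x = seq(85, 100, 2)
keep = x > 90                        # logical vector, same length as x
sum(keep)                            # 5, the number of TRUE values
which(keep)                          # 4 5 6 7 8, the positions of the TRUE values
identical(x[keep], x[which(keep)])   # TRUE
```

In other words, subsetting with keep selects the same elements as subsetting with the numeric indices of its TRUE values.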

We can even apply more complex conditions to select some very specific values:

> x
[1] 85 87 89 91 93 95 97 99
> x[x>85 & x<90]
[1] 87 89
> x[x>92 | x<86]
[1] 85 93 95 97 99

Note that when subsetting with logical vectors, the order of values in the subset matches their order in the original vector, since the first element in the subset will be the first element that has TRUE in the logical vector, the second will be the second element that has TRUE in the logical vector, and so on.
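
The contrast with numeric indices can be sketched as follows. The names by_index and by_condition are illustrative only:

```r
x = seq(85, 100, 2)
by_index = x[c(8, 4)]                  # 99 91, order follows the index vector
by_condition = x[x == 99 | x == 91]    # 91 99, order follows the original vector
```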

If none of the elements satisfies the required condition (so that the logical vector consists entirely of FALSE values), we get an empty vector as a result. For example, no value in the vector x (or in any other vector) is both larger and smaller than 90 at the same time:

> x>90 & x<90
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> x[x>90 & x<90]
numeric(0)
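
An empty result can also be detected in code rather than by reading the printed output. A minimal sketch, using the base R function any (not covered in this section) and the illustrative name empty:

```r
x = seq(85, 100, 2)
empty = x[x > 90 & x < 90]
length(empty)          # 0
any(x > 90 & x < 90)   # FALSE, no element satisfies the condition
```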