Chapter 5. Working with Basic Objects

In the previous chapters, you learned how to create several basic types of objects, including atomic vectors, lists, and data frames to store data. You learned how to create functions to store logic. Given these building blocks of R script, you learned about different types of expressions to control the flow of logic involving basic objects. Now, we are getting familiar with the basic grammar and syntax of the R programming language. It's time to build a vocabulary of R using built-in functions to work with basic objects.

The real power of R lies in the enormous number of functions it provides. Getting to know a variety of basic functions is extremely useful, and it will save you time and boost your productivity.

Although R is mainly a statistical computing environment, many basic functions are not related to statistics but to more fundamental tasks such as inspecting the environment, converting text to numbers, and performing logical operations.

In this chapter, you will get to know a wide range of basic yet most useful functions in R, including:

  • Object functions
  • Logical functions
  • Math functions
  • Numeric methods
  • Statistical functions
  • Apply-family functions

Using object functions

In the previous chapter, you learned about some functions that work with the environment and packages. In this section, we will get to know some basic functions that deal with objects in general. More specifically, I will introduce you to more functions to access the type and dimensions of a data object. You will get an impression of how these concepts can be combined and how they work together.

Testing object types

Although everything in R is an object, objects have different types.

Suppose the object we are dealing with is user-defined. We will create a function that behaves in different ways according to the type of the input object. For example, we need to create a function named take_it that returns the first element if the input object is an atomic vector (for example, a numeric vector, character vector, or logical vector), but returns a user-defined element if the input object is a list of data and index.

For example, if the input is a numeric vector such as c(1, 2, 3), then the function should return its first element 1. If the input is a character vector such as c("a", "b", "c"), then the function should return a. However, if the input is a list list(data = c("a", "b", "c"), index = 3), then the function should return the third element (index = 3) of data, that is, c.

To create such a function, we can imagine the functions and logic flow that might appear in it. First, as the output of the function depends on the type of input, we need to use one of the is.* functions to tell whether the input is of a certain type. Second, as the function behaves differently due to the type of input, we need to use conditional expressions such as if else to branch the logic. Finally, if the function basically takes out an element from the input, we need to use an element-extraction operator. Now, the implementation of the function becomes pretty clear:

take_it <- function(x) {
  if (is.atomic(x)) {
    x[[1]]
  } else if (is.list(x)) {
    x$data[[x$index]]
  } else {
    stop("Not supported input type")
  }
} 

The preceding function behaves differently as x takes different types. When x takes an atomic vector (for example, a numeric vector), the function extracts its first element. When x takes a list of data and index, the function extracts the element with the index of index from x$data:

take_it(c(1, 2, 3))
## [1] 1
take_it(list(data = c("a", "b", "c"), index = 3))
## [1] "c" 

For unsupported input types, the function is supposed to stop with an error message rather than return any value. For example, take_it cannot handle a function as input. Note that we can pass any function to other functions as an argument, just like any other object. However, in this case, if the function mean is passed to take_it, the else branch is reached and the function stops:

take_it(mean)
## Error in take_it(mean): Not supported input type 

What if the input is indeed a list but does not contain the expected elements, data and index? Let's experiment with a list that contains input (instead of data) and no index element:

take_it(list(input = c("a", "b", "c")))
## NULL 

It might surprise you that the function does not produce an error. The output is NULL because x$data is NULL, and extracting any value from NULL also yields NULL:

NULL[[1]]
## NULL
NULL[[NULL]]
## NULL 
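The root of this behavior is that `$` returns NULL for a name that does not exist in a list, rather than raising an error. The following is a minimal sketch of that point; the list `x` is made up for illustration:

```r
# A list that lacks the expected `data` and `index` elements
x <- list(input = c("a", "b", "c"))

# `$` silently returns NULL for a missing name
is.null(x$data)
## [1] TRUE
is.null(x$index)
## [1] TRUE

# When an element is required, checking names() makes the absence explicit
"data" %in% names(x)
## [1] FALSE
```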

However, if the list only contains data but misses index, the function will end up in an error:

take_it(list(data = c("a", "b", "c")))
## Error in x$data[[x$index]]: attempt to select less than one element 

The error occurs because x$index turns out to be NULL, and extracting a value from a vector by NULL produces an error:

c("a", "b", "c")[[NULL]]
## Error in c("a", "b", "c")[[NULL]]: attempt to select less than one element 

The third possibility is a bit similar to the first case in which NULL[[2]] returns NULL:

take_it(list(index = 2))
## NULL 

As these cases show, the results and error messages are not very informative unless you are familiar with the edge cases in which NULL is involved in the computation. In more complicated cases, if such errors occur, you probably won't be able to find the exact cause quickly. One good solution is to check the input yourself in the implementation of the function, so that the code reflects the assumptions made about its arguments.

To handle the preceding cases of misuse, the following implementation checks whether each argument has the expected type:

take_it2 <- function(x) {
  if (is.atomic(x)) {
    x[[1]]
  } else if (is.list(x)) {
    if (!is.null(x$data) && is.atomic(x$data)) {
      if (is.numeric(x$index) && length(x$index) == 1) {
        x$data[[x$index]]
      } else {
        stop("Invalid index")
      }
    } else {
      stop("Invalid data")
    }
  } else {
    stop("Not supported input type")
  }
} 

For the case where x is a list, we check whether x$data is not null and is an atomic vector. If so, then we check if x$index is properly specified as a single-element numeric vector, or a scalar. If any of the conditions is violated, the function stops with an informative error message telling the user what is wrong with the input.

The built-in checker functions also have quirks. For example, in versions of R before 4.4.0, is.atomic(NULL) returns TRUE (later versions return FALSE). Therefore, if list x does not contain an element called data, the positive branch of if (is.atomic(x$data)) could still be triggered in those versions, which also leads to NULL. This is why take_it2 checks !is.null(x$data) first. With some argument checking, the code is now more robust and can produce more informative error messages when the assumptions are violated:

take_it2(list(data = c("a", "b", "c")))
## Error in take_it2(list(data = c("a", "b", "c"))): Invalid index
take_it2(list(index = 2))
## Error in take_it2(list(index = 2)): Invalid data 
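The same checks can also be written as early exits, which avoids the deep nesting of if-else blocks. The following is only a sketch of one possible style; take_it3 is a hypothetical name that does not appear elsewhere in this chapter, and it assumes, as described above, that x$index should be a single number:

```r
# take_it3 is a hypothetical variant of take_it2 that uses guard clauses
take_it3 <- function(x) {
  if (is.list(x)) {
    # Reject a missing or non-atomic data element first
    if (is.null(x$data) || !is.atomic(x$data)) stop("Invalid data")
    # Then require index to be a single number
    if (!is.numeric(x$index) || length(x$index) != 1) stop("Invalid index")
    return(x$data[[x$index]])
  }
  if (is.atomic(x)) return(x[[1]])
  stop("Not supported input type")
}

take_it3(c(1, 2, 3))
## [1] 1
take_it3(list(data = c("a", "b", "c"), index = 3))
## [1] "c"
```

Each condition that can fail is stated once, next to its error message, so the assumptions about the input are easier to read from top to bottom.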

Another possible implementation of this function uses S3 dispatch, which will be covered in a later chapter on object-oriented programming.

Accessing object classes and types

Apart from using is.* functions, we can also use class() or typeof() to implement this function. Before directly accessing the type of an object, it is useful to know how these two functions differ from each other.

The following examples demonstrate the difference between the output of class() and typeof() when they are called upon different types of objects.

For each object x, class() and typeof() are called, and then str() is called to show its structure.

For a numeric vector:

x <- c(1, 2, 3)
class(x)
## [1] "numeric"
typeof(x)
## [1] "double"
str(x)
##  num [1:3] 1 2 3 

For an integer vector:

x <- 1:3
class(x)
## [1] "integer"
typeof(x)
## [1] "integer"
str(x)
##  int [1:3] 1 2 3 

For a character vector:

x <- c("a", "b", "c")
class(x)
## [1] "character"
typeof(x)
## [1] "character"
str(x)
##  chr [1:3] "a" "b" "c" 

For a list:

x <- list(a = c(1, 2), b = c(TRUE, FALSE))
class(x)
## [1] "list"
typeof(x)
## [1] "list"
str(x)
## List of 2
##  $ a: num [1:2] 1 2
##  $ b: logi [1:2] TRUE FALSE 

For a data frame:

x <- data.frame(a = c(1, 2), b = c(TRUE, FALSE))
class(x)
## [1] "data.frame"
typeof(x)
## [1] "list"
str(x)
## 'data.frame':    2 obs. of  2 variables:
##  $ a: num  1 2
##  $ b: logi  TRUE FALSE 

We can see that typeof() returns the low-level internal type of an object, while class() returns the high-level class of an object. One contrast we have mentioned before is that data.frame is in essence a list with equal-length list elements. Therefore, a data frame has the class of data.frame for data frame related functions to recognize, but typeof() still tells it is a list internally.
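The relationship between the two views of a data frame can be checked directly. The following is a small sketch using only base R functions; the variable name df is chosen here for illustration:

```r
df <- data.frame(a = c(1, 2), b = c(TRUE, FALSE))

# The internal type is list, so is.list() accepts a data frame
is.list(df)
## [1] TRUE

# The class attribute is what data frame related functions look at
inherits(df, "data.frame")
## [1] TRUE

# Removing the class attribute leaves a plain list
class(unclass(df))
## [1] "list"
```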

The topic is related to the S3 object-oriented programming mechanism and will be covered in detail in a later chapter. However, it is still useful to mention the difference between class() and typeof() here.

From the preceding output, it is also clear that str(), which we introduced in the previous chapter, shows the structure of an object. For vectors in the object, it usually shows their internal type (typeof()).
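As mentioned at the start of this section, the earlier take_it function could also branch on typeof() instead of the is.* functions. The following sketch shows one way this might look; take_it_by_type is a hypothetical name, and only four atomic types are listed, which is an assumption made here for brevity (complex and raw vectors are left out):

```r
# take_it_by_type is a hypothetical variant of take_it that
# branches on the internal type reported by typeof()
take_it_by_type <- function(x) {
  switch(typeof(x),
    # Empty alternatives fall through to the next non-empty one
    logical = ,
    integer = ,
    double = ,
    character = x[[1]],
    list = x$data[[x$index]],
    # The unnamed last argument is the default branch
    stop("Not supported input type"))
}

take_it_by_type(c(1, 2, 3))
## [1] 1
take_it_by_type(list(data = c("a", "b", "c"), index = 3))
## [1] "c"
```

Because typeof() reports list for a data frame, a data frame would also reach the list branch here, which may or may not be what you want.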

Accessing data dimensions

Matrices, arrays, and data frames have the property of dimensions in addition to classes and types.

Getting data dimensions

In R, a vector is by construction a one-dimensional data structure:

vec <- c(1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6)
class(vec)
## [1] "numeric"
typeof(vec)
## [1] "double" 

The same underlying data can be represented with more dimensions, which can be accessed via dim(), nrow(), or ncol():

sample_matrix <- matrix(vec, ncol = 4)
sample_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    2    3    4    5
## [3,]    3    4    5    6
class(sample_matrix)
## [1] "matrix"
typeof(sample_matrix)
## [1] "double"
dim(sample_matrix)
## [1] 3 4
nrow(sample_matrix)
## [1] 3
ncol(sample_matrix)
## [1] 4 

The first expression creates a four-column matrix from the numeric vector vec. The matrix has the class of matrix, while typeof() preserves double from vec. (In R 4.0.0 and later, class() of a matrix returns both "matrix" and "array".) Since a matrix is a dimensional data structure, dim() shows its dimensions in vector form. The nrow() and ncol() functions are shortcuts to access its number of rows and columns. If you read the source code of these two shortcuts, you will find that they are nothing special: they return the first and second elements of dim() of the same input, respectively.
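The relationship between dim(), nrow(), and ncol() can be verified with a short sketch. The matrix m is made up for illustration; NROW() is a base R function that treats a plain vector as a one-column matrix:

```r
m <- matrix(1:12, ncol = 4)

# nrow() and ncol() agree with the elements of dim()
identical(nrow(m), dim(m)[1L])
## [1] TRUE
identical(ncol(m), dim(m)[2L])
## [1] TRUE

# A plain vector has no dimensions, so dim() and nrow() return NULL
is.null(dim(1:12))
## [1] TRUE
is.null(nrow(1:12))
## [1] TRUE

# NROW() falls back to the length of the vector
NROW(1:12)
## [1] 12
```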

Higher dimensional data is usually represented by an array. For example, the same data vec can also be represented in three dimensions, that is, to access one element, you need to specify three positions in the three dimensions in turn:

sample_array <- array(vec, dim = c(2, 3, 2))
sample_array
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    3
## [2,]    2    2    4
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    3    5    5
## [2,]    4    4    6
class(sample_array)
## [1] "array"
typeof(sample_array)
## [1] "double"
dim(sample_array)
## [1] 2 3 2
nrow(sample_array)
## [1] 2
ncol(sample_array)
## [1] 3 

Similar to a matrix, an array has a class of array but still preserves the type of the underlying data. The length of the output of dim() is the number of dimensions needed to represent the data.
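To make the idea of three positions concrete, the following sketch rebuilds the same array and extracts single elements from it. The last expression computes the position in the underlying vector by hand, relying on the fact that R fills arrays with the first dimension varying fastest:

```r
vec <- c(1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6)
sample_array <- array(vec, dim = c(2, 3, 2))

# The number of dimensions is the length of dim()
length(dim(sample_array))
## [1] 3

# Row 2, column 3 of the first slice
sample_array[2, 3, 1]
## [1] 4

# Row 1, column 2 of the second slice
sample_array[1, 2, 2]
## [1] 5

# The same element located in vec: i + (j - 1) * 2 + (k - 1) * 2 * 3
vec[1 + (2 - 1) * 2 + (2 - 1) * 2 * 3]
## [1] 5
```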

Another data structure that has a notion of dimensions is a data frame. However, a data frame is fundamentally different from a matrix. A matrix is derived from a vector but adds a dimensional property. On the other hand, a data frame is derived from a list but adds a constraint that each list element must have the same length:

sample_data_frame <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
class(sample_data_frame)
## [1] "data.frame"
typeof(sample_data_frame)
## [1] "list"
dim(sample_data_frame)
## [1] 3 2
nrow(sample_data_frame)
## [1] 3
ncol(sample_data_frame)
## [1] 2 

However, dim(), nrow(), and ncol() are still useful for data frames.
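Because a data frame is a list of equal-length columns, its dimensions can also be read from the list point of view. The following is a brief sketch using the same data frame:

```r
sample_data_frame <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))

# As a list, the data frame has one element per column
length(sample_data_frame) == ncol(sample_data_frame)
## [1] TRUE

# Every column has the same length, which is the number of rows
all(lengths(sample_data_frame) == nrow(sample_data_frame))
## [1] TRUE
```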

Reshaping data structures

The syntax dim(x) <- y changes the dimensions of x to y.

For a plain vector, the expression converts the vector to a matrix with the specified dimensions:

sample_data <- vec
dim(sample_data) <- c(3, 4)
sample_data
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    2    3    4    5
## [3,]    3    4    5    6
class(sample_data)
## [1] "matrix"
typeof(sample_data)
## [1] "double" 

You can see that the class of the object changes from numeric to matrix, and the type of the object remains unchanged.

For a matrix, the expression reshapes the matrix:

dim(sample_data) <- c(4, 3)
sample_data
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    4
## [3,]    3    3    5
## [4,]    2    4    6 

It is useful to understand that changing the dimension of a vector, matrix, or array only alters the representation and accessing methods of the object and does not change the underlying data stored in memory. Therefore, it should be no surprise that a matrix is reshaped to an array as follows:

dim(sample_data) <- c(3, 2, 2)
sample_data
## , , 1
##
##      [,1] [,2]
## [1,]    1    2
## [2,]    2    3
## [3,]    3    4
##
## , , 2
##
##      [,1] [,2]
## [1,]    3    4
## [2,]    4    5
## [3,]    5    6
class(sample_data)
## [1] "array" 
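The claim that reshaping leaves the underlying data untouched can be checked with a short sketch. The variable reshaped is introduced here for illustration so that the chapter's sample_data is not modified:

```r
vec <- c(1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 6)
reshaped <- vec

# Reshape to a matrix: the values, in storage order, are unchanged
dim(reshaped) <- c(3, 4)
identical(as.vector(reshaped), vec)
## [1] TRUE

# Reshape to an array: still the same values in the same order
dim(reshaped) <- c(3, 2, 2)
identical(as.vector(reshaped), vec)
## [1] TRUE

# Removing the dimensions gives back the plain vector
dim(reshaped) <- NULL
identical(reshaped, vec)
## [1] TRUE
```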

It should be obvious that dim(x) <- y works only if prod(y) equals length(x), that is, the product of all dimensions must be equal to the number of data elements. Otherwise, an error will occur:

dim(sample_data) <- c(2, 3, 4)
## Error in dim(sample_data) <- c(2, 3, 4): dims [product 24] do not match the length of object [12] 
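If you prefer to detect the mismatch yourself and report it in your own words, the condition prod(y) == length(x) can be tested before the assignment. The following is a minimal sketch; reshape_checked is a hypothetical helper, not a built-in function:

```r
# reshape_checked is a hypothetical helper that validates the
# requested dimensions before applying them
reshape_checked <- function(x, dims) {
  if (prod(dims) != length(x)) {
    stop("Cannot reshape ", length(x), " elements to dimensions ",
      paste(dims, collapse = " x "))
  }
  dim(x) <- dims
  x
}

dim(reshape_checked(1:12, c(2, 3, 2)))
## [1] 2 3 2
```

Calling reshape_checked(1:12, c(2, 3, 4)) stops with the custom message because 24 does not equal 12.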

Iterating over one dimension

A data frame is often a collection of records, and each row represents a record. It is common to iterate over all records stored in a data frame. Let's look at the following data frame:

sample_data_frame
## a b
## 1 1 2
## 2 2 3
## 3 3 4 

For this data frame, we can iterate over the rows by printing the values of the variables using a for loop over 1:nrow(x):

for (i in 1:nrow(sample_data_frame)) {
  # sample text:
  # row #1, a: 1, b: 2
  cat("row #", i, ", ",
    "a: ", sample_data_frame[i, "a"],
    ", b: ", sample_data_frame[i, "b"],
    "\n", sep = "")
}
## row #1, a: 1, b: 2
## row #2, a: 2, b: 3
## row #3, a: 3, b: 4 
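One caveat of looping over 1:nrow(x) is worth knowing: if the data frame has no rows, 1:0 counts down from 1 to 0 instead of producing an empty sequence. The following sketch illustrates the difference with seq_len(), a base R function; the data frame empty and the counter visited are made up for illustration:

```r
# A data frame with the same columns but zero rows
empty <- data.frame(a = numeric(0), b = numeric(0))

# 1:nrow() yields two values, so a loop body would run twice
1:nrow(empty)
## [1] 1 0

# seq_len() yields an empty sequence, so the loop body is skipped
seq_len(nrow(empty))
## integer(0)

visited <- 0
for (i in seq_len(nrow(empty))) {
  visited <- visited + 1
}
visited
## [1] 0
```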