8
Lists

This chapter covers an additional R data type called a list. Lists are somewhat similar to vectors, but can store more types of data and usually include more details about that data (with some cost). Lists are R’s version of a map, which is a common and extremely useful way of organizing data in a computer program. Moreover, lists are used to create data frames, which are the primary data storage type used for working with sets of real data in R. This chapter covers how to create and access elements in a list, as well as how to apply functions to lists.

8.1 What Is a List?

A list is a lot like a vector, in that it is a one-dimensional collection of data. However, unlike a vector, you can store elements of different types in a list; for example, a list can contain numeric data and character string data. Lists can also contain more complex data types—including vectors and even other lists!
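
As a rough illustration of this difference, the following sketch combines the same three values first in a vector and then in a list. It uses the list() function and double-bracket notation, both of which are covered later in this chapter; the variable names mixed_vector and mixed_list are made up for this example.

```r
# Combining mixed types with c() coerces every value to a single type
# (here, everything becomes a character string)
mixed_vector <- c("Ada", 78000, TRUE)
class(mixed_vector)     # "character"

# A list keeps each element's original type
mixed_list <- list("Ada", 78000, TRUE)
class(mixed_list[[2]])  # "numeric"
class(mixed_list[[3]])  # "logical"
```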

Elements in a list can also be tagged with names that you can use to easily refer to them. For example, rather than talking about the list’s “element #1,” you can talk about the list’s “first_name element.” This feature allows you to use lists to create a type of map. In computer programming, a map (or “mapping”) is a way of associating one value with another. The most common real-world example of a map is a dictionary or encyclopedia. A dictionary associates each word with its definition: you can “look up” a definition by using the word itself, rather than needing to look up the 3891st definition in the book. In fact, this same data structure is called a dictionary in the Python programming language!

Caution

The definition of a list in the R language is distinct from how some other languages use the term “list.” When you begin to explore other languages, don’t assume that the same terminology implies the same capabilities.

As a result, lists are extremely useful for organizing data. They allow you to group together data like a person’s name (characters), job title (characters), salary (number), and whether the person is a member of a union (logical)—and you don’t have to remember whether the person’s name or title was the first element!

Remember

If you want to label elements in a collection, use a list. While vector elements can also be tagged with names, that practice is somewhat uncommon and requires a more verbose syntax for accessing the elements.

8.2 Creating Lists

You create a list by using the list() function and passing it any number of arguments (separated by commas) that you want to make up that list—similar to the c() function for vectors.

However, you can (and should) specify the tags for each element in the list by putting the name of the tag (which is like a variable name), followed by an equals symbol (=), followed by the value you want to go in the list and be associated with that tag. This is similar to how named arguments are specified for functions (see Section 6.2.1). For example:

# Create a `person` variable storing information about someone
# Code is shown on multiple lines for readability (which is valid R code!)
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

This creates a list of four elements: "Ada", which is tagged with first_name; "Programmer", which is tagged with job; 78000, which is tagged with salary; and TRUE, which is tagged with in_union.

Remember

You can have vectors as elements of a list. In fact, each scalar value in the preceding example is really a vector (of length 1).

It is possible to create a list without tagging the elements:

# Create a list without tagged elements. NOT the suggested usage.
person_alt <- list("Ada", "Programmer", 78000, TRUE)

However, tags make it easier and less error-prone to access specific elements. In addition, tags help other programmers read and understand the code—tags let them know what each element in the list represents, similar to an informative variable name. Thus it is recommended to always tag lists you create.

Tip

You can get a vector of the names of your list items using the names() function. This is useful for understanding the structure of variables that may have come from other data sources.
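
A minimal sketch of names() in use, assuming the person and person_alt lists from the preceding examples:

```r
# A tagged list: names() returns a character vector of the tags
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)
names(person)      # "first_name" "job" "salary" "in_union"

# An untagged list has no names at all
person_alt <- list("Ada", "Programmer", 78000, TRUE)
names(person_alt)  # NULL
```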

Because lists can store elements of different types, they can store values that are lists themselves. For example, consider adding a list of favorite items to the person list in the previous example:

# Create a `person` list that has a list of favorite items
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE,
  favorites = list(
    music = "jazz",
    food = "pizza"
  )
)

This data structure (a list of lists) is a common way to represent data that is typically stored in JavaScript Object Notation (JSON). For more information on working with JSON data, see Chapter 14.

8.3 Accessing List Elements

Once you store information in a list, you will likely want to retrieve or reference that information in the future. Consider the output of printing the person list, as shown in Figure 8.1. Notice that the output includes each tag name prepended with a dollar sign ($) symbol, and then on the following line prints the element itself.

A screenshot shows creating and printing the person list in the RStudio console.
Figure 8.1 Creating and printing a list element in RStudio.
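
The console output described in Figure 8.1 looks roughly like the following sketch, which assumes the four-element person list created in Section 8.2:

```r
# Create the `person` list
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

# Print the list: each tag appears after a `$`, followed by its value
print(person)
# $first_name
# [1] "Ada"
#
# $job
# [1] "Programmer"
#
# $salary
# [1] 78000
#
# $in_union
# [1] TRUE
```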

Because list elements are (usually) tagged, you can access them by their tag name rather than by the index number you used with vectors. You do this by using dollar notation: refer to the element with a particular tag in a list by writing the name of the list, followed by a $, followed by the element’s tag (a syntax unavailable to named vectors):

# Create the `person` list
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

# Reference specific tags in the `person` list
person$first_name # [1] "Ada"
person$salary     # [1] 78000

You can almost read the dollar sign as if it were an “apostrophe s” (possessive) in English. Thus, person$salary would mean “the person list’s salary value.”
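
To see why dollar notation is tied to lists, consider this small sketch comparing a named vector with a list. The salaries data and the name "bob" are hypothetical and used only for illustration.

```r
# A named vector: elements are tagged, but `$` does not work on it
salaries <- c(ada = 78000, bob = 65000)
salaries["ada"]   # brackets and a quoted name are required
# salaries$ada    # Error: $ operator is invalid for atomic vectors

# The same data in a list supports dollar notation
salary_list <- list(ada = 78000, bob = 65000)
salary_list$ada   # [1] 78000
```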

Regardless of whether a list element has a tag, you can also access it by its numeric index (i.e., if it is the first, second, and so on item in the list). You do this by using double-bracket notation. With this notation, you refer to the element at a particular index of a list by writing the name of the list, followed by double square brackets ([[]]) that contain the index of interest:

# This is a list (not a vector!), even though elements have the same type
animals <- list("Aardvark", "Baboon", "Camel")

animals[[1]] # [1] "Aardvark"
animals[[3]] # [1] "Camel"
animals[[4]] # Error: subscript out of bounds!

You can also use double-bracket notation to access an element by its tag if you put a character string of the tag name inside the brackets. This is particularly useful in cases when the tag name is stored in a variable:

# Create the `person` list with an additional `last_name` attribute
person <- list(
  first_name = "Ada",
  last_name = "Gomez",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

# Retrieve values stored in list elements using strings
person[["first_name"]] # [1] "Ada"
person[["salary"]]     # [1] 78000

# Retrieve values stored in list elements
# using strings that are stored in variables
name_to_use <- "last_name"  # choose name (i.e., based on formality)
person[[name_to_use]]       # [1] "Gomez"
name_to_use <- "first_name" # change name to use
person[[name_to_use]]       # [1] "Ada"

# You can also use indices for tagged elements
# (but they're difficult to keep track of)
person[[1]] # [1] "Ada"
person[[5]] # [1] TRUE

Remember that lists can contain complex values (including other lists). Accessing these elements with either dollar or double-bracket notation will return that “nested” list, allowing you to access its elements:

# Create a list that stores a vector and a list. `job_post` has
# a *list* of qualifications and a *vector* of responsibilities.
job_post <- list(
  qualifications = list(
    experience = "5 years",
    bachelors_degree = TRUE
  ),
  responsibilities = c("Team Management", "Data Analysis", "Visualization")
)

# Extract the `qualifications` element (a list) and store it in a variable
job_qualifications <- job_post$qualifications

# Because `job_qualifications` is a list, you can access its elements
job_qualifications$experience # "5 years"

In this example, job_qualifications is a variable that refers to a list, so its elements can be accessed via dollar notation. But as with any operator or function, it is also possible to use dollar notation on an anonymous value (e.g., a literal value that has not been assigned to a variable). That is, because job_post$qualifications is a list, you can use bracket or dollar notation to refer to an element of that list without assigning it to a variable first:

# Access the `qualifications` list's `experience` element
job_post$qualifications$experience # "5 years"

# Access the `responsibilities` vector's first element
# Remember, `job_post$responsibilities` is a vector!
job_post$responsibilities[1] # "Team Management"

This example of “chaining” together dollar-sign operators allows you to directly access elements in lists with a complex structure: you can use a single expression to refer to the “job-post’s qualification’s experience” value.
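
Chaining is not limited to dollar notation. The following sketch, which assumes the job_post list from above, chains double brackets and mixes the two notations; the variable name field is hypothetical.

```r
job_post <- list(
  qualifications = list(
    experience = "5 years",
    bachelors_degree = TRUE
  ),
  responsibilities = c("Team Management", "Data Analysis", "Visualization")
)

# Chain double brackets instead of dollar signs
job_post[["qualifications"]][["experience"]]  # "5 years"

# Mix the two notations, which helps when a tag is stored in a variable
field <- "bachelors_degree"
job_post$qualifications[[field]]              # TRUE

# Functions can be applied to a nested element directly
length(job_post$responsibilities)             # 3
```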

8.4 Modifying Lists

As with vectors, you can add and modify list elements. List elements can be modified by assigning a new value to an existing list element. New elements can be added by assigning a value to a new tag (or index). Moreover, list elements can be removed by reassigning the value NULL to an existing list element. All of these operations are demonstrated in the following example:

# Create the `person` list
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

# There is currently no `age` element (it's NULL)
person$age # NULL

# Assign a value to the (new) `age` tag
person$age <- 40
person$age # [1] 40

# Reassign a value to list's `job` element
person$job <- "Senior Programmer" # a promotion!
print(person$job)
# [1] "Senior Programmer"

# Reassign a value to the `salary` element (using the current value!)
person$salary <- person$salary * 1.15 # a 15% raise!
print(person$salary)
# [1] 89700

# Remove the `first_name` tag to make the person anonymous
person$first_name <- NULL

NULL is a special value that means “undefined” (note that it is the special value NULL, not the character string "NULL"). NULL is somewhat similar to NA, but the two differ: NA refers to a value that is missing (such as an empty element in a vector), that is, a “hole.” Conversely, NULL refers to a value that is not defined and doesn’t necessarily leave a “hole” in the data. NA values usually result when you are creating or loading data that may have parts missing; NULL can be used to remove values. For more information on the difference between these values, see this R-Bloggers post.1

1R: NA vs. NULL post on R-Bloggers: https://www.r-bloggers.com/r-na-vs-null/
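
The difference between a “hole” and a removed value can be sketched as follows. The ages vectors are hypothetical examples, and the shortened person list is used only to keep the sketch brief.

```r
# NA leaves a "hole": the vector still has three elements
ages <- c(25, NA, 40)
length(ages)       # 3

# NULL leaves nothing behind: it is dropped when values are combined
ages_null <- c(25, NULL, 40)
length(ages_null)  # 2

# Assigning NULL to a list element removes that element entirely
person <- list(first_name = "Ada", job = "Programmer")
person$first_name <- NULL
length(person)     # 1
names(person)      # "job"

# Assigning NA keeps the element but marks its value as missing
person$job <- NA
length(person)     # 1
```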

8.4.1 Single versus Double Brackets

Remember

Vectors use single-bracket notation for accessing elements by index, but lists use double-bracket notation for accessing elements by index!

The single-bracket syntax used with vectors isn’t actually selecting values by index; instead, it is filtering by whatever vector is inside the brackets (which may be just a single element: the index number to retrieve). In R, single brackets always mean to filter a collection. So if you put single brackets after a list, what you’re actually doing is getting a filtered sublist of the elements that have those indices, just as single brackets on a vector return a subset of elements from that vector:

# Create the `person` list
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

# SINGLE brackets return a list
person["first_name"]
            # $first_name
            # [1] "Ada"

# Test if it returns a list
is.list(person["first_name"]) # TRUE

# DOUBLE brackets return a vector
person[["first_name"]] # [1] "Ada"

# Confirm that it *does not* return a list
is.list(person[["first_name"]]) # FALSE

# Use a vector of tag names to create a filtered sub-list
person[c("first_name", "job", "salary")]
             # $first_name
             # [1] "Ada"
             #
             # $job
             # [1] "Programmer"
             #
             # $salary
             # [1] 78000

Notice that with lists you can filter by a vector of tag names (as well as by a vector of element indices).

In short, remember that single brackets return a list, whereas double brackets return a list element. You almost always want to refer to the value itself rather than a list, so you almost always want to use double brackets (or better yet—dollar notation) when accessing lists.
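
The practical consequence shows up as soon as you try to compute with the result. This sketch, assuming the same person list, uses numeric indices as well as tag names:

```r
person <- list(
  first_name = "Ada",
  job = "Programmer",
  salary = 78000,
  in_union = TRUE
)

# Single brackets with an index: a list of length 1
first_as_list <- person[1]
is.list(first_as_list)     # TRUE

# Double brackets with an index: the element itself
first_value <- person[[1]]
is.character(first_value)  # TRUE

# Arithmetic works on the extracted value, but not on the sublist
person[["salary"]] * 2     # 156000
# person["salary"] * 2     # Error: non-numeric argument to binary operator

# Filter by a vector of indices rather than tag names
names(person[c(1, 3)])     # "first_name" "salary"
```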

8.5 Applying Functions to Lists with lapply()

Since most functions are vectorized (e.g., paste(), round()), you can pass them a vector as an argument and the function will be applied to each item in the vector. It “just works.” But if you want to apply a function to each item in a list, you need to put in a bit more effort.

In particular, you need to use a function called lapply() (for list apply). This function takes two arguments: a list you want to operate upon, followed by a function you want to “apply” to each item in that list. For example:

# Create an untagged list (not a vector!)
people <- list("Sarah", "Amit", "Zhang")

# Apply the `toupper()` function to each element in `people`
people_upper <- lapply(people, toupper)
            # [[1]]
            # [1] "SARAH"
            #
            # [[2]]
            # [1] "AMIT"
            #
            # [[3]]
            # [1] "ZHANG"

# Apply the `paste()` function to each element in `people`,
# with an additional argument `"dances!"` passed to each call
dance_party <- lapply(people, paste, "dances!")
            # [[1]]
            # [1] "Sarah dances!"
            #
            # [[2]]
            # [1] "Amit dances!"
            #
            # [[3]]
            # [1] "Zhang dances!"

Caution

Make sure you pass your actual function to the lapply() function, not a character string of your function name (i.e., paste, not "paste"). You’re also not actually calling that function (i.e., paste, not paste()). Just put the name of the function! After that, you can include any additional arguments you want the applied function to be called with—for example, how many digits to round to, or what value to paste to the end of a string.

The lapply() function returns a new list; the original one is unmodified.
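
As a small sketch of both points (extra arguments and the unmodified original), the following rounds each number in a list to two digits. The measurements values are hypothetical.

```r
# A list of numbers (made-up values for illustration)
measurements <- list(3.14159, 2.71828, 1.41421)

# Arguments after the function are passed along to each call,
# so this calls round(x, 2) for every element x
rounded <- lapply(measurements, round, 2)
rounded[[1]]       # 3.14
rounded[[2]]       # 2.72

# The original list is unchanged
measurements[[1]]  # 3.14159
```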

You commonly use lapply() with your own custom functions that define what you want to do to a single element in that list:

# A function that prepends "Hello" to any item
greet <- function(item) {
  paste("Hello", item) # this last expression will be returned
}

# Create an untagged list (not a vector!)
people <- list("Sarah", "Amit", "Zhang")

# Greet each person by applying the `greet()` function
# to each element in the `people` list
greetings <- lapply(people, greet)
            # [[1]]
            # [1] "Hello Sarah"
            #
            # [[2]]
            # [1] "Hello Amit"
            #
            # [[3]]
            # [1] "Hello Zhang"

Additionally, lapply() is a member of the “*apply()” family of functions. Each member of this set of functions starts with a different letter and is used with a different data structure, but otherwise all work basically the same way. For example, lapply() is used for lists, while sapply() (simplified apply) works well for vectors. You can use both lapply() and sapply() on vectors; the difference is what each function returns. As you might imagine, lapply() will return a list, while sapply() will return a vector:

# A vector of people
people <- c("Sarah", "Amit", "Zhang")

# Create a vector of uppercase versions of each name, using `sapply`
sapply(people, toupper) # returns a named vector: "SARAH" "AMIT" "ZHANG"

The sapply() function is mainly useful with functions that are not already vectorized, which usually means functions that you define yourself. Most built-in R functions are vectorized, so they will work correctly on vectors when used directly (e.g., toupper(people)).
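
For instance, a function built around an if statement only handles one value at a time, so it needs sapply() to work across a vector. The sketch below assumes a hypothetical helper called name_size().

```r
people <- c("Sarah", "Amit", "Zhang")

# `if` checks a single value, so this function is not vectorized
# (`name_size` is a made-up helper for this example)
name_size <- function(name) {
  if (nchar(name) > 4) "long" else "short"
}

# sapply() calls the function once per element and returns a vector
sizes <- sapply(people, name_size)
sizes[["Amit"]]                     # "short"
is.list(sizes)                      # FALSE

# The same call with lapply() returns a list
is.list(lapply(people, name_size))  # TRUE

# Drop the names if only the values are needed
sapply(people, name_size, USE.NAMES = FALSE)  # "long" "short" "long"
```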

Lists represent an alternative technique to vectors for organizing data in R. In practice, the two data structures will both be used in your programs, and in fact can be combined to create a data frame (described in Chapter 10). For practice working with lists in R, see the set of accompanying book exercises.2

2List exercises: https://github.com/programming-for-data-science/chapter-08-exercises
