8. Tidy Evaluation

Thomas Mailund


The so-called tidyverse refers to a number of R packages that are designed to work well together, are based on similar design principles, and can each be considered a domain-specific language in its own right. These packages include dplyr, tidyr, and ggplot2 and mainly consist of functions that do non-standard evaluation. The way they manage non-standard evaluation is consistent across the packages and is based on what they call tidy evaluation, which primarily relies on two features implemented in the rlang package: quosures and quasi-quotation.

We will use the rlang package once more, but to make it explicit when we use this package and not basic R non-standard evaluation, I will usually use fully qualified names. In other words, I will write rlang::quo instead of quo and not load the package. We will also use the purrr package, but here as well I will use fully qualified names. I will load magrittr to get the pipeline operator, however, and the tibble and dplyr packages for working with data frames.

library(magrittr)
library(dplyr)
library(tibble)

Quosures

The rlang package provides functions to replace quote and substitute that create quosures—expressions based on formulas that carry with them their environment—instead of quoted expressions. To create a quosure from an expression, you use quo.

q <- rlang::quo(2 * x)
q


## <quosure>
##   expr: ^2 * x
##   env:  global

Inside a function call, the quosure analog to substitute is enquo.

f <- function(expr) rlang::enquo(expr)
q <- f(2 * x)
q


## <quosure>
##   expr: ^2 * x
##   env:  global

In both examples, the scope associated with the quosure is the global environment because this is the level at which the expression is written.

A quosure is just a special type of formula, so we can extract the expression and the environment from it as we did in the previous section:

q[[2]]

## 2 * x

environment(q)

## <environment: R_GlobalEnv>

However, rlang provides functions for working with quosures that make the intent of our code clearer. To get the expression out of a quosure, we use the function get_expr.

rlang::get_expr(q)

## 2 * x

Note that get_expr still returns a quoted expression; it simply strips away the quosure-ness and gives us a raw expression. There is another function, UQ, that does unquote an expression in the sense of evaluating it, but it has a different purpose that we get to in the next section.

Getting the environment associated with a quosure can be done using environment as we saw earlier, but the rlang function for this is get_env.

rlang::get_env(q)

## <environment: R_GlobalEnv>

You can no longer evaluate quosures with eval. A quosure is a formula, and the result of evaluating a formula is the formula itself.

eval(q)

## <quosure>
##   expr: ^2 * x
##   env:  global

Instead, you need to use the function eval_tidy.

x <- 1
rlang::eval_tidy(q)


## [1] 2

x <- 2
rlang::eval_tidy(q)


## [1] 4

The quosure is evaluated inside the environment it is associated with. We created the quosure q inside the function f, but its environment is the global environment, so when we modify that environment by changing x, we change the result of evaluating q.

If we create a quosure inside a local function scope, it will remember this context—just like a closure. For example, if we define function f as this

f <- function(x, y) rlang::quo(x + y + z)

the quosure will know the function parameters x and y from when f is called but will have to find z elsewhere. Consider the contrast between evaluating the quosure and the expression x + y + z directly, shown here:

q <- f(1, 2)
x <- y <- z <- 3
rlang::eval_tidy(q) # 1 + 2 + 3


## [1] 6

x + y + z # 3 + 3 + 3

## [1] 9

Just like eval, eval_tidy lets you provide a list, data frame, or environment with bindings from variables to values. When you do this, the values you provide will overrule the variables in the quosure’s environment—an effect known as over-scoping. Consider this:

x <- 1:4
y <- 1:4
q <- rlang::quo(x + y)
rlang::eval_tidy(q)


## [1] 2 4 6 8

rlang::eval_tidy(q, list(x = 5:8))

## [1]  6  8 10 12

The quosure q is bound to the global environment, so when we evaluate it, x and y are both 1:4. However, with the second argument to eval_tidy, we can override the value of x with 5:8. You will recognize this feature from dplyr, where the arguments you provide to its functions have access to the columns of the data frame, and these columns overrule any global variables of the same name.
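
To see how a dplyr-like verb can exploit this over-scoping, here is a minimal sketch (select_rows is a made-up function, not part of dplyr or rlang): the data frame's columns shadow any global variables of the same name.

select_rows <- function(df, predicate) {
  predicate <- rlang::enquo(predicate)
  # the data frame over-scopes the quosure's environment
  df[rlang::eval_tidy(predicate, df), ]
}
d <- data.frame(x = 101:104, y = 1:4)
select_rows(d, x > 102)  # uses the column x, not the global x (1:4)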

This can also be used to override variables in an expression with function parameters. Consider these two functions:

f <- function(expr, x) {
  q <- rlang::enquo(expr)
  rlang::eval_tidy(q)
}
g <- function(expr, x) {
  q <- rlang::enquo(expr)
  rlang::eval_tidy(q, environment())
}
f(x + y, x = 5:8)


## [1] 2 4 6 8

g(x + y, x = 5:8)

## [1]  6  8 10 12

The function f evaluates the quosure in its scope, which doesn’t contain the function parameter x, while the function g over-scopes with its own function environment. This makes the variable x refer to the function parameter rather than the global variable.

The expression you evaluate with eval_tidy doesn’t have to be a quosure. The function is equally happy to evaluate bare expressions, and then it behaves just like eval.

rlang::eval_tidy(quote(x + y))

## [1] 2 4 6 8

Just like eval, eval_tidy takes a third argument that will behave as the enclosing scope. This is used for bare expressions, the ones created with quote.

rlang::eval_tidy(quote(xx), env = list2env(list(xx = 5:8)))

## [1] 5 6 7 8

It is not used with quosures.

rlang::eval_tidy(rlang::quo(xx), env = list2env(list(xx = 5:8)))

## Error in rlang::eval_tidy(rlang::quo(xx), env = list2env(list(xx = 5:8))): object 'xx' not found

The list2env function I have used here translates a list into an environment—as strongly hinted by the name. It is a quick way to construct an environment and populate it with variables.
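
For example, we can build a small throwaway environment and inspect it:

e <- list2env(list(a = 1, b = 2))
ls(e)  # "a" "b"
e$a    # 1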

If you want to create a closure with over-scoping (in other words, you want to create a function that evaluates a quosure where it first finds local variables and then looks in the quosure’s environment), you cannot directly call eval_tidy when creating the function. That would ask R to evaluate the quosure right away, but you do not yet have the variables you need; those are provided when you call the closure. Instead, you can separate the bare expression and the environment of the quosure using get_expr and get_env, respectively. Consider the function make_function shown here:

make_function <- function(args, body) {
  body <- rlang::enquo(body)
  rlang::new_function(args, rlang::get_expr(body), rlang::get_env(body))
}
f <- function(z) make_function(alist(x=, y=), x + y + z)
g <- f(z = 1:4)
g


## function (x, y)
## x + y + z
## <environment: 0x7f9b0154b808>


g(x = 1:4, y = 1:4)

## [1]  3  6  9 12

Here, make_function takes two arguments, a pairlist of arguments and an expression for the body of a function. It is slightly more primitive than the lambda expressions we wrote in the previous chapter, but it is essentially doing the same thing. In this example, I am focusing on the closure we create rather than on language design issues. We translate the function body into a quosure, which guarantees that we have an environment associated with it. When we create the function with new_function, we strip the environment from the quosure to get the body of the new function, and we set the new function’s environment to be the quosure’s environment.

In the function f we have a local scope that knows the value of z, and we create a new function in this scope. The quosure we get from this is associated with the local scope of f, so it also knows about z. We provide the variables x and y when calling g, but the z value is taken from the local scope of f.

If make_function is called directly, there is no difference between using the caller’s environment and the quosure’s environment.

make_function_quo <- function(args, body) {
  body <- rlang::enquo(body)
  rlang::new_function(args, rlang::get_expr(body), rlang::get_env(body))
}
make_function_quote <- function(args, body) {
  body <- substitute(body)
  rlang::new_function(args, body, rlang::caller_env())
}
g <- make_function_quo(alist(x=, y=), x + y)
h <- make_function_quote(alist(x=, y=), x + y)
g(x = 1:4, y = 1:4)


## [1] 2 4 6 8

h(x = 1:4, y = 1:4)

## [1] 2 4 6 8

However, consider a more involved example, where we collect expressions in a list and have a function that translates all the expressions into functions that we can then apply to values using the invoke_map function from the purrr package. We can construct the expressions like this, using the linked list structure we have used before:

cons <- function(elm, lst) list(car=elm, cdr=lst)
lst_length <- function(lst) {
  len <- 0
  while (!is.null(lst)) {
    lst <- lst$cdr
    len <- len + 1
  }
  len
}
lst_to_list <- function(lst) {
  v <- vector(mode = "list", length = lst_length(lst))
  index <- 1
  while (!is.null(lst)) {
    v[[index]] <- lst$car
    lst <- lst$cdr
    index <- index + 1
  }
  v
}
expressions <- function() list(ex = NULL)
add_expression <- function(ex, expr) {
  ex$ex <- cons(rlang::enquo(expr), ex$ex)
  ex
}

Translating the expressions into functions is straightforward. We only need to reverse the resulting list if we want the functions in the order we added them, since we prepend expressions to the linked list.

make_functions <- function(ex, args) {
  results <- vector("list", length = lst_length(ex$ex))
  i <- 1; lst <- ex$ex
  while (!is.null(lst)) {
    results[[i]] <-
      rlang::new_function(args, rlang::get_expr(lst$car),
                           rlang::get_env(lst$car))
    i <- i + 1
    lst <- lst$cdr
  }
  rev(results)
}

With this small domain-specific language for collecting expressions, we can write a function that creates expressions for computing y-coordinates of a line given an intercept.

make_line_expressions <- function(intercept) {
  expressions() %>%
    add_expression(coef + intercept) %>%
    add_expression(2*coef + intercept) %>%
    add_expression(3*coef + intercept) %>%
    add_expression(4*coef + intercept)
}

The expressions know the intercept when we call make_line_expressions—that is the intent at least—but the coefficient should be added later in a function call. We can create the functions for the expressions using another function.

eval_line <- function(ex, coef) {
  ex %>% make_functions(alist(coef=)) %>%
    purrr::invoke_map(coef = coef) %>% unlist()
}

The invoke_map function is similar to the various map functions from purrr, but instead of mapping a function over several values, it takes a sequence of functions and applies each to a value.
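
As a small, standalone illustration (the list fs of functions is made up for this example), invoke_map calls every function in the list with the same extra arguments:

fs <- list(function(x) x + 1, function(x) x * 2)
purrr::invoke_map(fs, x = 10) %>% unlist()  # 11 20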

We can now pipe these functions together to get points on a line.

make_line_expressions(intercept = 0) %>% eval_line(coef = 1)

## [1] 1 2 3 4

make_line_expressions(intercept = 0) %>% eval_line(coef = 2)

## [1] 2 4 6 8

make_line_expressions(intercept = 1) %>% eval_line(coef = 1)

## [1] 2 3 4 5

Everything works as intended here, but what would happen if we used bare quoted expressions instead? It is simple to write the corresponding functions.

add_expression <- function(ex, expr) {
  ex$ex <- cons(substitute(expr), ex$ex)
  ex
}
make_functions <- function(ex, args) {
  results <- vector("list", length = lst_length(ex$ex))
  i <- 1; lst <- ex$ex
  while (!is.null(lst)) {
    results[[i]] <- rlang::new_function(args, lst$car, rlang::caller_env())
    i <- i + 1
    lst <- lst$cdr
  }
  rev(results)
}

We will get an error if we try to use them as before, however.

make_line_expressions(intercept = 0) %>% eval_line(coef = 1)

## Error in (function (coef) : object 'intercept' not found

The reason for this becomes obvious once we consider which environments contain information about the intercept. This variable lives in the scope of calls to make_line_expressions, but when we create the functions, we do so by calling make_functions from inside eval_line. The functions are therefore created with eval_line’s local environment as their enclosing environment, and intercept is not found there.

In general, it is safer to use quosures than bare expressions for non-standard evaluation exactly because they carry their environment with them, alleviating the problems we have with keeping track of which environment to evaluate expressions in.
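
As a minimal sketch of this difference (make_q and make_e are throwaway helpers, not part of the expression DSL above):

make_q <- function(a) rlang::quo(a + b)  # remembers make_q's call frame
make_e <- function(a) quote(a + b)       # carries no environment with it
b <- 10
rlang::eval_tidy(make_q(1))  # 11: a is found in make_q's frame, b globally
# eval(make_e(1)) only works if a variable a happens to be in scope
# wherever we evaluate the bare expression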

One thing to mention before we move on to the next topic is the function quos. This function works as quo but for a sequence of arguments that are returned as a list of quosures.

rlang::quos(x, y, x+y)

## [[1]]
## <quosure>
##   expr: ^x
##   env:  global
##
## [[2]]
## <quosure>
##   expr: ^y
##   env:  global
##
## [[3]]
## <quosure>
##   expr: ^x + y
##   env:  global

The primary use of quos is to translate the three-dots argument into a list of quosures.

f <- function(...) rlang::quos(...)
f(x, y, z)


## [[1]]
## <quosure>
##   expr: ^x
##   env:  global
##
## [[2]]
## <quosure>
##   expr: ^y
##   env:  global
##
## [[3]]
## <quosure>
##   expr: ^z
##   env:  global
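
Such a list of quosures can then be evaluated one by one, for example against a data frame. Here is a minimal sketch (summarise_exprs is a made-up name, not an rlang function):

summarise_exprs <- function(df, ...) {
  qs <- rlang::quos(...)
  # evaluate each quosure with the data frame over-scoping its environment
  lapply(qs, rlang::eval_tidy, data = df)
}
summarise_exprs(data.frame(x = 1:4, y = 1:4), mean(x), sum(y))  # list(2.5, 10)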

Quasi-quoting

The final topic of this chapter is quasi-quoting. This is a mechanism by which we can work with quoted expressions while, at the same time, substituting the values that some subexpressions evaluate to into the expression. When we directly call functions that do non-standard evaluation, we can usually provide expressions exactly as we want them, but as soon as we start using such non-standard evaluation functions in programs where we call them from other functions, we need some flexibility in how we construct expressions. We can do this with meta-programming where we modify call objects, but a better approach is implemented in the rlang package: so-called quasi-quoting.

Consider this simple example. We have a data frame, and we want to filter away rows where a given column has missing data. We can do this using dplyr’s filter function like this:

df <- tribble(
  ~x, ~y,
   1,  1,
  NA,  2,
   3,  3,
   4, NA,
   5,  5,
  NA,  6,
   7, NA
)
df %>% dplyr::filter(!is.na(x))


## # A tibble: 5 x 2
##       x     y
##   <dbl> <dbl>
## 1    1.    1.
## 2    3.    3.
## 3    4.   NA
## 4    5.    5.
## 5    7.   NA


df %>% dplyr::filter(!is.na(y))

## # A tibble: 5 x 2
##       x     y
##   <dbl> <dbl>
## 1    1.    1.
## 2   NA     2.
## 3    3.    3.
## 4    5.    5.
## 5   NA     6.

Here, we use the same code for two different variables. It is straightforward code, so we would not write a function to avoid the duplication, but for more complicated pipelines, we would. Therefore, for the sake of argument, let’s imagine that we want to replace the pipeline with a function. We could attempt to write it like this:

filter_on_na <- function(df, column) {
  column <- substitute(column)
  df %>% dplyr::filter(!is.na(column))
}
df %>% filter_on_na(x)


## Warning in is.na(column): is.na() applied to non-
## (list or vector) of type 'symbol'


## # A tibble: 7 x 2
##       x     y
##   <dbl> <dbl>
## 1    1.    1.
## 2   NA     2.
## 3    3.    3.
## 4    4.   NA
## 5    5.    5.
## 6   NA     6.
## 7    7.   NA

We use substitute to translate the column name into a symbol and then apply the filter pipeline. It doesn’t work, of course. The reason is that filter does non-standard evaluation as well and captures the predicate !is.na(column) exactly as written, so it looks for a variable named column. Since filter evaluates its argument as a quosure, it can find the variable column, but what it finds is a symbol, in this case the symbol x, and not the column of data we want it to see. We want filter to test the column x itself, but that is not how this code works.

What we want to do is to substitute the symbol held in the variable column into the expression/quosure that filter sees. We can do this with the “bang-bang” operator !!.

filter_on_na <- function(df, column) {
  column <- rlang::enexpr(column)
  df %>% dplyr::filter(!is.na(!!column))
}
df %>% filter_on_na(x)


## # A tibble: 5 x 2
##       x     y
##   <dbl> <dbl>
## 1    1.    1.
## 2    3.    3.
## 3    4.   NA
## 4    5.    5.
## 5    7.   NA


df %>% filter_on_na(y)

## # A tibble: 5 x 2
##       x     y
##   <dbl> <dbl>
## 1    1.    1.
## 2   NA     2.
## 3    3.    3.
## 4    5.    5.
## 5   NA     6.

This operator unquotes the following expression, evaluates it, and puts the result into the quoted expression that filter sees. So the value held in the variable column, rather than the symbol column itself, gets inserted into !is.na(!!column).

You will have noticed that I used a different function, rlang::enexpr, to create the quoted column name. The rlang package has two functions, expr and enexpr, that behave like quote and substitute in that they create bare expressions, but they allow quasi-quoting with the bang-bang operator, something substitute and quote do not. Consider the functions f and g defined like this:

f <- function(x) substitute(x)
g <- function(x) rlang::enexpr(x)

When called directly, both will return the expression we give as the parameter x, but consider the case where we call them from another function, h, where we want to substitute h’s parameter var into the captured expression.

h <- function(func, var) func(!!var)
h(f, quote(x))


## !(!var)

h(g, quote(x))

## x

In f, where we use substitute, we get the expression !!var back, but in function g, where we use enexpr, we get the substitution done and get the desired result x.

The function enexpr works like enquo to translate function parameters into expressions, but it translates them into bare expressions rather than quosures. As the analog to quo, which takes an expression directly and creates a quosure, we have expr, which creates a bare expression—like quote—but allows quasi-quotation.

x <- y <- 1
quote(2 * x + !!y)


## 2 * x + (!(!y))

rlang::expr(2 * x + !!y)

## 2 * x + 1

rlang::quo(2 * x + !!y)

## <quosure>
##   expr: ^2 * x + 1
##   env:  global

You have to be a little bit careful when using the bang-bang operator, depending on which version of the rlang package you use. It is built from the negation operator, !, which has a low precedence. The only operators with lower precedence are the logical operators and assignments. This means if you try to unquote x in the expression x + y, you might, in fact, be unquoting the entire expression if you simply write !!x + y. You only get !! bound to x if the operator you use in the expression is a logical operator.

If you try to evaluate the expression below, you will see whether you have an old or a new version of the package. With an old version, the entire expression x + y will be unquoted, since ! has lower precedence than +; with a newer version of rlang, only the value of x will be substituted into the expression.

x <- y <- 2
rlang::expr(!!x + y)


## 2 + y

The most recent version of the package, at the time of writing, has resolved this; instead of relying on R’s precedence rules, it will examine the expression before it evaluates unquoted expressions, and it gives the bang-bang operator a much tighter precedence than the negation operator has. If you do not have the latest version of the package, however, you can get around this problem using parentheses, or you can use the function-variant of unquoting, UQ.

rlang::expr(UQ(x) + y)

## 2 + y
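
With an older version of the package you can get the same effect by parenthesizing the unquoted part; the parentheses then remain visible in the captured expression. A small sketch, assuming x is still 2:

rlang::expr((!!x) + y)  # the captured expression is (2) + y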

When we translate a function argument into an expression with enexpr, or into a quosure with enquo, the bang-bang operator or the UQ function substitutes the values of unquoted subexpressions into the expressions or quosures we create, which is why the second filter_on_na function worked.

You might want to unquote on the left-hand side of named parameters in function calls as well, but the R parser does not allow you to write code such as this:

f(UQ(var) = expr)

This is because the left-hand side in function call arguments must be symbols. To get around this problem, the rlang package provides an implementation of the := operator that does allow such assignments. Consider the following:

f <- function(df, summary_name, summary_expr) {
  summary_name <- rlang::enexpr(summary_name)
  summary_expr <- rlang::enquo(summary_expr)
  df %>% mutate(UQ(summary_name) := UQ(summary_expr))
}
tibble(x = 1:4, y = 1:4) %>% f(z, x + y)


## # A tibble: 4 x 3
##       x     y     z
##   <int> <int> <int>
## 1     1     1     2
## 2     2     2     4
## 3     3     3     6
## 4     4     4     8

The named function call in the following:

  df %>% mutate(UQ(summary_name) := UQ(summary_expr))

behaves like a usual function call with named parameters after the quasi-quotation substitution has taken place.
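
The same left-hand side unquoting works when we call mutate directly and keep the column name in a variable. A small sketch (the column name w is just an example):

col <- quote(w)
tibble(x = 1:4, y = 1:4) %>% mutate(!!col := x * y)  # adds a column w = x * y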

A final quasi-quotation function you must know is UQS. It behaves like UQ in that it evaluates its argument and puts the result into a quoted expression, but it is intended for splicing a list of quosures into a function call.

Consider this example:

args <- rlang::quos(x = 1, y = 2)
q <- rlang::expr(f(rlang::UQS(args)))
q


## f(x = ~1, y = ~2)

Here, we create a list of quosures for arguments x and y and create an expression that calls the function f with these arguments. The UQS function is what splices the arguments into the expression for the function call. Using UQ, we would get a different expression.

q <- rlang::expr(f(rlang::UQ(args)))
q


## f(list(x = ~1, y = ~2))

As with UQ, there is an operator version of UQS: the triple-bang operator, !!!.

q <- rlang::expr(f(!!!args))
q


## f(x = ~1, y = ~2)

To see UQS in action, we consider another toy example: a function for computing the mean of an expression given a data frame. Imagine that we want to map over several data frames and compute the mean of the same expression in each. We write a function that captures an expression as a quosure and builds a new function that computes the mean of that expression within a data frame, using additional arguments to modify the call to mean.

mean_expr <- function(ex, ...) {
  ex <- rlang::enquo(ex)
  extra_args <- rlang::dots_list(...)
  mean_call <- rlang::expr(with(
      data,
      mean(!!rlang::get_expr(ex), !!!extra_args))
  )
  rlang::new_function(args = alist(data=),
                      body = mean_call,
                      env = rlang::get_env(ex))
}
mean_sum <- mean_expr(x + y, na.rm = TRUE)
mean_sum


## function (data)
## with(data, mean(x + y, na.rm = TRUE))

There are a few new things to observe here. We use dots_list to evaluate the arguments in the ... parameter. We could deal with this in other ways, but we don’t necessarily want the arguments to be evaluated lazily; we just want a list to use as extra arguments in a call to mean. We then create an expression from the ex parameter: we wrap the bare expression held in ex inside a call to the with function so that the data frame’s columns are in scope. We use the !!! operator to splice the additional arguments into the call to mean. Finally, we construct a new function that takes a single argument, data, has the with(data, mean(...)) expression as its body, and has as its enclosing environment the scope where ex was defined.
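
As a small standalone illustration of dots_list (collect_args is just a throwaway helper), it collects and evaluates the arguments in ... into an ordinary named list:

collect_args <- function(...) rlang::dots_list(...)
collect_args(na.rm = TRUE, trim = 0.1)  # list(na.rm = TRUE, trim = 0.1)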

We can test it with the data frame we created earlier:

df

## # A tibble: 7 x 2
##       x     y
##   <dbl> <dbl>
## 1    1.    1.
## 2   NA     2.
## 3    3.    3.
## 4    4.   NA
## 5    5.    5.
## 6   NA     6.
## 7    7.   NA


mean_sum(df)

## [1] 6

To see that the expression is evaluated in the environment where it was defined, we can write a function that captures a parameter in a local scope and check that this scope is available when we call the constructed function.

f <- function(z) mean_expr(x + y + z, na.rm = TRUE, trim = 0.1)
g <- f(z = 1:7)
g


## function (data)
## with(data, mean(x + y + z, na.rm = TRUE, trim = 0.1))
## <environment: 0x7f9b02d4ab88>


g(df)

## [1] 9

Using quosures and quasi-quotation goes a long way toward solving the problems you can have with keeping track of environments for non-standard evaluation, and I strongly suggest you use these instead of the bare R quote and substitute approach. They do not, however, fix the problem of deciding when to quote arguments and of calling functions that translate arguments into expressions. If you call a function that uses enquo from a function where you have already quoted the argument, then you get it double-quoted.

f <- function(expr) rlang::enquo(expr)
g <- function(expr) f(rlang::enquo(expr))
f(x + y)


## <quosure>
##   expr: ^x + y
##   env:  global


g(x + y)

## <quosure>
##   expr: ^rlang::enquo(expr)
##   env:  0x7f9b016be9c8


rlang::eval_tidy(f(x + y), list(x = 1, y = 2))

## [1] 3

rlang::eval_tidy(g(x + y), list(x = 1, y = 2))

## <quosure>
##   expr: ^x + y
##   env:  global

Here, the solution is as before: if you need to call functions with already quoted expressions, make two versions—one that expects its argument to be quoted and one that does it for you.
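
A minimal sketch of that pattern (the function names are illustrative): one version expects an already captured quosure, and the other quotes its argument for you.

eval_expr_q <- function(q, data = NULL) rlang::eval_tidy(q, data)
eval_expr <- function(expr, data = NULL) eval_expr_q(rlang::enquo(expr), data)
eval_expr(x + y, list(x = 1, y = 2))                # 3
eval_expr_q(rlang::quo(x + y), list(x = 1, y = 2))  # 3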
