gentle tidy eval with examples
Aug 7, 2017
Alex Hayes
7 minute read

I’ve been using the tidy eval framework introduced with dplyr 0.7 for about two months now, and it’s time for an update to my original post on tidy eval. My goal is not to explain tidy eval to you, but rather to show you some simple examples that you can easily generalize from.

library(tidyverse)

starwars
## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Luke~    172    77 blond      fair       blue            19   male  
##  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
##  3 R2-D2     96    32 <NA>       white, bl~ red             33   <NA>  
##  4 Dart~    202   136 none       white      yellow          41.9 male  
##  5 Leia~    150    49 brown      light      brown           19   female
##  6 Owen~    178   120 brown, gr~ light      blue            52   male  
##  7 Beru~    165    75 brown      light      blue            47   female
##  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>  
##  9 Bigg~    183    84 black      light      brown           24   male  
## 10 Obi-~    182    77 auburn, w~ fair       blue-gray       57   male  
## # ... with 77 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>

Using strings to refer to column names

To refer to columns in a data frame with strings, we need to convert those strings into symbol objects with rlang::sym and rlang::syms. We then use the created symbol objects in dplyr functions with the prefixes !! and !!!. This is because dplyr verbs expect input that looks like code. Using the sym/syms functions we can convert strings into objects that look like code.

mass <- rlang::sym("mass")                        # create a single symbol
groups <- rlang::syms(c("homeworld", "species"))  # create a list of symbols

starwars %>%
  group_by(!!!groups) %>%               # use list of symbols with !!!
  summarize(avg_mass = mean(!!mass))    # use single symbol with !!
## # A tibble: 58 x 3
## # Groups:   homeworld [?]
##    homeworld      species   avg_mass
##    <chr>          <chr>        <dbl>
##  1 Alderaan       Human         NA  
##  2 Aleen Minor    Aleena        15  
##  3 Bespin         Human         79  
##  4 Bestine IV     Human        110  
##  5 Cato Neimoidia Neimodian     90  
##  6 Cerea          Cerean        82  
##  7 Champala       Chagrian      NA  
##  8 Chandrila      Human         NA  
##  9 Concord Dawn   Human         79  
## 10 Corellia       Human         78.5
## # ... with 48 more rows
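
If you're curious what sym and syms actually hand back, here is a minimal sketch you can poke at in the console. The names mass_sym and group_syms are just mine for illustration; the only assumption is that rlang is installed.

```r
library(rlang)

mass_sym <- sym("mass")                            # one string -> one symbol
group_syms <- syms(c("homeworld", "species"))      # character vector -> list of symbols

is_symbol(mass_sym)                # TRUE
is.list(group_syms)                # TRUE
length(group_syms)                 # 2, one symbol per string
identical(mass_sym, quote(mass))   # TRUE: same object you get by quoting the bare name
```

So a symbol built from a string is the same thing R would see if you had typed the bare column name, which is why dplyr verbs accept it once it is unquoted.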

The usage mass <- rlang::sym("mass") is Hadley approved, and I believe it is also the current tidyverse code style standard. We use rlang::sym and rlang::syms the same way inside functions.

summarize_by <- function(df, groups, to_summarize) {
  df %>%
    group_by(!!!rlang::syms(groups)) %>%
    summarize(summarized_mean = mean(!!rlang::sym(to_summarize)))
}

summarize_by(starwars, c("homeworld", "species"), "mass")
## # A tibble: 58 x 3
## # Groups:   homeworld [?]
##    homeworld      species   summarized_mean
##    <chr>          <chr>               <dbl>
##  1 Alderaan       Human                NA  
##  2 Aleen Minor    Aleena               15  
##  3 Bespin         Human                79  
##  4 Bestine IV     Human               110  
##  5 Cato Neimoidia Neimodian            90  
##  6 Cerea          Cerean               82  
##  7 Champala       Chagrian             NA  
##  8 Chandrila      Human                NA  
##  9 Concord Dawn   Human                79  
## 10 Corellia       Human                78.5
## # ... with 48 more rows

Details about unquoting

!! and !!! are syntactic sugar on top of the functions UQ() and UQS(), respectively. It used to be that !! and !!! had low operator precedence, meaning that in terms of PEMDAS they came pretty much last, so an expression like !!homeworld == "Alderaan" had to be written as (!!homeworld) == "Alderaan". Now they bind tightly, and we can use them more intuitively:

homeworld <- rlang::sym("homeworld")

filter(starwars, !!homeworld == "Alderaan")
## # A tibble: 3 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Leia~    150    49 brown      light      brown             19 female
## 2 Bail~    191    NA black      tan        brown             67 male  
## 3 Raym~    188    79 brown      light      brown             NA male  
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
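
To check how an unquoted expression is put together without running a dplyr verb, one option is rlang::expr, which does the unquoting but returns the resulting code instead of evaluating it. This is a small sketch, and it assumes a version of rlang recent enough to have the tighter !! precedence described above.

```r
library(rlang)

homeworld <- sym("homeworld")

# expr() unquotes !!homeworld and returns the expression unevaluated
unquoted <- expr(!!homeworld == "Alderaan")
unquoted

# only the symbol was unquoted, not the whole comparison
identical(unquoted, quote(homeworld == "Alderaan"))
```

If the last line comes back TRUE, the !! applied to homeworld alone, which is exactly what filter needs.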

We can also use UQ and UQS directly to be explicit about what we’re unquoting.

filter(starwars, UQ(homeworld) == "Alderaan")
## # A tibble: 3 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Leia~    150    49 brown      light      brown             19 female
## 2 Bail~    191    NA black      tan        brown             67 male  
## 3 Raym~    188    79 brown      light      brown             NA male  
## # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

Creating non-standard functions

Sometimes it is nice to write functions that accept non-standard inputs, like dplyr verbs do. For example, we might want to write a function with the same effect as

starwars %>% 
  group_by(homeworld, species) %>% 
  summarize(avg_mass = mean(mass))
## # A tibble: 58 x 3
## # Groups:   homeworld [?]
##    homeworld      species   avg_mass
##    <chr>          <chr>        <dbl>
##  1 Alderaan       Human         NA  
##  2 Aleen Minor    Aleena        15  
##  3 Bespin         Human         79  
##  4 Bestine IV     Human        110  
##  5 Cato Neimoidia Neimodian     90  
##  6 Cerea          Cerean        82  
##  7 Champala       Chagrian      NA  
##  8 Chandrila      Human         NA  
##  9 Concord Dawn   Human         79  
## 10 Corellia       Human         78.5
## # ... with 48 more rows

To do this when programming interactively, we need to capture our input in quosures with quo and quos.

groups <- quos(homeworld, species)   # capture a list of variables as raw input
mass <- quo(mass)                    # capture a single variable as raw input

starwars %>% 
  group_by(!!!groups) %>%            # use !!! to access variables from `quos`
  summarize(avg_mass = mean(!!mass)) # use !! to access the variable in `quo`
## # A tibble: 58 x 3
## # Groups:   homeworld [?]
##    homeworld      species   avg_mass
##    <chr>          <chr>        <dbl>
##  1 Alderaan       Human         NA  
##  2 Aleen Minor    Aleena        15  
##  3 Bespin         Human         79  
##  4 Bestine IV     Human        110  
##  5 Cato Neimoidia Neimodian     90  
##  6 Cerea          Cerean        82  
##  7 Champala       Chagrian      NA  
##  8 Chandrila      Human         NA  
##  9 Concord Dawn   Human         79  
## 10 Corellia       Human         78.5
## # ... with 48 more rows

There’s some nice symmetry here in that we unwrap both rlang::sym and quo with !! and both rlang::syms and quos with !!!.
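
Here is a quick way to convince yourself of that symmetry. It's only a sketch on a made-up three-row tibble (toy, g and x are hypothetical names, not part of starwars), comparing the string route with the quosure route.

```r
library(dplyr)

# a tiny made-up data set so the result is easy to check by hand
toy <- tibble(g = c("a", "a", "b"), x = c(1, 3, 10))

# route 1: strings converted to symbols
by_sym <- toy %>%
  group_by(!!rlang::sym("g")) %>%
  summarize(avg = mean(!!rlang::sym("x")))

# route 2: bare names captured as quosures
by_quo <- toy %>%
  group_by(!!quo(g)) %>%
  summarize(avg = mean(!!quo(x)))

by_sym$avg                  # 2 and 10
all.equal(by_sym, by_quo)   # both routes should agree
```

Which route you pick mostly depends on whether your column names arrive as strings or as bare names.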

We might be interested in using this behavior in a function. To do this we replace calls to quo with calls to enquo.

summarize_by <- function(df, to_summarize, ...) {

  to_summarize <- enquo(to_summarize)  # enquo captures a single argument
  groups <- quos(...)                  # quos captures multiple arguments

  df %>%
    group_by(!!!groups) %>%                 # unwrap quos with !!!
    summarize(summ = sum(!!to_summarize))   # unwrap enquo with !!
}

Now our function accepts non-standard input, just like a dplyr verb. Note that quos can capture an arbitrary number of arguments through ..., like we have here. So both of the following calls are valid:

summarize_by(starwars, mass, homeworld)
## # A tibble: 49 x 2
##    homeworld       summ
##    <chr>          <dbl>
##  1 Alderaan          NA
##  2 Aleen Minor       15
##  3 Bespin            79
##  4 Bestine IV       110
##  5 Cato Neimoidia    90
##  6 Cerea             82
##  7 Champala          NA
##  8 Chandrila         NA
##  9 Concord Dawn      79
## 10 Corellia         157
## # ... with 39 more rows
summarize_by(starwars, mass, homeworld, species)
## # A tibble: 58 x 3
## # Groups:   homeworld [?]
##    homeworld      species    summ
##    <chr>          <chr>     <dbl>
##  1 Alderaan       Human        NA
##  2 Aleen Minor    Aleena       15
##  3 Bespin         Human        79
##  4 Bestine IV     Human       110
##  5 Cato Neimoidia Neimodian    90
##  6 Cerea          Cerean       82
##  7 Champala       Chagrian     NA
##  8 Chandrila      Human        NA
##  9 Concord Dawn   Human        79
## 10 Corellia       Human       157
## # ... with 48 more rows
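
One thing that generalizes nicely from here: because enquo captures whatever code you type, the argument doesn't have to be a bare column name. The sketch below repeats the function so it runs on its own and uses a made-up tibble (toy, g, x and y are hypothetical) to pass an expression and to call the function with no grouping variables at all.

```r
library(dplyr)

# same function as above, repeated so this snippet is self-contained
summarize_by <- function(df, to_summarize, ...) {
  to_summarize <- enquo(to_summarize)
  groups <- quos(...)

  df %>%
    group_by(!!!groups) %>%
    summarize(summ = sum(!!to_summarize))
}

toy <- tibble(g = c("a", "a", "b"), x = c(1, 3, 10), y = c(2, 2, 5))

# an expression instead of a single column: sums x / y within each group
ratio <- summarize_by(toy, x / y, g)
ratio$summ                      # 2 and 2

# no grouping variables: quos(...) is empty, so we get one overall sum
summarize_by(toy, x)$summ       # 14
```

The string-based version from earlier can't do the first call, since rlang::sym only turns a string into a single name.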

For more details, see the programming with dplyr vignette.


