class: center, middle

## `gt` Tables

<img src="img/hero_wall_pink.png" width="800px"/>

### Kelly McConville

.large[Math 241 | Week 7 | Spring 2021]

---

## Announcements/Reminders

* Lab 5 is posted.

---

## Goals for Today

* Finish up code smells and refactoring.
* A quick note on tidy evaluation.
* Talk about display tables.

---

## Non-Standard Evaluation

* On your lab this week, you are going to create functions for common `ggplot2` and `dplyr` operations.
* But that means we need to learn about one more idea.

```r
library(ggplot2)
library(pdxTrees)
pdxTrees_parks <- get_pdxTrees_parks()

# Minimal viable product working code
ggplot(data = pdxTrees_parks,
       mapping = aes(x = DBH)) +
  geom_histogram()
```

<img src="slidesWk7Th_files/figure-html/unnamed-chunk-1-1.png" width="360" />

---

## Non-Standard Evaluation

* On your lab this week, you are going to create functions for common `ggplot2` and `dplyr` operations.
* But that means we need to learn about one more idea.

```r
# Shorthand histogram function
histo <- function(data, x, ...){
  ggplot(data = data, mapping = aes(x = x)) +
    geom_histogram()
}

# Test it
histo(pdxTrees_parks, DBH)
```

```
## Error in FUN(X[[i]], ...): object 'DBH' not found
```

<img src="slidesWk7Th_files/figure-html/unnamed-chunk-2-1.png" width="360" />

---

## Non-Standard Evaluation

* On your lab this week, you are going to create functions for common `ggplot2` and `dplyr` operations.
* But that means we need to learn about one more idea.

```r
# Shorthand histogram function
histo <- function(data, x, ...){
  ggplot(data = data, mapping = aes(x = x)) +
    geom_histogram()
}

# Test it
histo(pdxTrees_parks, "DBH")
```

```
## Error: StatBin requires a continuous x variable: the x variable is discrete.Perhaps you want stat="count"?
```

<img src="slidesWk7Th_files/figure-html/unnamed-chunk-3-1.png" width="360" />

---

## Non-Standard Evaluation

**Solution 1**: Supply the column name as a character vector and then use `.data[[ x ]]` to locate it.

```r
histo <- function(data, x, ...){
  ggplot(data = data, mapping = aes(x = .data[[x]])) +
    geom_histogram(...)
}

# Test it
histo(pdxTrees_parks, "DBH")
```

<img src="slidesWk7Th_files/figure-html/unnamed-chunk-4-1.png" width="360" />

---

## Non-Standard Evaluation

**Solution 1**: Supply the column name as a character vector and then use `.data[[ x ]]` to locate it.

```r
histo <- function(data, x, ...){
  # Fail early if the requested column is not numeric
  stopifnot(is.numeric(data[[x]]))
  ggplot(data = data, mapping = aes(x = .data[[x]])) +
    geom_histogram(...)
}

# Test it
histo(pdxTrees_parks, "Condition")
```

```
## Error in histo(pdxTrees_parks, "Condition"): is.numeric(data[[x]]) is not TRUE
```

---

## Non-Standard Evaluation

**Solution 2**: If you pass the column name directly, use `{{ x }}` to pass it to a tidy-evaluation-enabled function.

```r
histo <- function(data, x, ...){
  ggplot(data = data, mapping = aes(x = {{ x }})) +
    geom_histogram(...)
}

# Test it
histo(pdxTrees_parks, DBH)
```

<img src="slidesWk7Th_files/figure-html/unnamed-chunk-6-1.png" width="360" />

---

## Non-Standard Evaluation

**Solution 2**: If you pass the column name directly, use `{{ x }}` to pass it to a tidy-evaluation-enabled function.

```r
histo <- function(data, x, ...){
  # Evaluate the unquoted column inside `data`, then check its type
  stopifnot(is.numeric(rlang::eval_tidy(enquo(x), data)))
  ggplot(data = data, mapping = aes(x = {{ x }})) +
    geom_histogram(...)
}

# Test it
histo(pdxTrees_parks, Condition)
```

```
## Error in histo(pdxTrees_parks, Condition): is.numeric(rlang::eval_tidy(enquo(x), data)) is not TRUE
```

---

## [Display Format](https://www.bts.gov/content/age-and-availability-amtrak-locomotive-and-car-fleets)

Unfortunately, many datasets on the internet are in **Display Format**, not **Analysis Format**.

<img src="img/amtrak.png" width="80%" />

---

## [Display Format](https://www.bts.gov/content/age-and-availability-amtrak-locomotive-and-car-fleets)

This means we spend a LOT of energy wrangling the data so that we can use it in our analyses.

```r
library(tidyverse)
library(readxl)

Amtrak <- read_excel("table_01_33_102020.xlsx",
                     range = "A2:AI8",
                     na = c("U", NA)) %>%
  rename(Vars = ...1) %>%
  slice(-1, -4) %>%
  # Label the remaining four rows by train type
  mutate(TrainType = c(rep("Locomotives", 2),
                       rep("Passenger and Other", 2))) %>%
  relocate(TrainType) %>%
  # The trailing "a" and "b" are presumably footnote markers
  # carried over from the spreadsheet
  mutate(Vars = case_when(
    Vars == "Percent available for servicea" ~ "PercentAvailable",
    Vars == "Average age (years)b" ~ "AverageAge"
  )) %>%
  # Reshape to one row per train type and year
  pivot_longer(cols = !c(TrainType, Vars),
               names_to = "Year",
               values_to = "Value") %>%
  pivot_wider(names_from = Vars, values_from = Value) %>%
  mutate(Year = as.numeric(Year))
```

---

## Display Format

This means we spend a LOT of energy wrangling the data so that we can use it in our analyses.

```r
Amtrak
```

```
## # A tibble: 68 x 4
##    TrainType    Year PercentAvailable AverageAge
##    <chr>       <dbl>            <dbl>      <dbl>
##  1 Locomotives  1972               NA       22.3
##  2 Locomotives  1975               87       14.4
##  3 Locomotives  1980               83        7.4
##  4 Locomotives  1985               93        7
##  5 Locomotives  1990               84       12
##  6 Locomotives  1991               86       13
##  7 Locomotives  1992               83       13
##  8 Locomotives  1993               84       13.2
##  9 Locomotives  1994               85       13.4
## 10 Locomotives  1995               88       13.9
## # … with 58 more rows
```

---

## Display Format

This means we spend a LOT of energy wrangling the data so that we can use it in our analyses.

```r
ggplot(data = Amtrak,
       mapping = aes(x = Year, y = AverageAge, color = TrainType)) +
  geom_line()
```

<img src="slidesWk7Th_files/figure-html/unnamed-chunk-11-1.png" width="360" />

---

## Display Format

But once you have finished analyzing the data, you may want to convert a table back to **display format**.

<img src="img/DAW.png" width="60%" style="display: block; margin: auto;" />

---

## Display Format

And, just printing the table doesn't really cut it!

```r
Amtrak %>%
  filter(Year <= 2015, Year > 2010)
```

```
## # A tibble: 10 x 4
##    TrainType            Year PercentAvailable AverageAge
##    <chr>               <dbl>            <dbl>      <dbl>
##  1 Locomotives          2011             84.2       20
##  2 Locomotives          2012             83.7       21
##  3 Locomotives          2013             80.3       21.9
##  4 Locomotives          2014             82.5       21.5
##  5 Locomotives          2015             82         21.1
##  6 Passenger and Other  2011             87.7       26.5
##  7 Passenger and Other  2012             88.7       27.7
##  8 Passenger and Other  2013             89.1       28.6
##  9 Passenger and Other  2014             89.1       29.6
## 10 Passenger and Other  2015             88.8       30.7
```

---

## But Aren't Graphs > Tables for Displaying Data?

Graphs are useful when:

* You want to display trends, relationships, and/or overall shape.
* You want to pull in the potential reader/viewer.

--

Tables are useful when:

* You want to provide structured, summarized and aggregated information.
* You want to show the exact values.
* The viewer wants to be able to look up and compare individual values.

--

Like with graphs, you should:

* Format your tables to suit their function.
* Make important comparisons easy.

---

## `gt` (Grammar of Tables) Package

.left-column[
<img src="img/gt.png" width="80%" />
]

--

.right-column[
<img src="img/gt_table_parts.png" width="90%" />
]

---

class: inverse, middle, center

### Handout: gt.Rmd
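
---

## A Minimal `gt` Sketch

As a rough illustration of how a printed tibble can become a display table, the chunk below re-enters the 2011-2015 rows shown earlier as a small data frame and passes them to `gt()`. The object names `amtrak_recent` and `amtrak_tbl`, the title, and the column labels are illustrative assumptions, not taken from the handout.

```r
library(gt)
library(dplyr)  # loaded here only for the %>% pipe

# Values copied from the printed tibble earlier in the slides (2011-2015)
amtrak_recent <- data.frame(
  TrainType = rep(c("Locomotives", "Passenger and Other"), each = 5),
  Year = rep(2011:2015, times = 2),
  PercentAvailable = c(84.2, 83.7, 80.3, 82.5, 82.0,
                       87.7, 88.7, 89.1, 89.1, 88.8),
  AverageAge = c(20.0, 21.0, 21.9, 21.5, 21.1,
                 26.5, 27.7, 28.6, 29.6, 30.7)
)

amtrak_tbl <- amtrak_recent %>%
  # Use TrainType as row-group labels rather than a repeated column
  gt(groupname_col = "TrainType") %>%
  # Title and subtitle are hypothetical wording
  tab_header(title = "Amtrak Fleet Availability and Age",
             subtitle = "2011-2015") %>%
  # Show one decimal place so values line up for comparison
  fmt_number(columns = c("PercentAvailable", "AverageAge"),
             decimals = 1) %>%
  # Human-readable column labels (illustrative)
  cols_label(PercentAvailable = "Available (%)",
             AverageAge = "Average age (years)") %>%
  tab_source_note(source_note = "Source: Bureau of Transportation Statistics")

amtrak_tbl
```

Printing `amtrak_tbl` in an R Markdown chunk renders the table as HTML. Grouping by `TrainType` keeps the year-to-year comparisons within each fleet next to one another.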