class: center, middle

# Data Viz: More `ggplot2`

<img src="img/DAW.png" width="500px"/>

<span style="color: #91204D;">
.large[Kelly McConville | Math 141 | Week 2 | Fall 2020]
</span>

---

## Announcements

* Don't forget to come by office hours twice during the first four weeks of the semester!

---

## Week 2 Topics

* **Creating** Data Visualizations

---

# Goals for Today

* Discuss the standard graphic for categorical data
    + **Barplot**: one categorical variable
    + **Segmented barplot**: two categorical variables

* Learn how to build these graphs with `ggplot2`.

* Incorporate more variables into our plots!

* Add context to our plots.

---

## Load the [Portland Biketown data](https://www.biketownpdx.com/system-data)

```r
# Load necessary package
library(tidyverse)

# Import the data
biketown <- read_csv("/home/courses/math141f20/Data/biketown_spring1920.csv")

# Remove suspect points
biketown <- filter(biketown, Distance_Miles < 1000)
```

---

# Barplots

```r
# Display variables
select(biketown, StartDate, PaymentPlan, Month, Year)
```

```
## # A tibble: 124,721 x 4
##    StartDate PaymentPlan Month  Year
##    <chr>     <chr>       <ord> <dbl>
##  1 3/1/2019  Casual      Mar    2019
##  2 3/1/2019  Subscriber  Mar    2019
##  3 3/1/2019  Subscriber  Mar    2019
##  4 3/1/2019  Subscriber  Mar    2019
##  5 3/1/2019  Subscriber  Mar    2019
##  6 3/1/2019  Subscriber  Mar    2019
##  7 3/1/2019  Subscriber  Mar    2019
##  8 3/1/2019  Subscriber  Mar    2019
##  9 3/1/2019  Subscriber  Mar    2019
## 10 3/1/2019  Casual      Mar    2019
## # … with 124,711 more rows
```

---

# Barplots

<img src="wk02_fri_files/figure-html/unnamed-chunk-4-1.png" width="360" />

* Displays the frequency for each category.

---

# Barplots

```r
# Create barplot
ggplot(data = biketown, mapping = aes(x = Month)) +
  geom_bar()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-5-1.png" width="360" />

---

# Segmented Barplots

<img src="wk02_fri_files/figure-html/unnamed-chunk-6-1.png" width="360" />

* Each bar is divided into the frequencies of the `fill` variable.
* Hard to make comparisons across categories.

---

# Segmented Barplots

```r
ggplot(data = biketown, mapping = aes(x = Month, fill = PaymentPlan)) +
  geom_bar()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-7-1.png" width="360" />

---

# Segmented Barplots

<img src="wk02_fri_files/figure-html/unnamed-chunk-8-1.png" width="360" />

---

# Segmented Barplots

<img src="wk02_fri_files/figure-html/unnamed-chunk-9-1.png" width="360" />

* Each bar is divided into **proportions** based on the `fill` variable.

---

# Segmented Barplots

```r
ggplot(data = biketown, mapping = aes(x = Month, fill = PaymentPlan)) +
  geom_bar(position = "fill")
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-10-1.png" width="360" />

---

# Adding More Variables

* Two main approaches:
    + Utilize other `aes`thetics of the `geom`
    + Facet: Create multiple plots across the categories of a categorical variable.

---

# Utilize other `aes`thetics

```r
# Summarize use by start time and payment plan
biketown2 <- count(biketown, StartTime, PaymentPlan)
biketown2
```

```
## # A tibble: 2,860 x 3
##    StartTime PaymentPlan     n
##    <time>    <chr>       <int>
##  1 00'00"    Casual         16
##  2 00'00"    Subscriber     12
##  3 01'00"    Casual         17
##  4 01'00"    Subscriber     21
##  5 02'00"    Casual         22
##  6 02'00"    Subscriber     18
##  7 03'00"    Casual         15
##  8 03'00"    Subscriber     11
##  9 04'00"    Casual         11
## 10 04'00"    Subscriber     20
## # … with 2,850 more rows
```

---

# Utilize other `aes`thetics

```r
ggplot(data = biketown2,
       mapping = aes(x = StartTime, y = n, color = PaymentPlan)) +
  geom_point(alpha = 0.2)
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-12-1.png" width="360" />

---

# Facet

<img src="wk02_fri_files/figure-html/unnamed-chunk-13-1.png" width="360" />

---

# Facet

```r
ggplot(data = biketown2, mapping = aes(x = StartTime, y = n)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~PaymentPlan, ncol = 2)
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-14-1.png" width="360" />

* Add the `facet_wrap()` layer.
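
Here is a minimal sketch of faceting that runs without the Biketown file. It assumes the `mpg` data that ships with `ggplot2` as a stand-in, so the variables `displ`, `hwy`, and `drv` come from `mpg`, not from these slides:

```r
library(ggplot2)

# `mpg` is a built-in example dataset; it stands in for the Biketown data here
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~drv, ncol = 3)
p

# The faceting variable has one category per panel
unique(mpg$drv)

# layer_data() records which panel each point was drawn in
table(layer_data(p)$PANEL)
```

Each category of `drv` gets its own panel, in the same way each `PaymentPlan` does in the Biketown plot.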
---

# Facet

```r
biketown3 <- count(biketown, StartTime, PaymentPlan, Year)

ggplot(data = biketown3, mapping = aes(x = StartTime, y = n)) +
  geom_point(alpha = 0.2) +
  facet_wrap(Year~PaymentPlan, ncol = 2)
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-15-1.png" width="360" />

* Add the `facet_wrap()` layer.

---

### Context

```r
ggplot(data = biketown2,
       mapping = aes(x = StartTime, y = n, color = PaymentPlan)) +
  geom_point(alpha = 0.2) +
  labs(x = "Checkout Time",
       y = "Number of Checkouts",
       color = "Type of User")
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-16-1.png" width="360" />

---

### Context

```r
ggplot(data = biketown2,
       mapping = aes(x = StartTime, y = n, color = PaymentPlan)) +
  geom_point(alpha = 0.2) +
  labs(x = "Checkout Time",
       y = "Number of Checkouts",
       color = "Type of User",
       title = "Checkout Frequencies over the Course \nof a Day",
       caption = "Data from www.biketownpdx.com/system-data")
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-17-1.png" width="360" />

---

# Customizing your `ggplot2` Plots

* There are so many ways you can customize the look of your `ggplot2` plots.

* Let's look at some common changes:
    + Reorder a variable
    + Fussing with labels
    + Color!
    + Themes

---

# Re-order the Bars

```r
# Change the order
biketown <- mutate(biketown, Month = fct_relevel(Month, "May", "Apr", "Mar"))

ggplot(data = biketown, mapping = aes(x = Month)) +
  geom_bar()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-18-1.png" width="360" />

---

# Fix the Labels

```r
ggplot(data = biketown, mapping = aes(x = Month, fill = PaymentPlan)) +
  geom_bar(position = "fill") +
  scale_x_discrete(name = "Month of Check-out",
                   labels = c("May", "April", "March"))
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-19-1.png" width="360" />

---

# Change the Color

```r
colors()
```

```
##   [1] "white" "aliceblue" "antiquewhite"
##   [4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
##   [7] "antiquewhite4" "aquamarine" "aquamarine1"
##  [10] "aquamarine2" "aquamarine3" "aquamarine4"
##  [13] "azure" "azure1" "azure2"
##  [16] "azure3" "azure4" "beige"
##  [19] "bisque" "bisque1" "bisque2"
##  [22] "bisque3" "bisque4" "black"
##  [25] "blanchedalmond" "blue" "blue1"
##  [28] "blue2" "blue3" "blue4"
##  [31] "blueviolet" "brown" "brown1"
##  [34] "brown2" "brown3" "brown4"
##  [37] "burlywood" "burlywood1" "burlywood2"
##  [40] "burlywood3" "burlywood4" "cadetblue"
##  [43] "cadetblue1" "cadetblue2" "cadetblue3"
##  [46] "cadetblue4" "chartreuse" "chartreuse1"
##  [49] "chartreuse2" "chartreuse3" "chartreuse4"
##  [52] "chocolate" "chocolate1" "chocolate2"
##  [55] "chocolate3" "chocolate4" "coral"
##  [58] "coral1" "coral2" "coral3"
##  [61] "coral4" "cornflowerblue" "cornsilk"
##  [64] "cornsilk1" "cornsilk2" "cornsilk3"
##  [67] "cornsilk4" "cyan" "cyan1"
##  [70] "cyan2" "cyan3" "cyan4"
##  [73] "darkblue" "darkcyan" "darkgoldenrod"
##  [76] "darkgoldenrod1" "darkgoldenrod2" "darkgoldenrod3"
##  [79] "darkgoldenrod4" "darkgray" "darkgreen"
##  [82] "darkgrey" "darkkhaki" "darkmagenta"
##  [85] "darkolivegreen" "darkolivegreen1" "darkolivegreen2"
##  [88] "darkolivegreen3" "darkolivegreen4" "darkorange"
##  [91] "darkorange1" "darkorange2" "darkorange3"
##  [94] "darkorange4" "darkorchid" "darkorchid1"
##  [97] "darkorchid2" "darkorchid3" "darkorchid4"
## [100] "darkred" "darksalmon" "darkseagreen"
## [103] "darkseagreen1" "darkseagreen2" "darkseagreen3"
## [106] "darkseagreen4" "darkslateblue" "darkslategray"
## [109] "darkslategray1" "darkslategray2" "darkslategray3"
## [112] "darkslategray4" "darkslategrey" "darkturquoise"
## [115] "darkviolet" "deeppink" "deeppink1"
## [118] "deeppink2" "deeppink3" "deeppink4"
## [121] "deepskyblue" "deepskyblue1" "deepskyblue2"
## [124] "deepskyblue3" "deepskyblue4" "dimgray"
## [127] "dimgrey" "dodgerblue" "dodgerblue1"
## [130] "dodgerblue2" "dodgerblue3" "dodgerblue4"
## [133] "firebrick" "firebrick1" "firebrick2"
## [136] "firebrick3" "firebrick4" "floralwhite"
## [139] "forestgreen" "gainsboro" "ghostwhite"
## [142] "gold" "gold1" "gold2"
## [145] "gold3" "gold4" "goldenrod"
## [148] "goldenrod1" "goldenrod2" "goldenrod3"
## [151] "goldenrod4" "gray" "gray0"
## [154] "gray1" "gray2" "gray3"
## [157] "gray4" "gray5" "gray6"
## [160] "gray7" "gray8" "gray9"
## [163] "gray10" "gray11" "gray12"
## [166] "gray13" "gray14" "gray15"
## [169] "gray16" "gray17" "gray18"
## [172] "gray19" "gray20" "gray21"
## [175] "gray22" "gray23" "gray24"
## [178] "gray25" "gray26" "gray27"
## [181] "gray28" "gray29" "gray30"
## [184] "gray31" "gray32" "gray33"
## [187] "gray34" "gray35" "gray36"
## [190] "gray37" "gray38" "gray39"
## [193] "gray40" "gray41" "gray42"
## [196] "gray43" "gray44" "gray45"
## [199] "gray46" "gray47" "gray48"
## [202] "gray49" "gray50" "gray51"
## [205] "gray52" "gray53" "gray54"
## [208] "gray55" "gray56" "gray57"
## [211] "gray58" "gray59" "gray60"
## [214] "gray61" "gray62" "gray63"
## [217] "gray64" "gray65" "gray66"
## [220] "gray67" "gray68" "gray69"
## [223] "gray70" "gray71" "gray72"
## [226] "gray73" "gray74" "gray75"
## [229] "gray76" "gray77" "gray78"
## [232] "gray79" "gray80" "gray81"
## [235] "gray82" "gray83" "gray84"
## [238] "gray85" "gray86" "gray87"
## [241] "gray88" "gray89" "gray90"
## [244] "gray91" "gray92" "gray93"
## [247] "gray94" "gray95" "gray96"
## [250] "gray97" "gray98" "gray99"
## [253] "gray100" "green" "green1"
## [256] "green2" "green3" "green4"
## [259] "greenyellow" "grey" "grey0"
## [262] "grey1" "grey2" "grey3"
## [265] "grey4" "grey5" "grey6"
## [268] "grey7" "grey8" "grey9"
## [271] "grey10" "grey11" "grey12"
## [274] "grey13" "grey14" "grey15"
## [277] "grey16" "grey17" "grey18"
## [280] "grey19" "grey20" "grey21"
## [283] "grey22" "grey23" "grey24"
## [286] "grey25" "grey26" "grey27"
## [289] "grey28" "grey29" "grey30"
## [292] "grey31" "grey32" "grey33"
## [295] "grey34" "grey35" "grey36"
## [298] "grey37" "grey38" "grey39"
## [301] "grey40" "grey41" "grey42"
## [304] "grey43" "grey44" "grey45"
## [307] "grey46" "grey47" "grey48"
## [310] "grey49" "grey50" "grey51"
## [313] "grey52" "grey53" "grey54"
## [316] "grey55" "grey56" "grey57"
## [319] "grey58" "grey59" "grey60"
## [322] "grey61" "grey62" "grey63"
## [325] "grey64" "grey65" "grey66"
## [328] "grey67" "grey68" "grey69"
## [331] "grey70" "grey71" "grey72"
## [334] "grey73" "grey74" "grey75"
## [337] "grey76" "grey77" "grey78"
## [340] "grey79" "grey80" "grey81"
## [343] "grey82" "grey83" "grey84"
## [346] "grey85" "grey86" "grey87"
## [349] "grey88" "grey89" "grey90"
## [352] "grey91" "grey92" "grey93"
## [355] "grey94" "grey95" "grey96"
## [358] "grey97" "grey98" "grey99"
## [361] "grey100" "honeydew" "honeydew1"
## [364] "honeydew2" "honeydew3" "honeydew4"
## [367] "hotpink" "hotpink1" "hotpink2"
## [370] "hotpink3" "hotpink4" "indianred"
## [373] "indianred1" "indianred2" "indianred3"
## [376] "indianred4" "ivory" "ivory1"
## [379] "ivory2" "ivory3" "ivory4"
## [382] "khaki" "khaki1" "khaki2"
## [385] "khaki3" "khaki4" "lavender"
## [388] "lavenderblush" "lavenderblush1" "lavenderblush2"
## [391] "lavenderblush3" "lavenderblush4" "lawngreen"
## [394] "lemonchiffon" "lemonchiffon1" "lemonchiffon2"
## [397] "lemonchiffon3" "lemonchiffon4" "lightblue"
## [400] "lightblue1" "lightblue2" "lightblue3"
## [403] "lightblue4" "lightcoral" "lightcyan"
## [406] "lightcyan1" "lightcyan2" "lightcyan3"
## [409] "lightcyan4" "lightgoldenrod" "lightgoldenrod1"
## [412] "lightgoldenrod2" "lightgoldenrod3" "lightgoldenrod4"
## [415] "lightgoldenrodyellow" "lightgray" "lightgreen"
## [418] "lightgrey" "lightpink" "lightpink1"
## [421] "lightpink2" "lightpink3" "lightpink4"
## [424] "lightsalmon" "lightsalmon1" "lightsalmon2"
## [427] "lightsalmon3" "lightsalmon4" "lightseagreen"
## [430] "lightskyblue" "lightskyblue1" "lightskyblue2"
## [433] "lightskyblue3" "lightskyblue4" "lightslateblue"
## [436] "lightslategray" "lightslategrey" "lightsteelblue"
## [439] "lightsteelblue1" "lightsteelblue2" "lightsteelblue3"
## [442] "lightsteelblue4" "lightyellow" "lightyellow1"
## [445] "lightyellow2" "lightyellow3" "lightyellow4"
## [448] "limegreen" "linen" "magenta"
## [451] "magenta1" "magenta2" "magenta3"
## [454] "magenta4" "maroon" "maroon1"
## [457] "maroon2" "maroon3" "maroon4"
## [460] "mediumaquamarine" "mediumblue" "mediumorchid"
## [463] "mediumorchid1" "mediumorchid2" "mediumorchid3"
## [466] "mediumorchid4" "mediumpurple" "mediumpurple1"
## [469] "mediumpurple2" "mediumpurple3" "mediumpurple4"
## [472] "mediumseagreen" "mediumslateblue" "mediumspringgreen"
## [475] "mediumturquoise" "mediumvioletred" "midnightblue"
## [478] "mintcream" "mistyrose" "mistyrose1"
## [481] "mistyrose2" "mistyrose3" "mistyrose4"
## [484] "moccasin" "navajowhite" "navajowhite1"
## [487] "navajowhite2" "navajowhite3" "navajowhite4"
## [490] "navy" "navyblue" "oldlace"
## [493] "olivedrab" "olivedrab1" "olivedrab2"
## [496] "olivedrab3" "olivedrab4" "orange"
## [499] "orange1" "orange2" "orange3"
## [502] "orange4" "orangered" "orangered1"
## [505] "orangered2" "orangered3" "orangered4"
## [508] "orchid" "orchid1" "orchid2"
## [511] "orchid3" "orchid4" "palegoldenrod"
## [514] "palegreen" "palegreen1" "palegreen2"
## [517] "palegreen3" "palegreen4" "paleturquoise"
## [520] "paleturquoise1" "paleturquoise2" "paleturquoise3"
## [523] "paleturquoise4" "palevioletred" "palevioletred1"
## [526] "palevioletred2" "palevioletred3" "palevioletred4"
## [529] "papayawhip" "peachpuff" "peachpuff1"
## [532] "peachpuff2" "peachpuff3" "peachpuff4"
## [535] "peru" "pink" "pink1"
## [538] "pink2" "pink3" "pink4"
## [541] "plum" "plum1" "plum2"
## [544] "plum3" "plum4" "powderblue"
## [547] "purple" "purple1" "purple2"
## [550] "purple3" "purple4" "red"
## [553] "red1" "red2" "red3"
## [556] "red4" "rosybrown" "rosybrown1"
## [559] "rosybrown2" "rosybrown3" "rosybrown4"
## [562] "royalblue" "royalblue1" "royalblue2"
## [565] "royalblue3" "royalblue4" "saddlebrown"
## [568] "salmon" "salmon1" "salmon2"
## [571] "salmon3" "salmon4" "sandybrown"
## [574] "seagreen" "seagreen1" "seagreen2"
## [577] "seagreen3" "seagreen4" "seashell"
## [580] "seashell1" "seashell2" "seashell3"
## [583] "seashell4" "sienna" "sienna1"
## [586] "sienna2" "sienna3" "sienna4"
## [589] "skyblue" "skyblue1" "skyblue2"
## [592] "skyblue3" "skyblue4" "slateblue"
## [595] "slateblue1" "slateblue2" "slateblue3"
## [598] "slateblue4" "slategray" "slategray1"
## [601] "slategray2" "slategray3" "slategray4"
## [604] "slategrey" "snow" "snow1"
## [607] "snow2" "snow3" "snow4"
## [610] "springgreen" "springgreen1" "springgreen2"
## [613] "springgreen3" "springgreen4" "steelblue"
## [616] "steelblue1" "steelblue2" "steelblue3"
## [619] "steelblue4" "tan" "tan1"
## [622] "tan2" "tan3" "tan4"
## [625] "thistle" "thistle1" "thistle2"
## [628] "thistle3" "thistle4" "tomato"
## [631] "tomato1" "tomato2" "tomato3"
## [634] "tomato4" "turquoise" "turquoise1"
## [637] "turquoise2" "turquoise3" "turquoise4"
## [640] "violet" "violetred" "violetred1"
## [643] "violetred2" "violetred3" "violetred4"
## [646] "wheat" "wheat1" "wheat2"
## [649] "wheat3" "wheat4" "whitesmoke"
## [652] "yellow" "yellow1" "yellow2"
## [655] "yellow3" "yellow4" "yellowgreen"
```

---

### Change the Color

```r
ggplot(data = biketown, mapping = aes(x = Month, fill = PaymentPlan)) +
  geom_bar(position = "fill") +
  scale_fill_manual(name = "User Type",
                    values = c("violetred2", "steelblue3"))
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-21-1.png" width="360" />

---

### Use a Different Theme

```r
ggplot(data = biketown, mapping = aes(x = Month, fill = PaymentPlan)) +
  geom_bar(position = "fill") +
  scale_fill_manual(name = "User Type",
                    values = c("violetred2", "steelblue3")) +
  theme_bw()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-22-1.png" width="360" />

---

**Student question**: What is the difference between `geom_point()` and `geom_jitter()`?

* Jittering the points is another strategy to help with overplotting.
    + It adds a little noise to each point.

```r
biketown_may1_2020 <- filter(biketown, StartDate == "5/1/2020")

# Original plot
ggplot(data = biketown_may1_2020,
       mapping = aes(x = Duration, y = Distance_Miles)) +
  geom_point()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-23-1.png" width="360" />

---

# `geom_jitter()`

```r
# Original plot
ggplot(data = biketown_may1_2020,
       mapping = aes(x = Duration, y = Distance_Miles)) +
  geom_point()

# Jittered plot
ggplot(data = biketown_may1_2020,
       mapping = aes(x = Duration, y = Distance_Miles)) +
  geom_jitter()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-24-1.png" width="360" /><img src="wk02_fri_files/figure-html/unnamed-chunk-24-2.png" width="360" />

---

# `geom_jitter()`

```r
# Let's cook up an example where jittering helps
rounded_dat <- mutate(biketown_may1_2020,
                      Distance_Miles_round = round(Distance_Miles, -1))

ggplot(data = rounded_dat,
       mapping = aes(x = Duration, y = Distance_Miles_round)) +
  geom_point()
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-25-1.png" width="360" />

---

# `geom_jitter()`

```r
ggplot(data = rounded_dat,
       mapping = aes(x = Duration, y = Distance_Miles_round)) +
  geom_point()

ggplot(data = rounded_dat, mapping =
         aes(x = Duration, y = Distance_Miles_round)) +
  geom_jitter(width = 0)
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-26-1.png" width="360" /><img src="wk02_fri_files/figure-html/unnamed-chunk-26-2.png" width="360" />

---

# Another `geom_jitter()`

```r
ggplot(data = biketown_may1_2020,
       mapping = aes(x = PaymentPlan, y = Distance_Miles)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0,8)) +
  geom_point(color = "steelblue", alpha = 0.4)
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-27-1.png" width="360" />

---

# Another `geom_jitter()`

```r
ggplot(data = biketown_may1_2020,
       mapping = aes(x = PaymentPlan, y = Distance_Miles)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0,8)) +
  geom_jitter(color = "steelblue", height = 0, width = 0.1, alpha = 0.4)
```

<img src="wk02_fri_files/figure-html/unnamed-chunk-28-1.png" width="360" />
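
To check what jittering does to the plotted positions, here is a small sketch with made-up values (the `toy` data frame is hypothetical, not Biketown data). `layer_data()` returns the positions `ggplot2` draws, so we can compare them with the raw data: `width = 0` should leave `x` alone, while `height = 0.5` should nudge `y` by at most 0.5.

```r
library(ggplot2)

# Made-up data for illustration only: y is heavily rounded,
# so many points land on exactly the same spot
toy <- data.frame(x = rep(1:5, times = 4),
                  y = rep(c(0, 10), each = 10))

set.seed(141)  # jitter is random; fixing the seed makes the plot reproducible
p <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_jitter(width = 0, height = 0.5, alpha = 0.4)
p

# Positions that were actually drawn
drawn <- layer_data(p)

# Did x stay put?
all(drawn$x == toy$x)

# How far did y move at most?
max(abs(drawn$y - toy$y))
```

Setting one of `width` or `height` to 0 keeps that variable exact, which is why the boxplot example jitters only horizontally.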