class: center, middle

<img src="img/dplyr_wrangling.png" width="700px"/>

<span style="color: #91204D;">
.large[Kelly McConville | Math 141 | Week 3 | Fall 2020]
</span>

---

## Announcements

* You are invited to attend a talk I am giving about data-related, team-science research, entitled "Reed Forestry Data Science"
    + This Thursday, Sept 17th, 4:45 - 5:30pm PT on Zoom
    + For the Zoom link, see the message in the #outside-stats channel of our Slack Workspace

--

* Don't forget that Lab 2 is due before your lab session this week!

--

* At the end of class, we will go through the "WranglingData.Rmd" handout. You have three options:
    + Listen and take notes as I go through the handout
    + Print out the PDF (posted to Slack #in-class) and take notes as I go through the handout
    + Run the code with me (grab the handout from `/home/courses/math141f20/Handouts`)

---

## Week 3 Topics

* Data summarization

* **Data wrangling**

* Data collection

---

# Goals for Today

* What is data wrangling?

--

* Learning the main `dplyr` verbs

--

* Introducing the Math 141 group project and Project Assignment 1

---

## Math 141 Group Project

* Goal: Practice working through the data analysis process with a real dataset and research question(s)

--

* Structure:
    + Groups of 2-3 people based on lab section
    + Key due dates [here](https://reed-statistics.github.io/math141f20/due_dates.html)
    + Three intermediate assignments and a final 10-minute video presentation

--

* Project Assignment 1
    + Pick a dataset and research questions
    + Explore and visualize the data to start answering the research questions
    + Due on Gradescope: Friday, October 2nd
    + Only one person from the group needs to turn it in.

---
class: center, middle, inverse

# What is data wrangling?

---

## Data Wrangling

* **Data Wrangling**: Transformations done on the data

--

**Why wrangle the data?**

--

* EX: To compute summary statistics (we saw this on Monday!)

--

* EX: To remove missing values because a modeling technique can't handle them

--

* EX: To focus on a certain subset of the data (such as all bike rides on May 1, 2020)

--

* EX: To collapse a categorical variable with 20 categories into a variable with 3 categories

--

* EX: To sort the data to make it easier to display

--

* EX: To remove extraneous variables

...

---
class: center, middle, inverse

# Now let's go through the Wrangling Data handout!
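
---

## Sketching the Wrangling Examples with `dplyr`

Here is a minimal sketch of several of the wrangling examples from the "Why wrangle the data?" slide, assuming the `dplyr` package is installed. The `rides` data frame, its columns, and its values are hypothetical, made up purely for illustration; they are not from the handout or from a real bike-share dataset.

```r
# A minimal sketch, assuming the dplyr package is installed.
library(dplyr)

# HYPOTHETICAL toy data (not a real dataset): four bike rides
rides <- tibble(
  date     = as.Date(c("2020-05-01", "2020-05-01", "2020-05-02", "2020-05-01")),
  duration = c(12, NA, 30, 45),  # ride length in minutes; one value is missing
  type     = c("electric", "classic", "classic", "docked"),
  station  = c("A", "B", "A", "C")
)

# Subset: keep only the rides on May 1, 2020
may1 <- rides %>% filter(date == as.Date("2020-05-01"))

# Remove rows where duration is missing
complete_rides <- may1 %>% filter(!is.na(duration))

# Collapse a categorical variable into fewer categories
recoded <- complete_rides %>%
  mutate(type2 = if_else(type == "electric", "electric", "other"))

# Sort by duration (longest first) and drop the extraneous station variable
sorted <- recoded %>%
  arrange(desc(duration)) %>%
  select(-station)

# Compute summary statistics on the wrangled data
summary_stats <- sorted %>%
  summarize(n_rides = n(), mean_duration = mean(duration))
```

Each step uses one of the main `dplyr` verbs (`filter()`, `mutate()`, `arrange()`, `select()`, `summarize()`), chained with the pipe `%>%`; the handout may organize or name these steps differently.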