Math 141 Survey Wrangling 2

Sort students by division, and within division, by year.

survey %>% 
  arrange(major_division, academic_year) %>% 
  select(major_division, academic_year, everything())

## # A tibble: 59 × 20
##    major_division              academic_year Timestamp    height_cm applications
##    <chr>                       <chr>         <chr>            <dbl>        <dbl>
##  1 Arts                        Sophomore     2/9/2022 7:…    160               4
##  2 History and Social Sciences Sophomore     2/8/2022 22…    180              20
##  3 History and Social Sciences Sophomore     2/8/2022 22…    157               3
##  4 History and Social Sciences Sophomore     2/9/2022 3:…      1.85           15
##  5 HSS                         First year    2/7/2022 16…    175.              2
##  6 HSS                         First year    2/7/2022 20…    170              14
##  7 HSS                         First year    2/8/2022 12…    169               8
##  8 HSS                         First year    2/8/2022 14…    182               3
##  9 HSS                         Sophomore     2/7/2022 10…    165              10
## 10 HSS                         Sophomore     2/7/2022 22…    170              14
## # … with 49 more rows, and 15 more variables: distance_home_miles <dbl>,
## #   prior_stats <chr>, weekly_study_hours <dbl>, study_place <chr>,
## #   social_views <dbl>, economic_views <dbl>, roommates <chr>,
## #   six_month_books <dbl>, transportation <chr>, dog_pants <chr>, hotdog <chr>,
## #   coffee_tea <chr>, bedtime <time>, diet <chr>, play_wordle <chr>
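`arrange()` sorts a character column alphabetically, so `academic_year` comes out in the order First year, Junior, Senior, Sophomore rather than chronologically. Below is a minimal sketch of one way to get chronological order: convert the column to a factor with explicit levels before sorting. The toy tibble and its values are made up, and the level order in `year_levels` is an assumption about how the survey labels class years.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey` (values are made up)
toy <- tibble(
  major_division = c("Arts", "HSS", "HSS", "Arts"),
  academic_year  = c("Sophomore", "Senior", "First year", "First year")
)

# Assumed chronological order of class years
year_levels <- c("First year", "Sophomore", "Junior", "Senior")

# arrange() sorts a factor by its level order, not alphabetically
sorted <- toy %>%
  mutate(academic_year = factor(academic_year, levels = year_levels)) %>%
  arrange(major_division, academic_year)

sorted
```

Any `academic_year` value that is not listed in `year_levels` becomes `NA`, so check the distinct values first (for example with `distinct(survey, academic_year)`).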

Find the number of students in each year whose primary mode of transportation is walking.

survey %>% 
  group_by(academic_year, transportation) %>% 
  summarize(n = n()) %>% 
  filter(transportation == "Walk")

## `summarise()` has grouped output by 'academic_year'. You can override using the `.groups` argument.

## # A tibble: 4 × 3
## # Groups:   academic_year [4]
##   academic_year transportation     n
##   <chr>         <chr>          <int>
## 1 First year    Walk              17
## 2 Junior        Walk               4
## 3 Senior        Walk               1
## 4 Sophomore     Walk              19
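You can get the same counts by filtering first and then counting. Here is a sketch on made-up toy data. The transportation values other than "Walk" are hypothetical. `count()` returns an ungrouped tibble, so the `.groups` message does not appear.

```r
library(dplyr)

# Hypothetical toy data; only "Walk" is a value seen in the survey output
toy <- tibble(
  academic_year  = c("First year", "First year", "Junior", "Sophomore", "Senior"),
  transportation = c("Walk", "Bike", "Walk", "Walk", "Car")
)

# Keep only walkers, then count rows per academic year
walkers <- toy %>%
  filter(transportation == "Walk") %>%
  count(academic_year)

walkers
```

As in the original version, years with no walkers do not appear at all, because they have no rows left after filtering.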

Calculate the median number of college applications submitted by students who play Wordle.

survey %>% 
  filter(play_wordle == "Yes") %>% 
  summarize(median_apps = median(applications))

## # A tibble: 1 × 1
##   median_apps
##         <dbl>
## 1           7
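`median()` returns `NA` if any value is missing. A sketch that guards against skipped answers adds `na.rm = TRUE`, and it also reports how many Wordle players were counted. The toy data here are made up.

```r
library(dplyr)

# Hypothetical toy data; one Wordle player left `applications` blank
toy <- tibble(
  play_wordle  = c("Yes", "Yes", "Yes", "No"),
  applications = c(4, NA, 10, 20)
)

median_apps <- toy %>%
  filter(play_wordle == "Yes") %>%
  summarize(
    median_apps = median(applications, na.rm = TRUE),  # ignore missing answers
    n_wordle    = n()                                   # Wordle players, including NAs
  )

median_apps
```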

Create a data set that contains only the categorical variables, with the columns in alphabetical order. Sort the rows alphabetically by the first variable, breaking ties with the second variable, and so on.

survey %>% 
  select(academic_year, coffee_tea, diet, dog_pants, hotdog, major_division,
         prior_stats, play_wordle, study_place, transportation) %>% 
  arrange(academic_year, coffee_tea, diet, dog_pants, hotdog, major_division,
          prior_stats, play_wordle, study_place, transportation)

## # A tibble: 59 × 10
##    academic_year coffee_tea diet   dog_pants hotdog major_division   prior_stats
##    <chr>         <chr>      <chr>  <chr>     <chr>  <chr>            <chr>      
##  1 First year    Coffee     None   All four… No     Mathematics and… Yes        
##  2 First year    Coffee     None   Back legs No     HSS              Yes        
##  3 First year    Coffee     None   Back legs No     Interdisciplina… No         
##  4 First year    Coffee     None   Back legs No     Mathematics and… Yes        
##  5 First year    Coffee     None   Back legs Yes    Mathematics and… No         
##  6 First year    Coffee     None   Back legs Yes    Mathematics and… Yes        
##  7 First year    Coffee     Other  All four… No     MNS              No         
##  8 First year    Coffee     Veget… All four… No     MNS              No         
##  9 First year    Coffee     Veget… Back legs Yes    HSS              No         
## 10 First year    Neither    None   Back legs No     Interdisciplina… Yes        
## # … with 49 more rows, and 3 more variables: play_wordle <chr>,
## #   study_place <chr>, transportation <chr>
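Typing the column list by hand is error-prone. For example, strict alphabetical order puts `play_wordle` before `prior_stats`. Below is a minimal sketch that picks and sorts the columns programmatically. It assumes dplyr 1.1.0 or later, which is needed for `pick()`, and uses made-up toy data. On the real survey, `where(is.character)` would also select `roommates` and `Timestamp`, since both are stored as `<chr>`. Drop any columns you do not consider categorical.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Hypothetical toy data with one numeric and three character columns
toy <- tibble(
  transportation = c("Walk", "Bike", "Walk"),
  height_cm      = c(170, 160, 180),
  coffee_tea     = c("Tea", "Coffee", "Coffee"),
  academic_year  = c("Senior", "Junior", "Junior")
)

# Keep only the character (categorical) columns
cat_data <- toy %>% select(where(is.character))

# Alphabetize the columns, then sort rows by column 1, ties by column 2, ...
cat_data <- cat_data %>%
  select(all_of(sort(names(cat_data)))) %>%
  arrange(pick(everything()))

cat_data
```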

Identify students whose social and economic views differ by 2 or more points.

survey %>% 
  mutate(diff = abs(social_views - economic_views)) %>% 
  select(social_views, economic_views, diff) %>% 
  filter(diff > 2)  # keeps gaps strictly greater than 2; "2 or more" would be diff >= 2

## # A tibble: 0 × 3
## # … with 3 variables: social_views <dbl>, economic_views <dbl>, diff <dbl>
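"Differ by 2 or more points" corresponds to `diff >= 2`. The condition `diff > 2` leaves out a gap of exactly 2. The sketch below shows the difference on made-up toy data. It does not say what the real survey would return under `>=`.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy <- tibble(
  social_views   = c(1, 3, 5, 2),
  economic_views = c(3, 2, 1, 2)
)

views <- toy %>%
  mutate(diff = abs(social_views - economic_views))  # gaps: 2, 1, 4, 0

strictly_more <- views %>% filter(diff > 2)   # excludes a gap of exactly 2
two_or_more   <- views %>% filter(diff >= 2)  # "2 or more points"
```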

Create a data set consisting of two variables: diet and height (in inches).

survey %>% 
  mutate(height_in = height_cm/2.54) %>% 
  select(diet, height_in)

## # A tibble: 59 × 2
##    diet         height_in
##    <chr>            <dbl>
##  1 None              65.0
##  2 Fish allergy      70.1
##  3 Vegetarian        68.9
##  4 Pescatarian       65  
##  5 None              70  
##  6 None              65.4
##  7 None              69  
##  8 None              66  
##  9 None              62.6
## 10 Vegetarian        66.9
## # … with 49 more rows
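The conversion uses 1 inch = 2.54 cm. The first table shows one `height_cm` entry of 1.85, which looks like it was typed in meters and would convert to well under an inch. Here is a hedged sketch of one possible cleanup step, run on made-up toy data. The rule that any value below 3 is in meters is an assumption, not something the survey specifies.

```r
library(dplyr)

# Hypothetical toy data; the last height mimics an entry typed in meters
toy <- tibble(
  diet      = c("None", "Vegetarian", "Pescatarian"),
  height_cm = c(165.1, 175, 1.85)
)

heights <- toy %>%
  mutate(
    # Assumption: values under 3 were typed in meters, so rescale them to cm
    height_cm = if_else(height_cm < 3, height_cm * 100, height_cm),
    height_in = height_cm / 2.54  # 1 inch = 2.54 cm exactly
  ) %>%
  select(diet, height_in)

heights
```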

Count how many students think both that dogs should wear pants on their back legs and that hot dogs are sandwiches.

survey %>% 
  filter(hotdog == "Yes", dog_pants == "Back legs") %>% 
  summarize(how_many = n())

## # A tibble: 1 × 1
##   how_many
##      <int>
## 1       15
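Another way to count is to sum a logical vector, since each `TRUE` counts as 1. Below is a sketch on made-up toy data, where "Other" is a placeholder response. `filter()` silently drops rows where a condition is `NA`, but `sum()` needs `na.rm = TRUE` to do the same.

```r
library(dplyr)

# Hypothetical toy data; "Other" is a placeholder, and one hotdog answer is missing
toy <- tibble(
  hotdog    = c("Yes", "Yes", "No", NA),
  dog_pants = c("Back legs", "Other", "Back legs", "Back legs")
)

# Two conditions in one filter() call are combined with &
via_filter <- toy %>%
  filter(hotdog == "Yes", dog_pants == "Back legs") %>%
  summarize(how_many = n())

# Summing a logical vector counts its TRUE values; na.rm drops NA answers
via_sum <- toy %>%
  summarize(how_many = sum(hotdog == "Yes" & dog_pants == "Back legs", na.rm = TRUE))
```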

Among students who drink coffee and whose home is at least 100 miles from Reed, create a data set that could be used to make a scatterplot of bedtime vs. weekly hours spent studying.

survey %>% 
  filter(coffee_tea == "Coffee") %>% 
  filter(distance_home_miles >= 100) %>%   # "at least 100 miles" includes 100
  select(bedtime, weekly_study_hours) %>% 
  ggplot(aes(x = bedtime, y = weekly_study_hours)) +
  geom_jitter()   # adds a little random noise to reduce overplotting
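The prompt asks for a data set, so you could save the filtered data before plotting it. Here is a sketch on made-up toy data. It assumes the hms package is installed for the `<time>` column; `survey$bedtime` prints as `<time>`, which suggests an hms-style time. A home exactly 100 miles away is kept because the condition uses `>=`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; bedtime is stored as an hms time-of-day
toy <- tibble(
  coffee_tea          = c("Coffee", "Coffee", "Tea", "Coffee"),
  distance_home_miles = c(100, 2500, 800, 40),
  bedtime             = hms::as_hms(c("23:30:00", "22:45:00", "22:00:00", "23:15:00")),
  weekly_study_hours  = c(20, 12, 30, 25)
)

# The data set requested by the prompt
plot_data <- toy %>%
  filter(coffee_tea == "Coffee", distance_home_miles >= 100) %>%
  select(bedtime, weekly_study_hours)

# Build the scatterplot from the saved data; print `p` to draw it
p <- ggplot(plot_data, aes(x = bedtime, y = weekly_study_hours)) +
  geom_point()
```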


Nate Wells

2/9/2022