- Sort students by division, and within division, by year.
survey %>%
arrange(major_division, academic_year) %>%
select(major_division, academic_year, everything())
## # A tibble: 59 × 20
## major_division academic_year Timestamp height_cm applications
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Arts Sophomore 2/9/2022 7:… 160 4
## 2 History and Social Sciences Sophomore 2/8/2022 22… 180 20
## 3 History and Social Sciences Sophomore 2/8/2022 22… 157 3
## 4 History and Social Sciences Sophomore 2/9/2022 3:… 1.85 15
## 5 HSS First year 2/7/2022 16… 175. 2
## 6 HSS First year 2/7/2022 20… 170 14
## 7 HSS First year 2/8/2022 12… 169 8
## 8 HSS First year 2/8/2022 14… 182 3
## 9 HSS Sophomore 2/7/2022 10… 165 10
## 10 HSS Sophomore 2/7/2022 22… 170 14
## # … with 49 more rows, and 15 more variables: distance_home_miles <dbl>,
## # prior_stats <chr>, weekly_study_hours <dbl>, study_place <chr>,
## # social_views <dbl>, economic_views <dbl>, roommates <chr>,
## # six_month_books <dbl>, transportation <chr>, dog_pants <chr>, hotdog <chr>,
## # coffee_tea <chr>, bedtime <time>, diet <chr>, play_wordle <chr>
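Because `academic_year` is stored as character, `arrange()` sorts it alphabetically rather than in class order, and the output above lists both "HSS" and "History and Social Sciences" as separate divisions. The sketch below shows one way to handle both. It uses a small hypothetical tibble named `toy` in place of the survey data, and it assumes that "HSS" abbreviates "History and Social Sciences" and that the intended year order is First year, Sophomore, Junior, Senior.

```r
library(dplyr)

# Hypothetical toy data; not the real survey responses.
toy <- tibble(
  major_division = c("HSS", "Arts", "History and Social Sciences", "HSS"),
  academic_year  = c("Sophomore", "Senior", "First year", "First year")
)

# Assumed class order, used so that years sort chronologically.
year_levels <- c("First year", "Sophomore", "Junior", "Senior")

sorted <- toy %>%
  mutate(
    # Assumption: "HSS" is an abbreviation of "History and Social Sciences".
    major_division = if_else(major_division == "HSS",
                             "History and Social Sciences",
                             major_division),
    # A factor sorts by its level order instead of alphabetically.
    academic_year = factor(academic_year, levels = year_levels)
  ) %>%
  arrange(major_division, academic_year)

sorted
```

Whether the two division labels should really be merged is a judgment call about the data, so it is worth confirming before applying this to the full survey.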
- Find the number of students in each year whose primary mode of transportation is walking.
survey %>%
group_by(academic_year, transportation) %>%
summarize(n = n()) %>%
filter(transportation == "Walk")
## `summarise()` has grouped output by 'academic_year'. You can override using the `.groups` argument.
## # A tibble: 4 × 3
## # Groups: academic_year [4]
## academic_year transportation n
## <chr> <chr> <int>
## 1 First year Walk 17
## 2 Junior Walk 4
## 3 Senior Walk 1
## 4 Sophomore Walk 19
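The `summarise()` message above appears because the result is still grouped by `academic_year`. One possible alternative, sketched here on a hypothetical tibble `toy`, is to filter first and then use `count()`, which returns an ungrouped result. Passing `.groups = "drop"` to `summarize()` is another way to get the same effect.

```r
library(dplyr)

# Hypothetical toy data; not the real survey responses.
toy <- tibble(
  academic_year  = c("First year", "First year", "Junior", "Sophomore", "Sophomore"),
  transportation = c("Walk", "Bike", "Walk", "Walk", "Walk")
)

walkers <- toy %>%
  filter(transportation == "Walk") %>%  # keep walkers only
  count(academic_year)                  # one row per year, with the count in column n

walkers
```

Unlike the original pipeline, this version does not keep the `transportation` column, since every remaining row is a walker.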
- Calculate the median number of college applications submitted by students who play Wordle.
survey %>%
filter(play_wordle == "Yes") %>%
summarize(median_apps = median(applications))
## # A tibble: 1 × 1
## median_apps
## <dbl>
## 1 7
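`median()` returns `NA` if any value is missing. The result above is not `NA`, so that does not appear to be a problem here, but a more defensive version might look like the following sketch. The tibble `toy` and the column `n_missing` are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Hypothetical toy data with one missing value; not the real survey responses.
toy <- tibble(
  play_wordle  = c("Yes", "Yes", "Yes", "No"),
  applications = c(4, NA, 10, 20)
)

wordle_median <- toy %>%
  filter(play_wordle == "Yes") %>%
  summarize(
    median_apps = median(applications, na.rm = TRUE),  # ignore missing values
    n_missing   = sum(is.na(applications))             # report how many were ignored
  )

wordle_median
```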
- Create a data set consisting only of categorical variables (ordered alphabetically), and with student responses ordered alphabetically, starting with the first variable.
survey %>%
select(academic_year, coffee_tea, diet, dog_pants, hotdog, major_division, prior_stats, play_wordle, study_place, transportation) %>%
arrange(academic_year, coffee_tea, diet, dog_pants, hotdog, major_division, prior_stats, play_wordle, study_place, transportation)
## # A tibble: 59 × 10
## academic_year coffee_tea diet dog_pants hotdog major_division prior_stats
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 First year Coffee None All four… No Mathematics and… Yes
## 2 First year Coffee None Back legs No HSS Yes
## 3 First year Coffee None Back legs No Interdisciplina… No
## 4 First year Coffee None Back legs No Mathematics and… Yes
## 5 First year Coffee None Back legs Yes Mathematics and… No
## 6 First year Coffee None Back legs Yes Mathematics and… Yes
## 7 First year Coffee Other All four… No MNS No
## 8 First year Coffee Veget… All four… No MNS No
## 9 First year Coffee Veget… Back legs Yes HSS No
## 10 First year Neither None Back legs No Interdisciplina… Yes
## # … with 49 more rows, and 3 more variables: play_wordle <chr>,
## # study_place <chr>, transportation <chr>
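Typing every column name twice is easy to get wrong. As a rough alternative, the sketch below selects character columns with `where(is.character)`, reorders them by name, and sorts the rows by every column from left to right. It runs on a hypothetical tibble `toy`. On the real survey this approach would also pick up `Timestamp` and `roommates`, which are stored as `<chr>`, so those would need to be dropped if they do not count as categorical.

```r
library(dplyr)

# Hypothetical toy data; not the real survey responses.
toy <- tibble(
  diet       = c("None", "Vegetarian", "None"),
  height_cm  = c(170, 165, 180),
  coffee_tea = c("Tea", "Coffee", "Coffee")
)

categorical <- toy %>%
  select(where(is.character)) %>%      # keep character columns only
  select(all_of(sort(names(.)))) %>%   # put the columns in alphabetical order
  arrange(across(everything()))        # sort rows by each column, left to right

categorical
```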
- Identify students whose social and economic views differ by 2 or more points.
survey %>%
mutate(diff = abs(social_views - economic_views)) %>%
select(social_views, economic_views, diff) %>%
filter(diff >= 2)
## # A tibble: 0 × 3
## # … with 3 variables: social_views <dbl>, economic_views <dbl>, diff <dbl>
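The prompt asks for views that differ by "2 or more" points, which is an inclusive condition. The following sketch, on a hypothetical tibble `toy`, shows how a strict comparison (`>`) and an inclusive one (`>=`) can return different rows when some difference is exactly 2.

```r
library(dplyr)

# Hypothetical toy data; not the real survey responses.
toy <- tibble(
  social_views   = c(1, 3, 5),
  economic_views = c(1, 5, 2)
)

views <- toy %>%
  mutate(diff = abs(social_views - economic_views))  # differences: 0, 2, 3

strict    <- views %>% filter(diff > 2)   # excludes a difference of exactly 2
inclusive <- views %>% filter(diff >= 2)  # includes a difference of exactly 2

nrow(strict)
nrow(inclusive)
```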
- Create a data set consisting of two variables: diet and height (in inches).
survey %>%
mutate(height_in = height_cm / 2.54) %>%
select(diet, height_in)
## # A tibble: 59 × 2
## diet height_in
## <chr> <dbl>
## 1 None 65.0
## 2 Fish allergy 70.1
## 3 Vegetarian 68.9
## 4 Pescatarian 65
## 5 None 70
## 6 None 65.4
## 7 None 69
## 8 None 66
## 9 None 62.6
## 10 Vegetarian 66.9
## # … with 49 more rows
- Count how many students think both that dogs should wear pants on their back legs and that hot dogs are sandwiches.
survey %>%
filter(hotdog == "Yes") %>%
filter(dog_pants == "Back legs") %>%
summarize(how_many = n())
## # A tibble: 1 × 1
## how_many
## <int>
## 1 15
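Two consecutive `filter()` calls behave the same as one call with both conditions, because conditions separated by commas are combined with a logical AND. A compact sketch on a hypothetical tibble `toy` (the value "Other" is a placeholder, not a real survey response):

```r
library(dplyr)

# Hypothetical toy data; not the real survey responses.
toy <- tibble(
  hotdog    = c("Yes", "Yes", "No", "Yes"),
  dog_pants = c("Back legs", "Other", "Back legs", "Back legs")
)

both <- toy %>%
  filter(hotdog == "Yes", dog_pants == "Back legs") %>%  # both conditions must hold
  summarize(how_many = n())

both
```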
- Among students who drink coffee and whose hometown is at least 100 miles from Reed, create a data set that could be used to make a scatterplot of bedtime vs. weekly hours spent studying.
survey %>%
filter(coffee_tea == "Coffee") %>%
filter(distance_home_miles >= 100) %>%
select(bedtime, weekly_study_hours) %>%
ggplot(aes(x = bedtime, y = weekly_study_hours)) +
geom_jitter()
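
Since the prompt asks for a data set rather than a plot, one option is to store the filtered result in an object and plot it in a separate step. The sketch below uses a hypothetical tibble `toy`, with a made-up numeric column `bedtime_hour` standing in for the survey's `bedtime` column, and the object name `coffee_far` is likewise only illustrative. "At least 100 miles" is treated as inclusive.

```r
library(dplyr)

# Hypothetical toy data; not the real survey responses.
toy <- tibble(
  coffee_tea          = c("Coffee", "Tea", "Coffee", "Coffee"),
  distance_home_miles = c(100, 2500, 12, 640),
  bedtime_hour        = c(23.5, 22, 1, 0.5),   # stand-in for the bedtime column
  weekly_study_hours  = c(20, 15, 30, 25)
)

coffee_far <- toy %>%
  filter(coffee_tea == "Coffee", distance_home_miles >= 100) %>%  # inclusive cutoff
  select(bedtime_hour, weekly_study_hours)                        # plotting variables only

coffee_far
```

The stored object can then be passed to `ggplot()` whenever the scatterplot is needed.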
