Math 141 Survey Wrangling

Sort students by Hogwarts House, and within house, by year.

survey %>% 
  arrange(hogwarts, year) %>% 
  select(hogwarts, year, everything())

## # A tibble: 103 × 34
##    hogwarts   year        X1 social economic diet      college_app reedie_social
##    <chr>      <chr>    <dbl>  <dbl>    <dbl> <chr>           <dbl>         <dbl>
##  1 don't know Senior      67      5        6 Neither             6             2
##  2 Gryffindor Freshman    19      5        7 Neither             3             3
##  3 Gryffindor Freshman    41      1        2 Vegan               1             3
##  4 Gryffindor Freshman    42      8        5 Neither            22             2
##  5 Gryffindor Freshman    63      2        2 Neither            12             2
##  6 Gryffindor Freshman    99      6        4 Neither            20             2
##  7 Gryffindor Freshman   103      3        3 Vegetari…           6             2
##  8 Gryffindor Junior      17      3        1 Neither            12             3
##  9 Gryffindor Junior      65      2        5 Vegetari…           8             3
## 10 Gryffindor Junior      90      4        6 Vegetari…           5             4
## # … with 93 more rows, and 26 more variables: reedie_economic <dbl>,
## #   study <chr>, commons <chr>, transportation <chr>, division <chr>,
## #   tradition <chr>, awkward <chr>, technology <chr>, historian <chr>,
## #   alcohol <dbl>, reedie_alcohol <dbl>, marijuana <dbl>,
## #   reedie_marijuana <dbl>, social_media <chr>, coffee_tea <chr>,
## #   computer <chr>, season <chr>, thai <chr>, ac <chr>, beach_mountain <chr>,
## #   donut <chr>, first_kiss <dbl>, meme <chr>, dog_pants <chr>, …
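Because `year` is stored as text, `arrange()` sorts it alphabetically. That is why the Gryffindor juniors in the output above come right after the freshmen, and why seniors would come before sophomores. The sketch below sorts by class standing instead. It converts `year` to a factor with an explicit level order. The toy tibble and the `year_levels` vector are hypothetical stand-ins, and the real survey may use different labels.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey`
toy <- tibble(
  hogwarts = c("Ravenclaw", "Gryffindor", "Gryffindor", "Gryffindor"),
  year     = c("Junior", "Sophomore", "Senior", "Freshman")
)

# Assumed class-standing order; any value not listed here becomes NA
year_levels <- c("Freshman", "Sophomore", "Junior", "Senior")

sorted <- toy %>%
  mutate(year = factor(year, levels = year_levels)) %>%  # factors sort by level order
  arrange(hogwarts, year)                                 # house first, then standing

print(sorted)
```

Here the Gryffindor rows come out as Freshman, Sophomore, Senior. Any value missing from `year_levels` would become `NA`, which `arrange()` places last.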

Find the number of students in each year who think hot dogs are sandwiches.

survey %>% 
  group_by(year, hot_dog) %>% 
  summarize(n = n()) %>% 
  filter(hot_dog == "Yes")

## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

## # A tibble: 4 × 3
## # Groups:   year [4]
##   year      hot_dog     n
##   <chr>     <chr>   <int>
## 1 Freshman  Yes         7
## 2 Junior    Yes         6
## 3 Senior    Yes         3
## 4 Sophomore Yes        13
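Filtering before counting gives the same tallies with less machinery. The sketch below uses `count()` on a hypothetical toy tibble in place of `survey`. Because `count()` on ungrouped data returns an ungrouped tibble, the `.groups` message shown above does not appear.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey`
toy <- tibble(
  year    = c("Freshman", "Freshman", "Junior", "Senior", "Senior"),
  hot_dog = c("Yes", "No", "Yes", "Yes", "Yes")
)

yes_by_year <- toy %>%
  filter(hot_dog == "Yes") %>%  # keep only students who said "Yes"
  count(year)                   # one row per year, tally in column n

print(yes_by_year)
```

The original pipeline behaves the same way here: a year in which nobody answered "Yes" does not appear in the result at all.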

Calculate the median number of college applications submitted by students who answered "Herodotus" to the historian question.

survey %>% 
  filter(historian == "Herodotus") %>% 
  summarize(median_apps = median(college_app))

## # A tibble: 1 × 1
##   median_apps
##         <dbl>
## 1           7
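`median()` returns `NA` if any value in the column is missing. The survey result above is a number, so no Herodotus respondent appears to have left `college_app` blank. The sketch below shows the difference `na.rm = TRUE` makes, using a hypothetical toy tibble with one missing value.

```r
library(dplyr)

# Hypothetical toy data: one Herodotus respondent skipped the question
toy <- tibble(
  historian   = c("Herodotus", "Herodotus", "Herodotus", "don't know"),
  college_app = c(4, 12, NA, 2)
)

# Without na.rm, a single NA makes the median NA
with_na <- toy %>%
  filter(historian == "Herodotus") %>%
  summarize(median_apps = median(college_app))

# With na.rm = TRUE, the median is taken over the non-missing values (4 and 12)
without_na <- toy %>%
  filter(historian == "Herodotus") %>%
  summarize(median_apps = median(college_app, na.rm = TRUE))
```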

Create a data set consisting only of categorical variables, with the columns in alphabetical order and the rows (student responses) sorted alphabetically by the first variable, breaking ties with each subsequent variable in turn.

survey %>% 
  select(dog_pants, historian, hogwarts, hot_dog, year) %>% 
  arrange(dog_pants, historian, hogwarts, hot_dog, year)

## # A tibble: 103 × 5
##    dog_pants historian  hogwarts   hot_dog year     
##    <chr>     <chr>      <chr>      <chr>   <chr>    
##  1 All legs  don't know don't know Maybe   Senior   
##  2 All legs  Herodotus  Gryffindor Maybe   Junior   
##  3 All legs  Herodotus  Gryffindor Maybe   Sophomore
##  4 All legs  Herodotus  Gryffindor No      Freshman 
##  5 All legs  Herodotus  Gryffindor No      Sophomore
##  6 All legs  Herodotus  Gryffindor Yes     Sophomore
##  7 All legs  Herodotus  Hufflepuff Yes     Junior   
##  8 All legs  Herodotus  Ravenclaw  No      Freshman 
##  9 All legs  Herodotus  Ravenclaw  No      Junior   
## 10 All legs  Herodotus  Ravenclaw  No      Sophomore
## # … with 93 more rows
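The pipeline above names five variables by hand. The sketch below is more general. It assumes every categorical variable is stored as character, as the `<chr>` columns in the first output are. It selects those columns with `where(is.character)`, sorts their names, and then sorts the rows by every column in order. The toy tibble is hypothetical. On dplyr 1.1.0 or later, `arrange(pick(everything()))` should also work for the last step.

```r
library(dplyr)

# Hypothetical toy data: two character columns and one numeric column
toy <- tibble(
  year     = c("Junior", "Freshman", "Junior"),
  score    = c(3, 5, 1),
  hogwarts = c("Slytherin", "Gryffindor", "Gryffindor")
)

chars <- toy %>% select(where(is.character))  # drop non-character columns

categorical <- chars %>%
  select(all_of(sort(names(chars)))) %>%  # columns in alphabetical order
  arrange(across(everything()))           # sort by first column, then the next, ...

print(categorical)
```

Applied to the full survey, this approach would also keep other character columns such as `diet` and `study`. It would miss any categorical variable coded as numbers.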

Identify students whose social and economic views differ by more than 2 points (that is, by 3 or more points).

survey %>% 
  mutate(diff = abs(social - economic)) %>% 
  select(social, economic, diff) %>% 
  filter(diff > 2)

## # A tibble: 18 × 3
##    social economic  diff
##     <dbl>    <dbl> <dbl>
##  1      1        6     5
##  2      3        8     5
##  3      3        6     3
##  4      4        8     4
##  5      2        5     3
##  6      8        5     3
##  7      3        7     4
##  8      2        5     3
##  9      3        9     6
## 10      6        9     3
## 11      1        4     3
## 12      1        5     4
## 13      4        7     3
## 14      3        7     4
## 15      5        2     3
## 16      2        5     3
## 17      3        6     3
## 18      8        5     3
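`filter(diff > 2)` keeps only differences of 3 or more, so students whose views differ by exactly 2 are excluded. Using `>=` includes them. The sketch below uses a hypothetical toy tibble whose differences are 5, 2, 0, and 3.

```r
library(dplyr)

# Hypothetical toy data; abs(social - economic) is 5, 2, 0, 3
toy <- tibble(social = c(1, 3, 5, 4), economic = c(6, 5, 5, 7))

with_diff <- toy %>% mutate(diff = abs(social - economic))

strict    <- with_diff %>% filter(diff > 2)   # differences of 3 or more
inclusive <- with_diff %>% filter(diff >= 2)  # differences of 2 or more

print(nrow(strict))
print(nrow(inclusive))
```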

Create a data set consisting of two variables, Hogwarts House and Political Views, where the Political Views score is the average of a student’s Social and Economic view scores.

survey %>% 
  mutate(poli_view = (social + economic) / 2) %>% 
  select(hogwarts, poli_view)

## # A tibble: 103 × 2
##    hogwarts   poli_view
##    <chr>          <dbl>
##  1 Ravenclaw        4.5
##  2 Ravenclaw        3  
##  3 Gryffindor       5  
##  4 Hufflepuff       4  
##  5 Slytherin        3  
##  6 Slytherin        4  
##  7 Ravenclaw        3.5
##  8 Slytherin        1  
##  9 Hufflepuff       2  
## 10 Gryffindor       3  
## # … with 93 more rows
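Writing the average as `(social + economic) / 2` works row by row because `+` and `/` are vectorized. A tempting alternative, `mean(c(social, economic))`, does not. Inside `mutate()`, `mean()` sees entire columns and returns one pooled number, which is then repeated on every row. The sketch below uses a hypothetical toy tibble.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey`
toy <- tibble(
  hogwarts = c("Ravenclaw", "Slytherin"),
  social   = c(4, 1),
  economic = c(5, 1)
)

# Per-student average: vectorized arithmetic, one value per row
per_student <- toy %>%
  mutate(poli_view = (social + economic) / 2) %>%
  select(hogwarts, poli_view)

# Pitfall: mean() pools both whole columns, so every row gets the same value
pooled <- toy %>%
  mutate(poli_view = mean(c(social, economic))) %>%
  select(hogwarts, poli_view)
```

For per-row means over many columns, `rowwise()` or base R's `rowMeans()` are alternatives.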

Count how many students think both that dogs should wear pants on their back legs and that hot dogs are sandwiches.

survey %>% 
  filter(hot_dog == "Yes") %>% 
  filter(dog_pants == "Back legs") %>% 
  summarize(how_many = n())

## # A tibble: 1 × 1
##   how_many
##      <int>
## 1       22
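The two `filter()` calls can be combined into one, because `filter()` keeps rows where all of its comma-separated conditions are true. Alternatively, summing a logical vector counts its `TRUE` values. The sketch below uses a hypothetical toy tibble.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey`
toy <- tibble(
  hot_dog   = c("Yes", "Yes", "No", "Yes"),
  dog_pants = c("Back legs", "All legs", "Back legs", "Back legs")
)

# One filter() with two conditions (both must hold)
both <- toy %>%
  filter(hot_dog == "Yes", dog_pants == "Back legs") %>%
  summarize(how_many = n())

# Same count without filtering: TRUE counts as 1, FALSE as 0
both_sum <- toy %>%
  summarize(how_many = sum(hot_dog == "Yes" & dog_pants == "Back legs"))
```

The two approaches can differ when answers are missing. `filter()` drops rows where a condition is `NA`. `sum()` returns `NA` if any combined condition is `NA`, unless `na.rm = TRUE` is given.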

Among non-freshman students who drink, create a data set that could be used to make a scatterplot of alcohol use vs. social views.

survey %>% 
  filter(alcohol > 0) %>% 
  filter(year != "Freshman") %>% 
  select(alcohol, social) %>% 
  ggplot(aes(x = alcohol, y = social)) +
  geom_jitter()
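Since the task asks for a data set, it may be clearer to save the filtered data as its own object before plotting. The toy tibble and the name `drinkers` are hypothetical. With `survey`, the same `filter()` and `select()` steps apply.

```r
library(dplyr)

# Hypothetical toy data standing in for `survey`
toy <- tibble(
  year    = c("Freshman", "Junior", "Senior", "Sophomore"),
  alcohol = c(2, 0, 3, 1),
  social  = c(4, 6, 2, 5)
)

drinkers <- toy %>%
  filter(alcohol > 0, year != "Freshman") %>%  # non-freshmen who report drinking
  select(alcohol, social)                      # the two plotting variables

print(drinkers)
```

The scatterplot can then be drawn from the saved data set with `ggplot(drinkers, aes(x = alcohol, y = social)) + geom_jitter()`. `geom_jitter()` adds a small amount of random noise to each point, so students with identical answers do not plot on top of one another.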

Nate Wells

2/8/2022