class: center, middle

### Inference Methods: Which is Better?

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;">
.large[Kelly McConville | Math 141 | Week 12 | Fall 2020]
</span>

---

## Announcements/Reminders

* Extra Credit Assignment: Write a stats poem.
    + Due December 2nd
* Lab This Week:
    + If you have a Friday afternoon lab, you can attend a TH or Friday morning session.
    + You can see the times at https://solar.reed.edu/class_schedule/
    + You MUST inform both lab instructors of which lab you are in and which one you will be attending.
* Final Exam
    + Take-home and Orals: Dec 10th/11th

---

### Project Assignments

* Project Assignment 3 due on Friday!
    + Feel free to include code used at the end of the document.
* Final Project Assignment (pafinal.Rmd) is in the shared folder.
    + Creating a video presentation where you answer one of your research questions.
    + Due Wednesday, December 9th
    + Likely won't need to do any additional analyses.

---

## Week 12 Topics

* ANOVA Test
* Simulation Methods versus Probability Model Methods for Inference
* Inference for Linear Regression

*********************************************

### Goals for Today

* Finish up the ANOVA test discussion
* A Comparison of Inference Methods

---

### The ANalysis Of VAriance Test

Consider the situation where:

* Response variable: quantitative
* Explanatory variable: categorical

`\(H_o\)`: `\(\mu_1 = \mu_2 = \cdots = \mu_K\)` (Variables are independent/not related.)

`\(H_a\)`: At least one mean is not equal to the rest. (Variables are dependent/related.)

---

### Example

Do Audience Ratings vary by movie genre?
<img src="wk12_wed_files/figure-html/unnamed-chunk-2-1.png" width="360" style="display: block; margin: auto;" />

---

### Test Statistic

Needs to measure the discrepancy between the observed sample and the sample we'd expect to see if `\(H_o\)` were true

$$
F = \frac{\mbox{MSG}}{\mbox{MSE}} = \frac{\mbox{variance between groups}}{\mbox{variance within groups}}
$$

If

* There are at least 30 observations in each group or the response variable is normal
* The variability is similar in all groups

then

$$
\mbox{test statistic} \sim F(df1 = K - 1, df2 = n - K)
$$

when `\(H_o\)` is true.

---

### The ANOVA Test

Check assumptions!

```r
movies %>%
  group_by(Genre) %>%
  summarize(n(), sd(AudienceScore))
```

```
## # A tibble: 7 x 3
##   Genre     `n()` `sd(AudienceScore)`
##   <fct>     <int>               <dbl>
## 1 Action       32                18.4
## 2 Animation    12                13.9
## 3 Comedy       27                15.7
## 4 Drama        21                14.5
## 5 Horror       17                15.9
## 6 Romance      10                12.9
## 7 Thriller     13                14.9
```

---

### The ANOVA Test

Check assumptions!

```r
ggplot(data = movies, mapping = aes(x = AudienceScore)) +
  geom_histogram(bins = 15) +
  facet_wrap(~Genre)
```

<img src="wk12_wed_files/figure-html/unnamed-chunk-4-1.png" width="576" style="display: block; margin: auto;" />

---

### The ANOVA Test

```r
library(broom)
mod <- aov(AudienceScore ~ Genre, data = movies)
tidy(mod)
```

```
## # A tibble: 2 x 6
##   term         df  sumsq meansq statistic  p.value
##   <chr>     <dbl>  <dbl>  <dbl>     <dbl>    <dbl>
## 1 Genre         6  5855.   976.      3.88  0.00137
## 2 Residuals   125 31413.   251.     NA    NA
```

---

### Comparing Simulation-Based Methods and Probability Model-Based Methods

<img src="img/hyp_testing_diagram.png" width="800px"/>

**Question**: Which method is better?

---

### Comparing Simulation-Based Methods and Probability Model-Based Methods

**Question**: Which method is better?

---

#### Question: What is an exploratory graph? What is the point of an exploratory graph?

<img src="img/DAW.png" width="80%" style="display: block; margin: auto;" />

---

#### Question: What is an exploratory graph?
What is the point of an exploratory graph?

<img src="wk12_wed_files/figure-html/unnamed-chunk-7-1.png" width="360" style="display: block; margin: auto;" />

<img src="wk12_wed_files/figure-html/unnamed-chunk-8-1.png" width="360" style="display: block; margin: auto;" />

---

#### Question: Why isn't a graph of the null distribution an exploratory graph?

<img src="wk12_wed_files/figure-html/unnamed-chunk-9-1.png" width="360" style="display: block; margin: auto;" />

---

#### Question: When do I use `type = permute` versus `simulate` versus `bootstrap`?

**Note**: All of these fall under the umbrella of simulation-based methods!

+ And they dictate *how* you generate the samples.

```r
null_dist <- eye_data %>%
  specify(Eye ~ Lighting) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "Chisq")
```

--

**Estimation**: When generating a bootstrap distribution, use `type = bootstrap`.

--

**Hypothesis Testing**: It depends on how you want to generate the null samples.

+ Let's go to the [Summary Tables](https://reed-statistics.github.io/math141f20/inference_procedures.html).

---

#### Question: When do I use `type = permute` versus `simulate` versus `bootstrap`?

Generating a null distribution for a single proportion:

```r
null_dist <- esp %>%
  specify(response = guess, success = "correct") %>%
  hypothesize(null = "point", p = 0.25) %>%
* generate(reps = 1000, type = "simulate") %>%
  calculate(stat = "prop")
```

Generating a null distribution for a single mean:

```r
null_dist <- FloridaLakes %>%
  specify(response = pH) %>%
  hypothesize(null = "point", mu = 7) %>%
* generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
```
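---

#### Sketch: What are `simulate` and `bootstrap` doing for these two null distributions?

A rough base-R sketch of the idea behind the two `generate()` calls above. It is not the `infer` implementation. The sample size `n`, the stand-in data `x`, and the seed are made-up values for illustration; they are not the `esp` or `FloridaLakes` data. Only the null values `p = 0.25` and `mu = 7` come from the slides.

```r
set.seed(141)  # arbitrary seed so the sketch is reproducible
reps <- 1000   # number of null samples, matching reps = 1000 above

# Single proportion, in the spirit of type = "simulate":
# draw new samples directly from the null model with p = 0.25.
n <- 50           # hypothetical sample size (assumption)
p_null <- 0.25    # null value from the slide
null_props <- rbinom(reps, size = n, prob = p_null) / n

# Single mean, in the spirit of type = "bootstrap" with a point null:
# shift the sample so its mean equals mu = 7, then resample with replacement.
x <- rnorm(40, mean = 6.6, sd = 1.3)  # hypothetical stand-in for pH values
mu_null <- 7                          # null value from the slide
x_shifted <- x - mean(x) + mu_null
null_means <- replicate(reps, mean(sample(x_shifted, replace = TRUE)))
```

Both `null_props` and `null_means` should be centered near their null values. The difference is *how* the null samples are produced: from the null model itself versus from the recentered data.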