class: center, middle

## Hypothesis Testing with `infer`

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;">
.large[Kelly McConville | Math 141 | Week 9 | Fall 2020]
</span>

---

## Announcements/Reminders

* Lab 7 due this week before your lab meeting.

---

## Week 9 Topics

* Testing Conjectures

*********************************************

### Goals for Today

* Hypothesis testing with `infer`
* Generating null distributions
* Decisions in a hypothesis test
    + Types of errors
* Practical significance and effect sizes

---

class: center, middle, inverse

### Let's see the ESP example (one more time) but now using `infer`.

The "hypothesisTestingFramework.Rmd" file can be found in the Handouts folder.

---

## Generating Null Distributions

**For a sample proportion:**

Steps:

1. Flip unfair coin (prop heads = 0.25) 329 times.
2. Compute proportion of heads.
3. Repeat 1 and 2 many times.

R code:

```r
null_dist <- esp %>%
  specify(response = guess, success = "correct") %>%
  hypothesize(null = "point", p = 0.25) %>%
  generate(reps = 1000, type = "simulate") %>%
  calculate(stat = "prop")
```

--

For different variable types, we need to move beyond using a coin to conceptualize the null distribution.

---

## Generating a Null Distribution

Let's return to the penguins and ask if flipper length varies, on average, by the sex of the penguin.

--

`\(H_o: \mu_F - \mu_M = 0\)`

`\(H_a: \mu_F - \mu_M \neq 0\)`

--

Need a null distribution for the difference in sample means.

--

**Question**: If I shuffle (permute) the `sex` column and then compute the difference in sample means, what do you expect the difference in sample means to equal?
```
## # A tibble: 333 x 2
##    flipper_length_mm sex
##                <int> <fct>
##  1               181 male
##  2               186 female
##  3               195 female
##  4               193 female
##  5               190 male
##  6               181 female
##  7               195 male
##  8               182 female
##  9               191 male
## 10               198 male
## # … with 323 more rows
```

---

## Generating a Null Distribution

Let's return to the penguins and ask if flipper length varies, on average, by the sex of the penguin.

--

`\(H_o: \mu_F - \mu_M = 0\)`

`\(H_a: \mu_F - \mu_M \neq 0\)`

--

Need a null distribution for the difference in sample means.

Steps:

1. Permute/shuffle the `sex` column.
2. Compute the difference in sample means.
3. Repeat 1 and 2 many times.

Let's go back to the "hypothesisTestingFramework.Rmd" document and see how to implement this process with `infer`.

---

### Hypothesis Testing: Decisions, Decisions

Once you get to the end of a hypothesis test you make one of two decisions:

--

(1) P-value is small. → I have evidence for `\(H_a\)`. Reject `\(H_o\)`.

--

(2) P-value is not small. → I don't have evidence for `\(H_a\)`. Fail to reject `\(H_o\)`.

--

Sometimes we make the correct decision. Sometimes we make a mistake.

---

### Hypothesis Testing: Decisions, Decisions

Let's create a table of potential outcomes.

<br> <br><br><br><br><br><br><br><br> <br> <br> <br>

`\(\alpha\)` = prob of Type I error **under repeated sampling** = prob reject `\(H_o\)` when it is true

--

`\(\beta\)` = prob of Type II error **under repeated sampling** = prob fail to reject `\(H_o\)` when `\(H_a\)` is true.

---

### Hypothesis Testing: Decisions, Decisions

Typically set `\(\alpha\)` level beforehand.

--

Use `\(\alpha\)` to determine "small" for a p-value.

--

(1) P-value ~~is~~ ~~small~~ `\(< \alpha\)`. → I have evidence for `\(H_a\)`. Reject `\(H_o\)`.

(2) P-value ~~is~~ ~~not~~ ~~small~~ `\(\geq \alpha\)`. → I don't have evidence for `\(H_a\)`. Fail to reject `\(H_o\)`.

---

### Hypothesis Testing: Decisions, Decisions

**Question**: How do I select `\(\alpha\)`?
--

* Will depend on the convention in your field.

--

* Want a small `\(\alpha\)` and a small `\(\beta\)`. But they are related.
    + How?

--

**The smaller `\(\alpha\)` is, the larger `\(\beta\)` will be.**

--

→ Choose a lower `\(\alpha\)` (e.g., 0.01, 0.001) when the Type I error is worse and a higher `\(\alpha\)` (e.g., 0.1) when the Type II error is worse.

--

* Note: Can't easily compute `\(\beta\)`. Why?
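
---

### Hypothesis Testing: Decisions, Decisions

To make the trade-off between `\(\alpha\)` and `\(\beta\)` concrete, here is a rough sketch that reuses the ESP setup (329 guesses, null proportion 0.25) and computes `\(\beta\)` at several `\(\alpha\)` levels with base R's `pbinom()`. The one-sided alternative, the true proportion `p_alt = 0.30`, and the helper names `critical_count()` and `beta_for()` are assumptions made up for this illustration; they are not from the handout.

```r
# Setting borrowed from the ESP example: n = 329 guesses, null proportion 0.25.
n  <- 329
p0 <- 0.25

# Hypothetical true success probability under H_a (an assumption for illustration).
p_alt <- 0.30

# Assuming a one-sided test (H_a: p > 0.25), find the smallest count of
# correct guesses whose p-value, P(X >= k | p0), falls below alpha.
critical_count <- function(alpha) {
  k <- 0:n
  p_values <- pbinom(k - 1, n, p0, lower.tail = FALSE)  # P(X >= k) under H_o
  min(k[p_values < alpha])
}

# beta = P(fail to reject H_o | true proportion is p_true)
#      = P(X < critical count | p_true)
beta_for <- function(alpha, p_true = p_alt) {
  k_crit <- critical_count(alpha)
  pbinom(k_crit - 1, n, p_true)
}

# Smaller alpha pushes the critical count up, which raises beta.
alphas <- c(0.10, 0.05, 0.01, 0.001)
betas  <- sapply(alphas, beta_for)
data.frame(alpha = alphas, beta = round(betas, 3))
```

Try changing `p_alt`: every `\(\beta\)` in the table changes with it. That hints at why `\(\beta\)` is hard to compute in practice: it depends on the true parameter value, which we don't know.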