class: center, middle

## Hypothesis Testing with `infer`

.large[Kelly McConville | Math 141 | Week 9 | Fall 2020]

---

## Announcements/Reminders

* Lab 7 due this week before your lab meeting.

---

## Week 9 Topics

* Testing Conjectures

*********************************************

### Goals for Today

* Hypothesis testing with `infer`

* Generating null distributions

* Decisions in a hypothesis test
    + Types of errors

* Practical significance and effect sizes

---

class: center, middle, inverse

### Let's see the ESP example (one more time) but now using `infer`.  The "hypothesisTestingFramework.Rmd" file can be found in the Handouts folder.

---

## Generating Null Distributions

**For a sample proportion:**

Steps:

1. Flip unfair coin (prop heads = 0.25) 329 times.
2. Compute proportion of heads.
3. Repeat 1 and 2 many times.

R code:

```r
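# Assumes library(infer) is loaded and `esp` has a `guess` column
null_dist <- esp %>%
  specify(response = guess, success = "correct") %>%  # variable of interest
  hypothesize(null = "point", p = 0.25) %>%           # H_o: p = 0.25
  generate(reps = 1000, type = "simulate") %>%        # steps 1 and 3 (newer infer: type = "draw")
  calculate(stat = "prop")                            # step 2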
```
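
As a rough cross-check, the three coin-flipping steps can also be sketched in base R without `infer`. This is only an illustration: the seed and the names `n_flips`, `p_null`, `reps`, and `null_props` are made up here, and `rbinom()` stands in for physically flipping the coin.

```r
set.seed(141)    # arbitrary seed, only for reproducibility

n_flips <- 329   # number of flips per repetition (from step 1)
p_null  <- 0.25  # probability of heads under the null (from step 1)
reps    <- 1000  # "many times" in step 3

# Steps 1-3: each draw is the number of heads in 329 flips;
# dividing by 329 turns it into a proportion of heads.
null_props <- rbinom(reps, size = n_flips, prob = p_null) / n_flips

# The simulated proportions should be centered near 0.25.
mean(null_props)
hist(null_props, main = "Simulated null distribution", xlab = "Proportion of heads")
```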

For different variable types, we need to move beyond using a coin to conceptualize the null distribution.

---

## Generating a Null Distribution

Let's return to the penguins and ask if flipper length varies, on average, by the sex of the penguin.

`\(H_o: \mu_F - \mu_M = 0\)`

`\(H_a: \mu_F - \mu_M \neq 0\)`

Need a null distribution for the difference in sample means.

**Question**: If I shuffle (permute) the `sex` column and then compute the difference in sample means, what do you expect the difference in sample means to equal?

```
## # A tibble: 333 x 2
##    flipper_length_mm sex   
##                <int> <fct> 
##  1               181 male  
##  2               186 female
##  3               195 female
##  4               193 female
##  5               190 male  
##  6               181 female
##  7               195 male  
##  8               182 female
##  9               191 male  
## 10               198 male  
## # … with 323 more rows
```
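
One way to explore the question is to try the shuffle on a tiny example. The sketch below uses only the ten rows printed above (not the full 333), and the names `peng10` and `diff_in_means` are hypothetical helpers introduced for illustration.

```r
set.seed(141)  # arbitrary seed, only for reproducibility

# The ten rows shown in the tibble above.
peng10 <- data.frame(
  flipper_length_mm = c(181, 186, 195, 193, 190, 181, 195, 182, 191, 198),
  sex = c("male", "female", "female", "female", "male",
          "female", "male", "female", "male", "male")
)

# Difference in sample means, female minus male.
diff_in_means <- function(df) {
  mean(df$flipper_length_mm[df$sex == "female"]) -
    mean(df$flipper_length_mm[df$sex == "male"])
}

obs_diff <- diff_in_means(peng10)  # difference for these ten rows as observed

# Shuffle the sex labels many times, recomputing the difference each time.
perm_diffs <- replicate(5000, {
  shuffled <- peng10
  shuffled$sex <- sample(shuffled$sex)  # permute the sex column
  diff_in_means(shuffled)
})

mean(perm_diffs)  # individual shuffles vary, but they average out near 0
```

Shuffling breaks any link between `sex` and flipper length, so the shuffled differences scatter around zero.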

---

## Generating a Null Distribution

Let's return to the penguins and ask if flipper length varies, on average, by the sex of the penguin.

`\(H_o: \mu_F - \mu_M = 0\)`

`\(H_a: \mu_F - \mu_M \neq 0\)`

Need a null distribution for the difference in sample means.

Steps:

1. Permute/shuffle the `sex` column.
2. Compute the difference in sample means.
3. Repeat 1 and 2 many times.

Let's go back to the "hypothesisTestingFramework.Rmd" document and see how to implement this process with `infer`.
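
For reference, here is a minimal sketch of how these steps might look with `infer`. It assumes the data come from the `palmerpenguins` package with missing values removed (which yields 333 rows); the handout may set things up differently, and `penguins_complete` and `obs_diff` are names chosen here.

```r
library(dplyr)
library(infer)
library(palmerpenguins)  # assumption: source of the penguins data

# Keep penguins with a recorded sex and flipper length.
penguins_complete <- penguins %>%
  filter(!is.na(sex), !is.na(flipper_length_mm))

# Observed difference in sample means (female minus male).
obs_diff <- penguins_complete %>%
  specify(flipper_length_mm ~ sex) %>%
  calculate(stat = "diff in means", order = c("female", "male"))

# Null distribution: shuffle sex, recompute the difference, repeat.
null_dist <- penguins_complete %>%
  specify(flipper_length_mm ~ sex) %>%
  hypothesize(null = "independence") %>%           # no relationship under H_o
  generate(reps = 1000, type = "permute") %>%      # steps 1 and 3
  calculate(stat = "diff in means", order = c("female", "male"))  # step 2

# Two-sided p-value, matching the "not equal" alternative.
get_p_value(null_dist, obs_stat = obs_diff, direction = "two-sided")
```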

---

### Hypothesis Testing: Decisions, Decisions

Once you get to the end of a hypothesis test, you make one of two decisions:

(1) P-value is small.

&rarr; I have evidence for `\(H_a\)`. Reject `\(H_o\)`.

(2) P-value is not small.

&rarr; I don't have evidence for `\(H_a\)`. Fail to reject `\(H_o\)`.

Sometimes we make the correct decision.  Sometimes we make a mistake.

---

### Hypothesis Testing: Decisions, Decisions

Let's create a table of potential outcomes.

`\(\alpha\)` = prob of Type I error **under repeated sampling** = prob of rejecting `\(H_o\)` when `\(H_o\)` is true.

`\(\beta\)` = prob of Type II error **under repeated sampling** = prob of failing to reject `\(H_o\)` when `\(H_a\)` is true.
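
To make "under repeated sampling" concrete, here is a rough simulation sketch. It borrows n = 329 and p = 0.25 from the ESP example and assumes `\(\alpha = 0.05\)` and a one-sided alternative. Note that it swaps in R's exact binomial test (`binom.test()`) for the simulation-based p-value, purely to keep the example short.

```r
set.seed(141)    # arbitrary seed, only for reproducibility

n      <- 329    # sample size (ESP example)
p_null <- 0.25   # H_o is TRUE in this simulation
alpha  <- 0.05   # assumed cutoff
reps   <- 2000   # number of repeated samples

# Draw many samples from a world where H_o holds.
counts <- rbinom(reps, size = n, prob = p_null)

# Run the test on each sample and record its p-value.
p_values <- sapply(counts, function(x) {
  binom.test(x, n, p = p_null, alternative = "greater")$p.value
})

# Share of samples where we (wrongly) reject H_o: the Type I error rate.
type1_rate <- mean(p_values < alpha)
type1_rate  # should land at or a bit below alpha
```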

---

### Hypothesis Testing: Decisions, Decisions

Typically set `\(\alpha\)` level beforehand.

Use `\(\alpha\)` to determine "small" for a p-value.

(1) P-value ~~is~~ ~~small~~ `\(< \alpha\)`.

&rarr; I have evidence for `\(H_a\)`. Reject `\(H_o\)`.

(2) P-value ~~is~~ ~~not~~ ~~small~~  `\(\geq \alpha\)`.

&rarr; I don't have evidence for `\(H_a\)`. Fail to reject `\(H_o\)`.
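
The rule can be written as a tiny helper. The function name `decide` and the default `alpha = 0.05` are assumptions for illustration, not part of `infer`.

```r
# Compare a p-value to alpha and return the decision.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "Reject H_o"
  } else {
    "Fail to reject H_o"
  }
}

decide(0.03)                # "Reject H_o"
decide(0.05)                # "Fail to reject H_o" (p-value equal to alpha)
decide(0.03, alpha = 0.01)  # "Fail to reject H_o"
```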

---

### Hypothesis Testing: Decisions, Decisions

**Question**: How do I select `\(\alpha\)`?

* Will depend on the convention in your field.

* Want a small `\(\alpha\)` and a small `\(\beta\)`. But they are related.  
    + How?

**All else equal, the smaller `\(\alpha\)` is, the larger `\(\beta\)` will be.**

&rarr; Choose a lower `\(\alpha\)` (e.g., 0.01, 0.001) when the Type I error is worse and a higher `\(\alpha\)` (e.g., 0.1) when the Type II error is worse.

* Note: Can't easily compute `\(\beta\)`.  Why?
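
A small numerical sketch of the trade-off, under stated assumptions: a one-sided exact binomial test with n = 329 and null p = 0.25 (from the ESP example), and a made-up true proportion `p_alt = 0.30`. Having to pick `p_alt` at all hints at one reason `\(\beta\)` is hard to compute: it depends on the true parameter value, which we don't know.

```r
n      <- 329    # sample size (ESP example)
p_null <- 0.25   # value under H_o
p_alt  <- 0.30   # ASSUMED true value under H_a (hypothetical)

alphas <- c(0.10, 0.05, 0.01, 0.001)

beta_for_alpha <- function(alpha) {
  # Smallest count `crit` with P(X >= crit | H_o) <= alpha.
  crit <- qbinom(1 - alpha, size = n, prob = p_null) + 1
  # beta = P(fail to reject | p_alt) = P(X < crit | p_alt).
  pbinom(crit - 1, size = n, prob = p_alt)
}

betas <- sapply(alphas, beta_for_alpha)
data.frame(alpha = alphas, beta = betas)  # beta grows as alpha shrinks
```

Changing `p_alt` changes every value in the `beta` column, while the `alpha` column stays fixed by design.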