class: center, middle

### Inference Methods: Which is Better?

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 12 | Fall 2020] </span>

---

## Announcements/Reminders

* Extra Credit Assignment: Write a stats poem.
    + Due December 2nd
    
* Lab This Week:
    + If you have a Friday afternoon lab, you can attend a Thursday or Friday morning session.
    + See the times at https://solar.reed.edu/class_schedule/
    + You MUST tell both lab instructors which lab you are in and which one you will attend.

* Final Exam
    + Take-home and Orals: Dec 10th/11th

---

### Project Assignments

* Project Assignment 3 due on Friday!
    + Feel free to include code used at the end of the document.
    
* Final Project Assignment (pafinal.Rmd) is in the shared folder.
    + Create a video presentation in which you answer one of your research questions.
    + Due Wednesday, December 9th
    + You likely won't need to do any additional analyses.

---

## Week 12 Topics

* ANOVA Test
* Simulation Methods versus Probability Model Methods for Inference
* Inference for Linear Regression

*********************************************

### Goals for Today

* Finish up the ANOVA test discussion

* A Comparison of Inference Methods

---

### The ANalysis Of VAriance Test

Consider the situation where:

* Response variable: quantitative

* Explanatory variable: categorical

`$H_o$`: `$\mu_1 = \mu_2 = \cdots = \mu_K$`  (Variables are independent/not related.)

`$H_a$`: At least one mean is not equal to the rest. (Variables are dependent/related.)

---

### Example

Do Audience Ratings vary by movie genre?

---

### Test Statistic

The test statistic needs to measure the discrepancy between the observed sample and the sample we'd expect to see if `$H_o$` were true:

$$
F = \frac{\mbox{MSG}}{\mbox{MSE}} = \frac{\mbox{variance between groups}}{\mbox{variance within groups}}
$$

If

* There are at least 30 observations in each group or the response variable is approximately normally distributed within each group
* The variability is similar in all groups

then

$$
\mbox{test statistic} \sim F(\mbox{df}_1 = K - 1, \mbox{df}_2 = n - K)
$$

when `$H_o$` is true, where `$K$` is the number of groups and `$n$` is the total sample size.
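
---

### Test Statistic

To make the ratio concrete, here is a minimal sketch that computes MSG, MSE, and F by hand and compares them with R's built-in ANOVA table. The `toy` data frame and its column names are made up for illustration; they are not the course data.

```r
# Hypothetical data: three groups with a quantitative response (illustration only)
set.seed(141)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(10, 12, 8))),
  score = c(rnorm(10, mean = 60, sd = 15),
            rnorm(12, mean = 65, sd = 15),
            rnorm(8,  mean = 75, sd = 15))
)

n <- nrow(toy)            # total sample size
K <- nlevels(toy$group)   # number of groups

grand_mean  <- mean(toy$score)
group_means <- tapply(toy$score, toy$group, mean)
group_sizes <- tapply(toy$score, toy$group, length)

# Variance between groups: group means around the grand mean
MSG <- sum(group_sizes * (group_means - grand_mean)^2) / (K - 1)

# Variance within groups: each observation around its own group mean
own_group_mean <- ave(toy$score, toy$group)   # group mean repeated per row
MSE <- sum((toy$score - own_group_mean)^2) / (n - K)

F_stat  <- MSG / MSE
p_value <- pf(F_stat, df1 = K - 1, df2 = n - K, lower.tail = FALSE)

# The built-in table should report the same F statistic and p-value
anova(lm(score ~ group, data = toy))
```

A large F means the group means differ by more than the within-group noise would suggest.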

---

### The ANOVA Test

Check assumptions!

```r
movies %>%
  group_by(Genre) %>%
  summarize(n(), sd(AudienceScore))
```

```
## # A tibble: 7 x 3
##   Genre     `n()` `sd(AudienceScore)`
##   <fct>     <int>               <dbl>
## 1 Action       32                18.4
## 2 Animation    12                13.9
## 3 Comedy       27                15.7
## 4 Drama        21                14.5
## 5 Horror       17                15.9
## 6 Romance      10                12.9
## 7 Thriller     13                14.9
```
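
---

### The ANOVA Test

One way to read the summary table against the two assumptions is sketched below. The counts and standard deviations are copied from the output above. The "largest SD less than twice the smallest SD" cutoff is a common rule of thumb and an assumption here, not something stated in these slides.

```r
# Values copied from the group summary output (illustration only)
genre_summary <- data.frame(
  Genre = c("Action", "Animation", "Comedy", "Drama",
            "Horror", "Romance", "Thriller"),
  n     = c(32, 12, 27, 21, 17, 10, 13),
  sd    = c(18.4, 13.9, 15.7, 14.5, 15.9, 12.9, 14.9)
)

# Groups with fewer than 30 observations need the normality check instead
small_groups <- genre_summary$Genre[genre_summary$n < 30]
small_groups

# Similar variability: compare the largest and smallest group SDs
sd_ratio <- max(genre_summary$sd) / min(genre_summary$sd)
sd_ratio < 2   # assumed rule-of-thumb cutoff
```

Because most genres have fewer than 30 movies, the shape of the response within each genre still needs to be checked.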

---

### The ANOVA Test

Check assumptions!

```r
ggplot(data = movies, mapping = aes(x = AudienceScore)) + 
  geom_histogram(bins = 15) + 
  facet_wrap(~Genre)
```

---

### The ANOVA Test

```r
library(broom)
mod <- aov(AudienceScore ~ Genre, data = movies)
tidy(mod)
```

```
## # A tibble: 2 x 6
##   term         df  sumsq meansq statistic  p.value
##   <chr>     <dbl>  <dbl>  <dbl>     <dbl>    <dbl>
## 1 Genre         6  5855.   976.      3.88  0.00137
## 2 Residuals   125 31413.   251.     NA    NA
```
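
---

### The ANOVA Test

As a sanity check, the statistic and p-value in the table can be approximately reproduced from the `df` and `sumsq` columns alone. This is a sketch using the rounded values as printed, so small rounding differences are expected.

```r
# Values copied from the tidy(mod) output (rounded as printed)
sumsq <- c(Genre = 5855, Residuals = 31413)
df    <- c(Genre = 6,    Residuals = 125)

meansq <- sumsq / df   # MSG (Genre) and MSE (Residuals)
F_stat <- unname(meansq["Genre"] / meansq["Residuals"])

# p-value: upper-tail area under the F(6, 125) curve
p_value <- pf(F_stat, df1 = df[["Genre"]], df2 = df[["Residuals"]],
              lower.tail = FALSE)

F_stat
p_value
```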

---

### Comparing Simulation-Based Methods and Probability Model-Based Methods

**Question**: Which method is better?

---

#### Question: What is an exploratory graph?  What is the point of an exploratory graph?

---

#### Question: Why isn't a graph of the null distribution an exploratory graph?

---

#### Question: When do I use `type = "permute"` versus `"simulate"` versus `"bootstrap"`?

**Note**: All of these fall under the umbrella of simulation-based methods!

+ And they dictate *how* you generate the samples.

```r
null_dist <- eye_data %>%
  specify(Eye ~ Lighting) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "Chisq")
```

**Estimation**: When generating a bootstrap distribution, use `type = "bootstrap"`.

**Hypothesis Testing**:  It depends on how you want to generate the null samples.

+ Let's go to the [Summary Tables](https://reed-statistics.github.io/math141f20/inference_procedures.html).
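
---

#### What does permuting do?

A rough base R sketch of the idea behind `type = "permute"`: shuffle one variable so that any link between the two is broken, then recompute the statistic. The `toy` data and the helper `chisq_stat()` are hypothetical and only meant to mimic the steps, not to reproduce `infer`'s internals.

```r
set.seed(141)

# Hypothetical data: two categorical variables (illustration only)
toy <- data.frame(
  lighting = sample(c("dark", "nightlight", "lamp"), size = 100, replace = TRUE),
  eye      = sample(c("myopia", "normal"), size = 100, replace = TRUE)
)

# Hypothetical helper: chi-squared statistic for a two-way table
chisq_stat <- function(x, y) {
  unname(suppressWarnings(chisq.test(table(x, y), correct = FALSE))$statistic)
}

observed <- chisq_stat(toy$lighting, toy$eye)

# Each rep shuffles the response while keeping the explanatory variable fixed
null_stats <- replicate(1000, chisq_stat(toy$lighting, sample(toy$eye)))

# p-value: proportion of shuffled statistics at least as large as the observed one
p_value <- mean(null_stats >= observed)
p_value
```

Shuffling keeps the counts of each variable the same, so only the relationship between the variables changes.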

---

#### Question: When do I use `type = "permute"` versus `"simulate"` versus `"bootstrap"`?

Generating a null distribution for a single proportion:

```r
null_dist <- esp %>%
  specify(response = guess, success = "correct") %>%
  hypothesize(null = "point", p = 0.25) %>%
* generate(reps = 1000, type = "simulate") %>%
  calculate(stat = "prop")
```

Generating a null distribution for a single mean:

```r
null_dist <- FloridaLakes %>%
  specify(response = pH) %>%
  hypothesize(null = "point", mu = 7) %>%
* generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
```
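
---

#### What do simulating and bootstrapping do?

The two point-null cases can be sketched in base R to show how the null samples are generated. The sample size `n_guesses` and the `ph` values are made up. The shift-then-resample step reflects my understanding of how a bootstrap under a point null for a mean works, so treat it as an assumption rather than a description of `infer`'s exact code.

```r
set.seed(141)

# Single proportion: draw each null sample directly from the null model (p = 0.25)
n_guesses  <- 40   # hypothetical sample size
null_props <- replicate(1000, mean(rbinom(n_guesses, size = 1, prob = 0.25)))

# Single mean: there is no model to draw from, so reuse the sample itself
ph <- c(6.1, 5.8, 7.3, 6.9, 8.2, 7.0, 6.4, 5.2, 7.8, 6.6, 7.1, 6.0)  # hypothetical values

# Shift the sample so its mean matches the null value (mu = 7) ...
shifted <- ph - mean(ph) + 7

# ... then resample with replacement from the shifted sample
null_means <- replicate(1000, mean(sample(shifted, replace = TRUE)))

mean(null_props)   # should land near 0.25
mean(null_means)   # should land near 7
```

In both cases the null distribution is centered at the hypothesized value, which is what lets us judge how unusual the observed statistic is.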