class: center, middle

### Two Routes for Statistical Inference

.large[Kelly McConville | Math 141 | Week 11 | Fall 2020]

---

## Announcements/Reminders

* Still time to sign up for the Data Viz Contest!

---

## Week 11 Topics

* Practice with Statistical Inference via Probability Models
* Chi-Squared Test

*********************************************

### Goals for Today

* Two Routes to Statistical Inference

* See examples of statistical inference via probability models
    + We will explore some of the common test statistics and CIs listed [on the course website](https://reed-statistics.github.io/math141f20/inference_procedures.html)

---

### Recap: Hypothesis Testing

---

### Recap: Confidence Intervals

---

### Examples

* Last class, we saw how to use `prop.test()` to do inference on a single proportion.
    + Also called a one-sample z-test
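
As a quick reminder of the syntax, here is a minimal sketch of a one-sample `prop.test()` call. The counts (60 successes in 100 trials) and the null value of 0.5 are hypothetical and chosen only for illustration; they do not come from any course dataset.

```r
# Hypothetical counts, for illustration only: 60 successes in 100 trials
successes <- 60
trials <- 100

# One-sample test of H0: p = 0.5 against a two-sided alternative;
# correct = FALSE turns off the continuity correction
result <- prop.test(x = successes, n = trials, p = 0.5,
                    alternative = "two.sided", correct = FALSE)

result$estimate  # sample proportion
result$p.value   # two-sided p-value
```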
    
--

* Let's explore how to do inference for a single mean.

---

### Inference for a Single Mean

**Example:** *Are lakes in Florida more acidic or alkaline?* The pH of a liquid measures its acidity or alkalinity: pure water has a pH of 7, a pH greater than 7 is alkaline, and a pH less than 7 is acidic. The following dataset contains observations on 53 lakes in Florida. Use these data to answer our question.

```r
library(tidyverse)
FloridaLakes <- read_csv("/home/courses/math141f18/Data/FloridaLakes.csv")
```

* **Cases**:
* **Variable of interest**:
* **Parameter of interest**:

---

### Inference for a Single Mean

---

### Inference for a Single Mean

```r
library(infer)

# Generate null distribution
null_dist <- FloridaLakes %>%
 specify(response = pH) %>%
 hypothesize(null = "point", mu = 7) %>%
 generate(reps = 1000, type = "bootstrap") %>%
 calculate(stat = "t")

# Compute the observed test statistic
t_obs <- FloridaLakes %>%
 specify(response = pH) %>%
 calculate(stat = "t", mu = 7)
t_obs
```

```
## # A tibble: 1 x 1
## stat
## <dbl>
## 1 -2.31
```
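
To see what `stat = "t"` is computing, the sketch below builds the one-sample t statistic by hand and compares it with the value reported by `t.test()`. The vector `x` is hypothetical (made-up pH-like values, not the FloridaLakes data), so the numbers will not match the output above.

```r
# Hypothetical pH-like values, for illustration only (not the FloridaLakes data)
x <- c(6.1, 7.3, 5.8, 6.9, 7.1, 6.4, 5.9, 6.6)
mu0 <- 7  # null value from the hypotheses

# t = (sample mean - null value) / (sample sd / sqrt(n))
t_by_hand <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# t.test() reports the same statistic
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

all.equal(t_by_hand, t_builtin)
```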

---

```r
# Graph the null distribution
null_dist %>%
  visualize(bins = 30) +
  geom_vline(xintercept = t_obs$stat, color = "deeppink",
             size = 2) +
  geom_vline(xintercept = abs(t_obs$stat), color = "deeppink", 
             size = 2)
```

---

### Inference for a Single Mean

What probability function is a good approximation to the null distribution?

```r
# Graph the null distribution with the theoretical distribution overlaid
null_dist %>%
  visualize(bins = 30, method = "both") +
  geom_vline(xintercept = t_obs$stat, color = "deeppink",
             size = 2) +
  geom_vline(xintercept = abs(t_obs$stat), color = "deeppink", 
             size = 2)
```

---

#### Using the generated null distribution:

```r
# Compute p-value
pvalue <- null_dist %>%
 get_p_value(obs_stat = t_obs, direction = "both")
pvalue
```

```
## # A tibble: 1 x 1
## p_value
## <dbl>
## 1 0.024
```

#### Using an approximate probability function:

```r
# Built-in function: one-sample t-test with a 90% confidence interval
t.test(FloridaLakes$pH, mu = 7, conf.level = .90,
       alternative = "two.sided")
```

```
## 
## 	One Sample t-test
## 
## data:  FloridaLakes$pH
## t = -2.3, df = 52, p-value = 0.02
## alternative hypothesis: true mean is not equal to 7
## 90 percent confidence interval:
##  6.294 6.887
## sample estimates:
## mean of x 
##     6.591
```
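
The numbers `t.test()` reports can be roughly reproduced from the t distribution with 52 degrees of freedom. The sketch below uses only values copied from the output on these slides (t = -2.31, df = 52, sample mean 6.591). Because those inputs are rounded, the results agree with the output only up to rounding. The variable names are chosen here for illustration.

```r
# Values copied from the slide output (rounded)
t_stat <- -2.31   # observed t statistic
df     <- 52      # degrees of freedom: n - 1 = 53 - 1
xbar   <- 6.591   # sample mean
mu0    <- 7       # null value

# Two-sided p-value: area in both tails of the t(52) distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# Standard error backed out of the t statistic: t = (xbar - mu0) / se
se <- (xbar - mu0) / t_stat

# 90% confidence interval: xbar +/- t* x se, where t* is the 95th percentile
t_star <- qt(0.95, df = df)
ci <- xbar + c(-1, 1) * t_star * se

round(p_value, 2)
round(ci, 2)
```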

---

class: inverse, center, middle

### Let's go through some more examples