class: center, middle

### Inference Examples and Probability Calculations

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 11 | Fall 2020] </span>

---

## Announcements/Reminders

Spring Schedule is updated.

&rarr; Time to talk about that next stats class.

If you want to build more flexible models:

&rarr; Take [Math 243: Statistical Learning](https://reed-stat-learning-fall-2020.github.io/syllabus.html) in the fall.

If you want to take your data wrangling, data viz, and R skills to the next level:

&rarr; Take [Math 241: Data Science](https://www.reed.edu/math/241/post/).

This spring's offerings of Math 241:

* T/TH 8:50 - 10:10 am (Online)
* T/TH 10:25 - 11:45 am (Online)

If you want to prove some of the results we have relied on in this class:

&rarr; Take **Math 391: Probability Theory** in the fall. It has additional pre-reqs.

---

## Week 11 Topics

* Practice with Statistical Inference via Probability Models
* Chi-Squared Test

*********************************************

### Goals for Today

* Probability model calculations in R

* Motivate sample size calculations

* Paired data

* See more inference examples

---

### Probability Calculations in R

P% CI for parameter:

$$
\mbox{statistic} \pm z^* SE
$$

**Question**: How do I find the correct critical values `$(z^* \mbox{ or } t^*)$` for the confidence interval?

---

P% CI for parameter:

$$
\mbox{statistic} \pm z^* SE
$$

**Question**: How do I find the correct critical values `$(z^* \mbox{ or } t^*)$` for the confidence interval?

```r
qnorm(p = 0.975, mean = 0, sd = 1)
```

```
## [1] 1.96
```

```r
qt(p = 0.975, df = 52)
```

```
## [1] 2.007
```

---

P% CI for parameter:

$$
\mbox{statistic} \pm z^* SE
$$

**Question**: What percentile/quantile do I need for a 90% CI?

```r
qnorm(p = 0.95, mean = 0, sd = 1)
```

```
## [1] 1.645
```

```r
qt(p = 0.95, df = 52)
```

```
## [1] 1.675
```
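
A minimal sketch of the pattern behind these calls, assuming a two-sided interval: a P% CI leaves (100 - P)/2 percent in each tail, so the quantile to look up is 1 - (1 - P/100)/2. The helper name `crit_value()` is made up for illustration.

```r
# Hypothetical helper (not part of base R): critical value for a two-sided CI
crit_value <- function(conf_level, df = NULL) {
  p <- 1 - (1 - conf_level) / 2   # e.g. 0.90 -> 0.95, 0.95 -> 0.975
  # Use the standard normal unless degrees of freedom are supplied
  if (is.null(df)) qnorm(p) else qt(p, df = df)
}

crit_value(0.90)           # same as qnorm(p = 0.95)
crit_value(0.95, df = 52)  # same as qt(p = 0.975, df = 52)
```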

---

### Probability Calculations in R

**Question**: How do I compute probabilities in R?

```r
pnorm(q = 1, mean = 0, sd = 1)
```

```
## [1] 0.8413
```

```r
pt(q = 1, df = 52)
```

```
## [1] 0.839
```

**Doesn't seem quite right** if we want the area to the right of 1: by default, `pnorm()` and `pt()` return the area to the left.

---

### Probability Calculations in R

**Question**: How do I compute probabilities in R?

```r
pnorm(q = 1, mean = 0, sd = 1, lower.tail = FALSE)
```

```
## [1] 0.1587
```

```r
pt(q = 1, df = 52, lower.tail = FALSE)
```

```
## [1] 0.161
```
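
As a quick check, assuming the goal is the area to the right of 1: `lower.tail = FALSE` should agree with subtracting the default (left-tail) probability from 1.

```r
# Area to the right of 1 under N(0, 1), computed two ways
right_tail <- pnorm(q = 1, mean = 0, sd = 1, lower.tail = FALSE)
complement <- 1 - pnorm(q = 1, mean = 0, sd = 1)

# TRUE if the two approaches agree (up to floating-point tolerance)
all.equal(right_tail, complement)
```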

---

### Probability Calculations in R

**To help you remember**:

Want a **P**robability?

&rarr; use `pnorm()`, `pt()`, ...

Want a **Q**uantile (i.e. percentile)?

&rarr; use `qnorm()`, `qt()`, ...

---

### Probability Calculations in R

**Question**: When might I want to do probability calculations in R?

--
    
&rarr; You computed a test statistic whose null distribution is approximated by a named random variable and want the p-value.

&rarr; Compute a confidence interval.

&rarr; To do a **Sample Size Calculation**.
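
For the p-value case, a two-sided calculation from a t-statistic might look like the sketch below. The values `t_stat = 2.1` and `deg_free = 52` are made-up placeholders, not results from any class dataset.

```r
# Hypothetical test statistic and degrees of freedom (placeholders)
t_stat <- 2.1
deg_free <- 52

# Two-sided p-value: total area in both tails beyond |t_stat|
p_value <- 2 * pt(q = abs(t_stat), df = deg_free, lower.tail = FALSE)
p_value
```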

---

### Sample Size Calculations

* Very important part of the data analysis process!

* Happens BEFORE you collect data.

* You determine how large your sample size needs to be to achieve a desired precision in your CI.
    + Will do sample size calculations in lab this week!
    + (There is also a hypothesis test version that we won't be covering in Math 141.)
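
A rough sketch of one such calculation, assuming a CI for a single proportion with margin of error $ME = z^* \sqrt{p(1-p)/n}$, so that solving for $n$ gives $n = (z^*/ME)^2 \, p(1-p)$. The function name `n_for_prop()`, the target ME of 0.03, and the conservative guess `p_guess = 0.5` are illustrative assumptions; the lab's version may differ.

```r
# Hypothetical helper: smallest n that achieves the desired margin of error
n_for_prop <- function(me, conf_level = 0.95, p_guess = 0.5) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # critical value
  ceiling((z_star / me)^2 * p_guess * (1 - p_guess))  # round up to whole cases
}

n_for_prop(me = 0.03)  # 95% confidence, ME of 3 percentage points
```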
    
---

### Sample Size Calculations

**Question**: Why do we need sample size calculations?

**Example**: Let's return to the dolphins for treating depression example.

With a sample size of 30 and 95% confidence, we estimate that the improvement rate for depression is between 14.5 percentage points and 75 percentage points higher if you swim with a dolphin instead of doing yoga.

With a width of 60.5 percentage points, this 95% CI is a very **wide** (imprecise) interval.

**Question**: How could we make it narrower?  How could we decrease the Margin of Error (ME)?

&rarr; Decrease the confidence level!

&rarr; Increase the sample size!
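
To see both levers at work, here is a small sketch assuming a z-based interval for a mean with $ME = z^* s / \sqrt{n}$. The standard deviation `s = 10` and the helper `margin_of_error()` are hypothetical and are not taken from the dolphin study.

```r
# Hypothetical helper: margin of error for a z-based CI for a mean
margin_of_error <- function(n, conf_level = 0.95, s = 10) {
  qnorm(1 - (1 - conf_level) / 2) * s / sqrt(n)
}

margin_of_error(n = 30)                     # baseline
margin_of_error(n = 30, conf_level = 0.90)  # lower confidence -> smaller ME
margin_of_error(n = 120)                    # 4x the sample size -> half the ME
```

Lowering the confidence level costs coverage, while increasing the sample size costs data collection effort.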

---

### Paired Data: Mean Difference

**Example**: Is the mean number of free throw attempts awarded to the Miami Heat during games different from the mean number attempted by their opponents?

```r
library(tidyverse)
library(Lock5Data)
# Data
data("MiamiHeat")
select(MiamiHeat, Game, Location, Opp, FTA, OppFTA) %>%
  slice(1:6)
```

```
##   Game Location Opp FTA OppFTA
## 1    1     Away BOS  25     25
## 2    2     Away PHI  31     11
## 3    3     Home ORL  27     34
## 4    4     Away NJN  34     23
## 5    5     Home MIN  31     38
## 6    6     Away NOH  24     17
```

* Variables of interest:

<br>

* Parameter of interest:

---

### Paired Data: Mean Difference

```r
select(MiamiHeat, Game, Location, Opp, FTA, OppFTA) %>%
  slice(1:6)
```

What are the cases?

How could we control for case-to-case variability?

---

### Paired Data: Mean Difference

* Paired data: repeated observations on the same case

* Paired data should not be treated as independent observations
    + Why not?
    
* By accounting for the case-to-case variability, any differences we see are more directly related to the explanatory variable.

---

### Paired Data: Mean Difference

```r
# Calculate Difference
MiamiHeat <- MiamiHeat %>%
  mutate(diff_FTA = FTA - OppFTA)

# Visualize
ggplot(data = MiamiHeat, mapping = aes(x = diff_FTA)) +
  geom_histogram()
```

---

```r
# One-sample t-test
t.test(x = MiamiHeat$diff_FTA)
```

```
## 
## 	One Sample t-test
## 
## data:  MiamiHeat$diff_FTA
## t = 3.5, df = 81, p-value = 0.0008
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.618 5.895
## sample estimates:
## mean of x 
##     3.756
```

```r
# Ignoring the pairing
t.test(x = MiamiHeat$FTA, y = MiamiHeat$OppFTA)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  MiamiHeat$FTA and MiamiHeat$OppFTA
## t = 3.3, df = 159, p-value = 0.001
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.483 6.029
## sample estimates:
## mean of x mean of y 
##     27.90     24.15
```
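
For comparison, `t.test()` can also handle the pairing directly through its `paired` argument. A small sketch using only the six games printed earlier (far too few for real inference, so treat it as an illustration of the equivalence rather than a result):

```r
# First six games from the MiamiHeat output shown earlier
fta     <- c(25, 31, 27, 34, 31, 24)
opp_fta <- c(25, 11, 34, 23, 38, 17)

# Paired t-test on the two columns ...
paired_fit <- t.test(x = fta, y = opp_fta, paired = TRUE)

# ... should match the one-sample t-test on the differences
diff_fit <- t.test(x = fta - opp_fta)

all.equal(unname(paired_fit$statistic), unname(diff_fit$statistic))
```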

---

### Assumptions

All of these methods rely on the CLT and assume:

* The sample size is large.

* The sample is a random sample.
    + Observations are independent of each other.

---

class: inverse, center, middle

### Let's finish going through more examples!