class: center, middle

### Statistical Inference using Probability Models

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 10 | Fall 2020] </span>

---
## Announcements/Reminders

* Project Assignment 3.
    + Due Fri Nov 20th
    + Conduct a hypothesis test and construct a confidence interval related to your research questions.
    + "pa3.rmd" is in the RStudio Shared Folder.
* Postponed Lab 8 due date to 11pm on Sunday. Feel free to resubmit, even if you have already submitted.

---
## Week 10 Topics

* Probability Theory
* Statistical Inference: Theoretical Distributions

*********************************************

### Goals for Today

* Central Limit Theorem
* Approximating sampling distributions
* Z-score test statistics
* Formula-based CIs

---
### Recap

* Statistics ARE random variables!

--

* We can often approximate the probability function of a statistic with the probability function of a named random variable (e.g., Normal, t, ...).

--

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials
* Null Distribution and the probability function of a N(0.25, 0.024):

<img src="wk10_fri_files/figure-html/unnamed-chunk-1-1.png" width="360" style="display: block; margin: auto;" />

--

How do I know what probability function is a good approximation?

---
### Approximating Sampling Distributions

--

**Central Limit Theorem:** For random samples and a large sample size `\((n)\)`, the sampling distributions of many sample statistics are approximately normal.

--

**Example**: Trees in Mount Tabor

<img src="wk10_fri_files/figure-html/unnamed-chunk-2-1.png" width="648" style="display: block; margin: auto;" />

---
### Approximating Sampling Distributions

**Central Limit Theorem (CLT):** For random samples and a large sample size `\((n)\)`, the sampling distributions of many sample statistics are approximately normal.
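The CLT claim can be sanity-checked with a quick simulation (a sketch added here, not part of the original deck), using the Douglas Fir proportion `\(p = 0.615\)` and sample size `\(n = 50\)` from the Mount Tabor example:

```r
# Simulation sketch (not from the original slides): draw many samples
# of size n = 50 from a population with p = 0.615 (the Douglas Fir
# proportion in the Mount Tabor example) and inspect the distribution
# of the sample proportions.
set.seed(141)
p <- 0.615
n <- 50
phats <- replicate(5000, mean(rbinom(n, size = 1, prob = p)))

mean(phats)               # centered near p
sd(phats)                 # simulated standard error
sqrt(p * (1 - p) / n)     # CLT formula for the standard error
```

A histogram of `phats` looks approximately normal, and the simulated standard deviation lands close to the CLT formula `\(\sqrt{p(1-p)/n}\)`.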
**Example**: Trees in Mount Tabor

<img src="wk10_fri_files/figure-html/unnamed-chunk-3-1.png" width="648" style="display: block; margin: auto;" />

--

But **which** Normal? (What is the value of `\(\mu\)` and `\(\sigma\)`?)

---
### Approximating Sampling Distributions

**Question**: But **which** normal? (What is the value of `\(\mu\)` and `\(\sigma\)`?)

--

* The sampling distribution of a statistic is always centered around the value of the parameter it estimates.

--

* The CLT also provides formula estimates of the standard error.
    + The formula varies based on the statistic.

---
### Approximating the Sampling Distribution of a Sample Proportion

CLT says: For large `\(n\)` (at least 10 successes and 10 failures),

--

$$ \hat{p} \sim N \left(p, \sqrt{\frac{p(1-p)}{n}} \right) $$

--

**Example**: Trees in Mount Tabor

--

* Parameter: `\(p\)` = proportion of Douglas Fir = 0.615

--

* Statistic: `\(\hat{p}\)` = proportion of Douglas Fir in a sample of 50 trees

--

$$ \hat{p} \sim N \left(0.615, \sqrt{\frac{0.615(1-0.615)}{50}} \right) $$

--

**NOTE**: Can plug in the true parameter here because we had data on the whole population.

---
### Approximating the Sampling Distribution of a Sample Proportion

**Question**: What do we do when we don't have access to the whole population?

--

* Have:

$$ \hat{p} \sim N \left(p, \sqrt{\frac{p(1-p)}{n}} \right) $$

--

**Answer**: We will have to estimate the SE.

---
### Side-bar #1: Z-score Test Statistics

* All of our test statistics so far have been sample statistics.

--

* Another commonly used test statistic takes the form of a **z-score**.

$$ \mbox{Z-score} = \frac{X - \mu}{\sigma} $$

--

* The Z-score measures how many standard deviations the sample statistic is away from its mean.

--

* Allows us to quickly (but roughly) classify results as unusual or not.
    + `\(|\)` Z-score `\(|\)` > 2 → results are unusual/p-value will be smallish

--

* Commonly used because if `\(X \sim N(\mu, \sigma)\)`, then

$$ \mbox{Z-score} = \frac{X - \mu}{\sigma} \sim N(0, 1) $$

---
### Z-score Test Statistic in Action

Let's consider conducting a hypothesis test for a single proportion: `\(p\)`

--

Need:

* Hypotheses

--

* Test statistic and its null distribution

--

* P-value

---
### Z-score Test Statistic in Action

Let's consider conducting a hypothesis test for a single proportion: `\(p\)`

--

`\(H_o: p = p_o\)` where `\(p_o\)` = null value

--

`\(H_a: p > p_o\)`

--

By the CLT, under `\(H_o\)`:

$$ \hat{p} \sim N \left(p_o, \sqrt{\frac{p_o(1-p_o)}{n}} \right) $$

--

* Z-score test statistic:

$$ Z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o(1-p_o)}{n}}} $$

--

* Use `\(N(0, 1)\)` to find the p-value

---
### Z-score Test Statistic in Action

Let's consider conducting a hypothesis test for a single proportion: `\(p\)`

**Example**: Bern and Honorton's (1994) extrasensory perception (ESP) studies

```r
# Use probability model to approximate null distribution
prop.test(x = 106, n = 329, p = 0.25,
          alternative = "greater")
```

```
## 
##  1-sample proportions test with continuity correction
## 
## data:  106 out of 329, null probability 0.25
## X-squared = 9, df = 1, p-value = 0.002
## alternative hypothesis: true p is greater than 0.25
## 95 percent confidence interval:
##  0.28 1.00
## sample estimates:
##    p 
## 0.32
```

---
### Side-bar #2: Formula-Based CIs

Suppose statistic `\(\sim N(\mu = \mbox{parameter}, \sigma = SE)\)`.

--

#### 95% CI for parameter:

$$ \mbox{statistic} \pm 2 SE $$

--

Can generalize this formula!
--

#### P% CI for parameter:

$$ \mbox{statistic} \pm z^* SE $$

---
### Formula-Based CIs in Action

Let's consider constructing a confidence interval for a single proportion: `\(p\)`

--

By the CLT,

$$ \hat{p} \sim N \left(p, \sqrt{\frac{p(1-p)}{n}} \right) $$

--

#### P% CI for parameter:

`\begin{align*} \mbox{statistic} \pm z^* SE \end{align*}`

---
### Formula-Based CIs in Action

Let's consider constructing a confidence interval for a single proportion: `\(p\)`

**Example**: Bern and Honorton's (1994) extrasensory perception (ESP) studies

```r
# Use probability model to approximate null distribution
prop.test(x = 106, n = 329, p = 0.25,
          alternative = "two.sided",
          conf.level = 0.95)
```

```
## 
##  1-sample proportions test with continuity correction
## 
## data:  106 out of 329, null probability 0.25
## X-squared = 9, df = 1, p-value = 0.003
## alternative hypothesis: true p is not equal to 0.25
## 95 percent confidence interval:
##  0.27 0.38
## sample estimates:
##    p 
## 0.32
```

---
### Formula-Based CIs

#### P% CI for parameter:

$$ \mbox{statistic} \pm z^* SE $$

Notes:

--

* Didn't construct the bootstrap distribution.

--

* Need to check that `\(n\)` is large and that the sample is random/representative.

--

* Interpretation of the CI doesn't change.

--

* For some parameters, the critical value comes from a `\(t\)` distribution.

--

* Now we have a formula for the Margin of Error.
    + That will prove useful for sample size calculations.

---
### Statistical Inference using Probability Models

We went through theory-based inference for `\(p\)`.

--

There are similar results for other parameters. But the distribution changes!

--

Can find some common test statistics and CIs [on the course website](https://reed-statistics.github.io/math141f20/inference_procedures.html)
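---
### Appendix: By-hand Computations in R

A closing sketch (added here, not part of the original deck): the z-score test statistic and the formula-based 95% CI for the ESP example, computed directly from the formulas above. `prop.test()` applies a continuity correction, so its p-value and interval differ slightly from these.

```r
# ESP example by hand (a sketch, not from the original slides).
x <- 106; n <- 329; p0 <- 0.25
phat <- x / n

# Z-score test statistic under H_o: p = p_o
se0 <- sqrt(p0 * (1 - p0) / n)
z <- (phat - p0) / se0
p_value <- 1 - pnorm(z)          # one-sided, H_a: p > p_o

# 95% formula-based CI: statistic +/- z* SE, with SE estimated via phat
z_star <- qnorm(0.975)
se <- sqrt(phat * (1 - phat) / n)
ci <- phat + c(-1, 1) * z_star * se

round(c(z = z, p_value = p_value), 3)   # z is about 3.02
round(ci, 2)                            # roughly (0.27, 0.37)
```

The small gap between this interval's upper bound and the `prop.test()` output (0.38) comes from the continuity correction.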