class: center, middle

## Approximating Sampling Distributions

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 10 | Fall 2020] </span>

---

## Announcements/Reminders

---

## Week 9 Topics

* Probability Theory
* Statistical Inference: Theoretical Distributions

*********************************************

### Goals for Today

* Named random variables
* Central Limit Theorem
* Approximating sampling distributions

---

### Probability Recap

* Let's talk through some of the key ideas in the worksheet.

#### Conditional Probabilities:

* Shorthand: P(A given B) = P(A | B)
* P(test - | have COVID) = 0.13 versus P(have COVID | test -) = 0.0014
* P(test + | don't have COVID) = 0.05 versus P(don't have COVID | test +) = 0.85

---

### Random Variables: Discrete

* Takes on discrete values
* Probability function:

$$
p(x) = P(X = x)
$$

where `\(\sum p(x) = 1\)`.

--

* Center: Mean/Expected value:

$$
\mu = \sum x p(x)
$$

--

* Spread: Standard deviation:

$$
\sigma = \sqrt{\sum (x - \mu)^2 p(x)}
$$

---

### Random Variables: Continuous

* Can take on any value in an interval

--

* Probability function:
    + `\(P(X = x) = 0\)` so

--

$$
p(x) \color{orange}{\approx} P(X = x)
$$

but if `\(p(4) > p(2)\)`, that still means that X is more likely to take on values around 4 than values around 2.

---

### Random Variables: Continuous

Change `\(\sum\)` to `\(\color{orange}{\int}\)`:

--

* `\(\color{orange}{\int} p(x) dx = 1\)`.

--

* Center: Mean/Expected value:

$$
\mu = \color{orange}{\int} x p(x) dx
$$

--

* Spread: Standard deviation:

$$
\sigma = \sqrt{\color{orange}{\int} (x - \mu)^2 p(x) dx}
$$

---

### Specific Named Random Variables

* There are an **infinite** number of random variables out there.

--

* But there are a few particular ones that we will find useful.
    + Because they are used so often, they have been given names.

--

* We will identify these named RVs using the following format:

$$
X \sim \mbox{Name(values of key parameters)}
$$

---

### Specific Named Random Variables

(1) `\(X \sim\)` Bernoulli `\((p)\)`

`\begin{align*}
X = \left\{
\begin{array}{ll}
1 & \mbox{success} \\
0 & \mbox{failure} \\
\end{array}
\right.
\end{align*}`

--

* Important parameter: `\(p\)` = probability of success = `\(P(X = 1)\)`

--

* Probability Function:

| x    | 0   | 1 |
|------|-----|---|
| p(x) | 1-p | p |

--

* Mean: `\(1*p + 0*(1 - p) = p\)`

--

* Standard deviation: `\(\sqrt{(1 - p)^2*p + (0 - p)^2*(1 - p)} = \sqrt{p(1 - p)}\)`
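---

### Specific Named Random Variables

One way to check the Bernoulli mean and standard deviation formulas in R, using `\(p = 0.25\)` purely as an example value:

```r
# Probability table of a Bernoulli(p) RV, with p = 0.25 as an example value
p   <- 0.25
x   <- c(0, 1)
p_x <- c(1 - p, p)

# Mean: sum of x * p(x)
mu <- sum(x * p_x)
mu                  # 0.25, matches p

# Standard deviation: square root of sum of (x - mu)^2 * p(x)
sigma <- sqrt(sum((x - mu)^2 * p_x))
sigma               # 0.433..., matches sqrt(p * (1 - p))
sqrt(p * (1 - p))
```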
---

### Specific Named Random Variables

(1) `\(X \sim\)` Bernoulli `\((p)\)`

`\begin{align*}
X = \left\{
\begin{array}{ll}
1 & \mbox{success} \\
0 & \mbox{failure} \\
\end{array}
\right.
\end{align*}`

* Important parameter: `\(p\)` = probability of success = `\(P(X = 1)\)`

* Probability Function:

<img src="wk10_wed_files/figure-html/unnamed-chunk-1-1.png" width="360" style="display: block; margin: auto;" />

---

### Specific Named Random Variables

(2) `\(X \sim\)` Normal `\((\mu, \sigma)\)`

* Probability Function:

$$
p(x) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp{\left(-\frac{(x - \mu)^2}{2\sigma^2} \right)}
$$

where `\(-\infty < x < \infty\)`

--

* Mean: `\(\mu\)`

--

* Standard deviation: `\(\sigma\)`

---

### Specific Named Random Variables

(2) `\(X \sim\)` Normal `\((\mu, \sigma)\)`

* Probability Function:

<img src="wk10_wed_files/figure-html/unnamed-chunk-2-1.png" width="360" style="display: block; margin: auto;" />

--

Notes:

--

(a) Area under the curve = 1.

--

(b) Height `\(\approx\)` how likely values are to occur

--

(c) Super special Normal RV: `\(Z \sim\)` Normal `\((\mu = 0, \sigma = 1)\)` (the standard Normal).

--

**Normal will be a good approximation for MANY distributions.** But sometimes its tails just aren't fat enough.

---

### Specific Named Random Variables

(3) `\(X \sim\)` t(df)

* Probability Function:

$$
p(x) = \frac{\Gamma\left(\frac{\mbox{df} + 1}{2}\right)}{\sqrt{\mbox{df}\pi}\, \Gamma\left(\frac{\mbox{df}}{2}\right)}\left(1 + \frac{x^2}{\mbox{df}} \right)^{-\frac{\mbox{df} + 1}{2}}
$$

where `\(-\infty < x < \infty\)`

--

* Mean: 0 (for df > 1)

--

* Standard deviation: `\(\sqrt{\mbox{df}/(\mbox{df} - 2)}\)` (for df > 2)

---

### Specific Named Random Variables

(3) `\(X \sim\)` t(df)

* Probability Function:

<img src="wk10_wed_files/figure-html/unnamed-chunk-3-1.png" width="360" style="display: block; margin: auto;" />

---

### Sample Statistics as Random Variables

Here are some of the sample statistics we've seen lately:

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials
* `\(r\)` = correlation between audience and critic ratings for a sample of movies
* `\(\hat{p}_D - \hat{p}_Y\)` = difference in improvement proportions between those who swam with dolphins and those who did yoga

--

Why are these all random variables?

--

But none of these are Bernoulli RVs.

--

Nor are they Normal RVs.

--

Nor are they t RVs.

--

> "All models are wrong but some are useful." -- George Box

---

### Approximating These Distributions

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials

* We generated its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />

---

### Approximating These Distributions

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials

* We generated its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-5-1.png" width="360" style="display: block; margin: auto;" />

* Its Null Distribution is well approximated by the probability function of a N(0.25, 0.024).

---

### Approximating These Distributions

* `\(r\)` = correlation between audience and critic ratings for a sample of movies

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-6-1.png" width="360" style="display: block; margin: auto;" />

---

### Approximating These Distributions

* `\(r\)` = correlation between audience and critic ratings for a sample of movies

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-7-1.png" width="360" style="display: block; margin: auto;" />

* Its Null Distribution is well approximated by the probability function of a N(0, 0.023).
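---

### Approximating These Distributions

How could you check a claim like that, and where do the Normal's parameters come from in the first place? One approach: take the simulated null distribution's own mean and standard deviation as the Normal's parameters, then overlay that Normal's probability function on the simulated values to see whether the shapes agree. A minimal sketch, with made-up data standing in for the lab's simulated correlations:

```r
library(ggplot2)

set.seed(141)

# Stand-in for a simulated null distribution (made-up data so the sketch
# runs on its own; in the lab you would use your actual null correlations)
fake_x <- rnorm(200)
fake_y <- rnorm(200)
null_dist <- data.frame(
  stat = replicate(1000, cor(sample(fake_x), fake_y))
)

# Use the null distribution's own center and spread as the Normal's
# parameters -- this is how numbers like 0 and 0.023 come about
mu    <- mean(null_dist$stat)
sigma <- sd(null_dist$stat)

# Histogram of the null statistics on the density scale, with the
# Normal probability function overlaid to judge the approximation
ggplot(null_dist, aes(x = stat)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30, fill = "grey80", color = "white") +
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                color = "orange") +
  labs(x = "Null statistics", y = "Density")
```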
---

### Approximating These Distributions

* `\(\hat{p}_D - \hat{p}_Y\)` = difference in improvement proportions between those who swam with dolphins and those who did yoga

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-8-1.png" width="360" style="display: block; margin: auto;" />

---

### Approximating These Distributions

* `\(\hat{p}_D - \hat{p}_Y\)` = difference in improvement proportions between those who swam with dolphins and those who did yoga

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-9-1.png" width="360" style="display: block; margin: auto;" />

* Its Null Distribution is kinda somewhat well-ish approximated by the probability function of a N(0, 0.16).

---

### Approximating These Distributions

* How did I know what was a good approximation distribution?

--

* How does that help us??
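---

### Approximating These Distributions

If the Normal approximation is good, we can skip the simulation entirely and let R's Normal functions do the work. For example, under the N(0.25, 0.024) approximation to the null distribution of `\(\hat{p}\)`, the chance of seeing a sample proportion at least as large as some observed value is one `pnorm()` call away. A sketch, using 0.30 as a made-up observed proportion (not the study's actual value):

```r
# Normal approximation to the null distribution of p-hat (from the slides)
null_mean <- 0.25
null_sd   <- 0.024

# Hypothetical observed sample proportion, just for illustration
p_hat_obs <- 0.30

# Approximate P(p-hat >= 0.30) under the null -- no simulation needed
pnorm(p_hat_obs, mean = null_mean, sd = null_sd, lower.tail = FALSE)
#> roughly 0.019
```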