class: center, middle

## Approximating Sampling Distributions

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 10 | Fall 2020] </span>

---

## Announcements/Reminders

---

## Week 9 Topics

* Probability Theory
* Statistical Inference: Theoretical Distributions

*********************************************

### Goals for Today

* Named random variables
* Central Limit Theorem
* Approximating sampling distributions

---

### Probability Recap

* Let's talk through some of the key ideas in the worksheet.

#### Conditional Probabilities:

* Shorthand: P(A given B) = P(A | B)
* P(test - | have COVID) = 0.13 versus P(have COVID | test -) = 0.0014
* P(test + | don't have COVID) = 0.05 versus P(don't have COVID | test +) = 0.85

---

### Random Variables: Discrete

* Takes on discrete values
* Probability function:

$$
p(x) = P(X = x)
$$

where `\(\sum p(x) = 1\)`.

--

* Center: Mean/Expected value:

$$
\mu = \sum x p(x)
$$

--

* Spread: Standard deviation:

$$
\sigma = \sqrt{\sum (x - \mu)^2 p(x)}
$$

---

### Random Variables: Continuous

* Can take on any value in an interval

--

* Probability function:
    + `\(P(X = x) = 0\)` so

--

$$
p(x) \color{orange}{\approx} P(X = x)
$$

but if `\(p(4) > p(2)\)`, that still means that X is more likely to take on values around 4 than values around 2.

---

### Random Variables: Continuous

Change `\(\sum\)` to `\(\color{orange}{\int}\)`:

--

* `\(\color{orange}{\int} p(x) dx = 1\)`.

--

* Center: Mean/Expected value:

$$
\mu = \color{orange}{\int} x p(x) dx
$$

--

* Spread: Standard deviation:

$$
\sigma = \sqrt{\color{orange}{\int} (x - \mu)^2 p(x) dx}
$$

---

### Specific Named Random Variables

* There are an **infinite** number of random variables out there.

--

* But there are a few particular ones that we will find useful.
    + Because they are used so often, they have been given names.

--

* We will identify these named RVs using the following format:

$$
X \sim \mbox{Name(values of key parameters)}
$$

---

### Specific Named Random Variables

(1) `\(X \sim\)` Bernoulli `\((p)\)`

`\begin{align*}
X = \left\{
\begin{array}{ll}
1 & \mbox{success} \\
0 & \mbox{failure} \\
\end{array}
\right.
\end{align*}`

--

* Important parameter: `\(p\)` = probability of success = `\(P(X = 1)\)`

--

* Probability Function:

| x    | 0   | 1 |
|------|-----|---|
| p(x) | 1-p | p |

--

* Mean: `\(1*p + 0*(1 - p) = p\)`

--

* Standard deviation: `\(\sqrt{(1 - p)^2*p + (0 - p)^2*(1 - p)} = \sqrt{p(1 - p)}\)`
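---

### Specific Named Random Variables

One way to check the Bernoulli mean and standard deviation formulas in R, using `\(p = 0.25\)` purely as an example value:

```r
# Probability table of a Bernoulli(p) RV, with p = 0.25 as an example value
p   <- 0.25
x   <- c(0, 1)
p_x <- c(1 - p, p)

# Mean: sum of x * p(x)
mu <- sum(x * p_x)
mu                  # 0.25, matches p

# Standard deviation: square root of sum of (x - mu)^2 * p(x)
sigma <- sqrt(sum((x - mu)^2 * p_x))
sigma               # 0.433..., matches sqrt(p * (1 - p))
sqrt(p * (1 - p))
```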
---

### Specific Named Random Variables

(1) `\(X \sim\)` Bernoulli `\((p)\)`

`\begin{align*}
X = \left\{
\begin{array}{ll}
1 & \mbox{success} \\
0 & \mbox{failure} \\
\end{array}
\right.
\end{align*}`

* Important parameter: `\(p\)` = probability of success = `\(P(X = 1)\)`

* Probability Function:

<img src="wk10_wed_files/figure-html/unnamed-chunk-1-1.png" width="360" style="display: block; margin: auto;" />

---

### Specific Named Random Variables

(2) `\(X \sim\)` Normal `\((\mu, \sigma)\)`

* Probability Function:

$$
p(x) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp{\left(-\frac{(x - \mu)^2}{2\sigma^2} \right)}
$$

where `\(-\infty < x < \infty\)`

--

* Mean: `\(\mu\)`

--

* Standard deviation: `\(\sigma\)`

---

### Specific Named Random Variables

(2) `\(X \sim\)` Normal `\((\mu, \sigma)\)`

* Probability Function:

<img src="wk10_wed_files/figure-html/unnamed-chunk-2-1.png" width="360" style="display: block; margin: auto;" />

--

Notes:

--

(a) Area under the curve = 1.

--

(b) Height `\(\approx\)` how likely values are to occur

--

(c) Super special Normal RV: `\(Z \sim\)` Normal `\((\mu = 0, \sigma = 1)\)` (the standard Normal).

--

**Normal will be a good approximation for MANY distributions.** But sometimes its tails just aren't fat enough.

---

### Specific Named Random Variables

(3) `\(X \sim\)` t(df)

* Probability Function:

$$
p(x) = \frac{\Gamma\left(\frac{\mbox{df} + 1}{2}\right)}{\sqrt{\mbox{df}\pi}\, \Gamma\left(\frac{\mbox{df}}{2}\right)}\left(1 + \frac{x^2}{\mbox{df}} \right)^{-\frac{\mbox{df} + 1}{2}}
$$

where `\(-\infty < x < \infty\)`

--

* Mean: 0 (for df > 1)

--

* Standard deviation: `\(\sqrt{\mbox{df}/(\mbox{df} - 2)}\)` (for df > 2)

---

### Specific Named Random Variables

(3) `\(X \sim\)` t(df)

* Probability Function:

<img src="wk10_wed_files/figure-html/unnamed-chunk-3-1.png" width="360" style="display: block; margin: auto;" />

---

### Sample Statistics as Random Variables

Here are some of the sample statistics we've seen lately:

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials
* `\(r\)` = correlation between audience and critic ratings for a sample of movies
* `\(\hat{p}_D - \hat{p}_Y\)` = difference in improvement proportions between those who swam with dolphins and those who did yoga

--

Why are these all random variables?

--

But none of these are Bernoulli RVs.

--

Nor are they Normal RVs.

--

Nor are they t RVs.

--

> "All models are wrong but some are useful." -- George Box

---

### Approximating These Distributions

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials

* We generated its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />

---

### Approximating These Distributions

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials

* We generated its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-5-1.png" width="360" style="display: block; margin: auto;" />

* Its Null Distribution is well approximated by the probability function of a N(0.25, 0.024).

---

### Approximating These Distributions

* `\(r\)` = correlation between audience and critic ratings for a sample of movies

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-6-1.png" width="360" style="display: block; margin: auto;" />

---

### Approximating These Distributions

* `\(r\)` = correlation between audience and critic ratings for a sample of movies

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-7-1.png" width="360" style="display: block; margin: auto;" />

* Its Null Distribution is well approximated by the probability function of a N(0, 0.023).
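---

### Approximating These Distributions

How could you check a claim like that, and where do the Normal's parameters come from in the first place? One approach: take the simulated null distribution's own mean and standard deviation as the Normal's parameters, then overlay that Normal's probability function on the simulated values to see whether the shapes agree. A minimal sketch, with made-up data standing in for the lab's simulated correlations:

```r
library(ggplot2)

set.seed(141)

# Stand-in for a simulated null distribution (made-up data so the sketch
# runs on its own; in the lab you would use your actual null correlations)
fake_x <- rnorm(200)
fake_y <- rnorm(200)
null_dist <- data.frame(
  stat = replicate(1000, cor(sample(fake_x), fake_y))
)

# Use the null distribution's own center and spread as the Normal's
# parameters -- this is how numbers like 0 and 0.023 come about
mu    <- mean(null_dist$stat)
sigma <- sd(null_dist$stat)

# Histogram of the null statistics on the density scale, with the
# Normal probability function overlaid to judge the approximation
ggplot(null_dist, aes(x = stat)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30, fill = "grey80", color = "white") +
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                color = "orange") +
  labs(x = "Null statistics", y = "Density")
```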
---

### Approximating These Distributions

* `\(\hat{p}_D - \hat{p}_Y\)` = difference in improvement proportions between those who swam with dolphins and those who did yoga

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-8-1.png" width="360" style="display: block; margin: auto;" />

---

### Approximating These Distributions

* `\(\hat{p}_D - \hat{p}_Y\)` = difference in improvement proportions between those who swam with dolphins and those who did yoga

* In Lab 8 this week, you are generating its Null Distribution:

<img src="wk10_wed_files/figure-html/unnamed-chunk-9-1.png" width="360" style="display: block; margin: auto;" />

* Its Null Distribution is kinda somewhat well-ish approximated by the probability function of a N(0, 0.16).

---

### Approximating These Distributions

* How did I know what was a good approximation distribution?

--

* How does that help us??
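---

### Approximating These Distributions

If the Normal approximation is good, we can skip the simulation entirely and let R's Normal functions do the work. For example, under the N(0.25, 0.024) approximation to the null distribution of `\(\hat{p}\)`, the chance of seeing a sample proportion at least as large as some observed value is one `pnorm()` call away. A sketch, using 0.30 as a made-up observed proportion (not the study's actual value):

```r
# Normal approximation to the null distribution of p-hat (from the slides)
null_mean <- 0.25
null_sd   <- 0.024

# Hypothetical observed sample proportion, just for illustration
p_hat_obs <- 0.30

# Approximate P(p-hat >= 0.30) under the null -- no simulation needed
pnorm(p_hat_obs, mean = null_mean, sd = null_sd, lower.tail = FALSE)
#> roughly 0.019
```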