class: center, middle

### Statistical Inference using Probability Models

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 10 | Fall 2020] </span>

---
## Announcements/Reminders

* Project Assignment 3.
    + Due Fri Nov 20th
    + Conduct a hypothesis test and construct a confidence interval related to your research questions.
    + "pa3.rmd" is in the RStudio Shared Folder.
* Postponed Lab 8 due date to 11pm on Sunday. Feel free to resubmit, even if you have already submitted.

---
## Week 10 Topics

* Probability Theory
* Statistical Inference: Theoretical Distributions

*********************************************

### Goals for Today

* Central Limit Theorem
* Approximating sampling distributions
* Z-score test statistics
* Formula-based CIs

---
### Recap

* Statistics ARE random variables!

--

* We can often approximate the probability function of a statistic with the probability function of a named random variable (e.g., Normal, t, ...).

--

* `\(\hat{p}\)` = sample proportion of correct receiver guesses out of 329 trials
* Null Distribution and the probability function of a N(0.25, 0.024):

<img src="wk10_fri_files/figure-html/unnamed-chunk-1-1.png" width="360" style="display: block; margin: auto;" />

--

How do I know what probability function is a good approximation?

---
### Approximating Sampling Distributions

--

**Central Limit Theorem:** For random samples and a large sample size `\((n)\)`, the sampling distributions of many sample statistics are approximately normal.

--

**Example**: Trees in Mount Tabor

<img src="wk10_fri_files/figure-html/unnamed-chunk-2-1.png" width="648" style="display: block; margin: auto;" />

---
### Approximating Sampling Distributions

**Central Limit Theorem (CLT):** For random samples and a large sample size `\((n)\)`, the sampling distributions of many sample statistics are approximately normal.
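The CLT claim can be sanity-checked with a quick simulation (a sketch added here, not part of the original deck), using the Douglas Fir proportion `\(p = 0.615\)` and sample size `\(n = 50\)` from the Mount Tabor example:

```r
# Simulation sketch (not from the original slides): draw many samples
# of size n = 50 from a population with p = 0.615 (the Douglas Fir
# proportion in the Mount Tabor example) and inspect the distribution
# of the sample proportions.
set.seed(141)
p <- 0.615
n <- 50
phats <- replicate(5000, mean(rbinom(n, size = 1, prob = p)))

mean(phats)               # centered near p
sd(phats)                 # simulated standard error
sqrt(p * (1 - p) / n)     # CLT formula for the standard error
```

A histogram of `phats` looks approximately normal, and the simulated standard deviation lands close to the CLT formula `\(\sqrt{p(1-p)/n}\)`.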
**Example**: Trees in Mount Tabor

<img src="wk10_fri_files/figure-html/unnamed-chunk-3-1.png" width="648" style="display: block; margin: auto;" />

--

But **which** Normal? (What is the value of `\(\mu\)` and `\(\sigma\)`?)

---
### Approximating Sampling Distributions

**Question**: But **which** normal? (What is the value of `\(\mu\)` and `\(\sigma\)`?)

--

* The sampling distribution of a statistic is always centered around the value of the parameter it estimates.

--

* The CLT also provides formula estimates of the standard error.
    + The formula varies based on the statistic.

---
### Approximating the Sampling Distribution of a Sample Proportion

CLT says: For large `\(n\)` (at least 10 successes and 10 failures),

--

$$ \hat{p} \sim N \left(p, \sqrt{\frac{p(1-p)}{n}} \right) $$

--

**Example**: Trees in Mount Tabor

--

* Parameter: `\(p\)` = proportion of Douglas Fir = 0.615

--

* Statistic: `\(\hat{p}\)` = proportion of Douglas Fir in a sample of 50 trees

--

$$ \hat{p} \sim N \left(0.615, \sqrt{\frac{0.615(1-0.615)}{50}} \right) $$

--

**NOTE**: Can plug in the true parameter here because we had data on the whole population.

---
### Approximating the Sampling Distribution of a Sample Proportion

**Question**: What do we do when we don't have access to the whole population?

--

* Have:

$$ \hat{p} \sim N \left(p, \sqrt{\frac{p(1-p)}{n}} \right) $$

--

**Answer**: We will have to estimate the SE.

---
### Side-bar #1: Z-score Test Statistics

* All of our test statistics so far have been sample statistics.

--

* Another commonly used test statistic takes the form of a **z-score**.

$$ \mbox{Z-score} = \frac{X - \mu}{\sigma} $$

--

* The Z-score measures how many standard deviations the sample statistic is away from its mean.

--

* Allows us to quickly (but roughly) classify results as unusual or not.
    + `\(|\)` Z-score `\(|\)` > 2 → results are unusual/p-value will be smallish

--

* Commonly used because if `\(X \sim N(\mu, \sigma)\)`, then

$$ \mbox{Z-score} = \frac{X - \mu}{\sigma} \sim N(0, 1) $$

---
### Z-score Test Statistic in Action

Let's consider conducting a hypothesis test for a single proportion: `\(p\)`

--

Need:

* Hypotheses

--

* Test statistic and its null distribution

--

* P-value

---
### Z-score Test Statistic in Action

Let's consider conducting a hypothesis test for a single proportion: `\(p\)`

--

`\(H_o: p = p_o\)` where `\(p_o\)` = null value

--

`\(H_a: p > p_o\)`

--

By the CLT, under `\(H_o\)`:

$$ \hat{p} \sim N \left(p_o, \sqrt{\frac{p_o(1-p_o)}{n}} \right) $$

--

* Z-score test statistic:

$$ Z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o(1-p_o)}{n}}} $$

--

* Use `\(N(0, 1)\)` to find the p-value

---
### Z-score Test Statistic in Action

Let's consider conducting a hypothesis test for a single proportion: `\(p\)`

**Example**: Bern and Honorton's (1994) extrasensory perception (ESP) studies

```r
# Use probability model to approximate null distribution
prop.test(x = 106, n = 329, p = 0.25,
          alternative = "greater")
```

```
## 
##  1-sample proportions test with continuity correction
## 
## data:  106 out of 329, null probability 0.25
## X-squared = 9, df = 1, p-value = 0.002
## alternative hypothesis: true p is greater than 0.25
## 95 percent confidence interval:
##  0.28 1.00
## sample estimates:
##    p 
## 0.32
```

---
### Side-bar #2: Formula-Based CIs

Suppose statistic `\(\sim N(\mu = \mbox{parameter}, \sigma = SE)\)`.

--

#### 95% CI for parameter:

$$ \mbox{statistic} \pm 2 SE $$

--

Can generalize this formula!
--

#### P% CI for parameter:

$$ \mbox{statistic} \pm z^* SE $$

---
### Formula-Based CIs in Action

Let's consider constructing a confidence interval for a single proportion: `\(p\)`

--

By the CLT,

$$ \hat{p} \sim N \left(p, \sqrt{\frac{p(1-p)}{n}} \right) $$

--

#### P% CI for parameter:

`\begin{align*} \mbox{statistic} \pm z^* SE \end{align*}`

---
### Formula-Based CIs in Action

Let's consider constructing a confidence interval for a single proportion: `\(p\)`

**Example**: Bern and Honorton's (1994) extrasensory perception (ESP) studies

```r
# Use probability model to approximate null distribution
prop.test(x = 106, n = 329, p = 0.25,
          alternative = "two.sided",
          conf.level = 0.95)
```

```
## 
##  1-sample proportions test with continuity correction
## 
## data:  106 out of 329, null probability 0.25
## X-squared = 9, df = 1, p-value = 0.003
## alternative hypothesis: true p is not equal to 0.25
## 95 percent confidence interval:
##  0.27 0.38
## sample estimates:
##    p 
## 0.32
```

---
### Formula-Based CIs

#### P% CI for parameter:

$$ \mbox{statistic} \pm z^* SE $$

Notes:

--

* Didn't construct the bootstrap distribution.

--

* Need to check that `\(n\)` is large and that the sample is random/representative.

--

* Interpretation of the CI doesn't change.

--

* For some parameters, the critical value comes from a `\(t\)` distribution.

--

* Now we have a formula for the Margin of Error.
    + That will prove useful for sample size calculations.

---
### Statistical Inference using Probability Models

We went through theory-based inference for `\(p\)`.

--

There are similar results for other parameters. But the distribution changes!

--

Can find some common test statistics and CIs [on the course website](https://reed-statistics.github.io/math141f20/inference_procedures.html)
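---
### Appendix: By-hand Computations in R

A closing sketch (added here, not part of the original deck): the z-score test statistic and the formula-based 95% CI for the ESP example, computed directly from the formulas above. `prop.test()` applies a continuity correction, so its p-value and interval differ slightly from these.

```r
# ESP example by hand (a sketch, not from the original slides).
x <- 106; n <- 329; p0 <- 0.25
phat <- x / n

# Z-score test statistic under H_o: p = p_o
se0 <- sqrt(p0 * (1 - p0) / n)
z <- (phat - p0) / se0
p_value <- 1 - pnorm(z)          # one-sided, H_a: p > p_o

# 95% formula-based CI: statistic +/- z* SE, with SE estimated via phat
z_star <- qnorm(0.975)
se <- sqrt(phat * (1 - phat) / n)
ci <- phat + c(-1, 1) * z_star * se

round(c(z = z, p_value = p_value), 3)   # z is about 3.02
round(ci, 2)                            # roughly (0.27, 0.37)
```

The small gap between this interval's upper bound and the `prop.test()` output (0.38) comes from the continuity correction.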