class: center, middle

## Estimation

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;"> .large[Kelly McConville | Math 141 | Week 8 | Fall 2020] </span>

---

## Announcements/Reminders

* No lab due this week.
* PA 2 is due by the end of the day Friday.
* When not working on your project, close the Shared RStudio Project within RStudio.
    + Some folks reported havoc (deleted work) when they had the project open.

---

## PA 2: Why are we writing a data biography?

> "Consider context. The bottom line for numbers is that they cannot speak for themselves."

> "Lacking this context for orientation, strangers in the data set run the risk of getting things entirely wrong or actually doing harm by filling in the missing information with their own biases and assumptions."

- Catherine D’Ignazio and Lauren F. Klein

---

## Week 8 Topics

* Estimation

*********************************************

## Goals for Today

* Key features of **sampling distributions**
* Estimation with **confidence intervals**
* Revisit parameters versus statistics
* Approximating a **sampling distribution** with a **bootstrap distribution**

---

## Sampling Distribution of a Statistic

<img src="img/samp_dist.png" width="65%" style="display: block; margin: auto;" />

Steps to Construct an (Approximate) Sampling Distribution:

1. Decide on a sample size, `\(n\)`.
2. Randomly select a sample of size `\(n\)` from the population.
3. Compute the sample statistic.
4. Put the sample back into the population.
5. Repeat Steps 2 - 4 many (1000+) times.

---

## Key Features of a Sampling Distribution

What did we learn about sampling distributions?

--

→ Standard error = standard deviation of the statistic

--

→ Centered around the true population parameter.

--

→ As the sample size increases, the **standard error** (SE) of the statistic decreases.

--

→ As the sample size increases, the shape of the sampling distribution becomes more bell-shaped and symmetric.

--

**Question**: How do sampling distributions help us **quantify uncertainty**?

--

**Question**: If I am estimating a parameter in a real example, why won't I ever be able to construct the sampling distribution?

---

## Estimation

**Goal**: Estimate the value of a population parameter using data from the sample.

--

**Question**: How do I know which population parameter I am interested in estimating?

→ **Answer**: It likely depends on the research question and the structure of your data!

--

**Point Estimate**: The corresponding statistic

* Single best guess for the parameter

---

### Potential Parameters and Point Estimates

---

## Confidence Intervals

* It is time to move **beyond** just point estimates to interval estimates that quantify our uncertainty.

--

**Confidence Interval**: Interval of plausible values for a parameter

--

**Form**:

$$
\mbox{statistic} \pm \mbox{Margin of Error}
$$

--

**Question**: How do we find the margin of error (ME)?

--

→ **Answer**: If the sampling distribution of the statistic is approximately bell-shaped and symmetric, then the statistic will be within 2 SEs of the parameter for 95% of samples.

--

**Form**:

$$
\mbox{statistic} \pm 2\mbox{SE}
$$

--

This is called a 95% confidence interval. (We will discuss the meaning of the word "confidence" soon.)

---

## Confidence Intervals

**95% CI Form**:

$$
\mbox{statistic} \pm 2\mbox{SE}
$$

Suppose you want to estimate the average hours of sleep of a Reedie. What would you do?

--

→ Take a sample of 40 students. Ask how many hours they slept last night. Compute the average (statistic).

--

But you wouldn't be able to calculate the confidence interval. Why not?
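--

As a rough sketch of the setup (the `sleep` tibble below is simulated purely for illustration, not real survey data, and the `rnorm()` values are arbitrary), one sample of 40 students gives us exactly one statistic:

```r
library(tidyverse)

# Simulated stand-in for one sample of 40 students' sleep hours
set.seed(141)
sleep <- tibble(hours = rnorm(40, mean = 7, sd = 1.5))

# Point estimate: the sample mean, our single statistic
summarize(sleep, xbar = mean(hours))

# We have one statistic from one sample; nothing here tells us its SE
```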
--

**Problem**: To compute the SE, we need many samples from the population. We have 1 sample.

--

**Solution**: Approximate the sampling distribution using **ONLY OUR ONE SAMPLE!**

---

### Bootstrap Distribution

How do we approximate the sampling distribution?

<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>

Steps for Generating a **Bootstrap Distribution of a Sample Statistic**:

1. Take a sample of size `\(n\)` with replacement from the sample.
    + Called a bootstrap sample.

--

2. Compute the statistic.

--

3. Repeat Steps 1 and 2 many times.

---

### Let's Practice Generating Bootstrap Samples!

**Example:** In a recent study, rats showed compassion that surprised scientists. Twenty-three of the 30 rats in the study freed another trapped rat in their cage, even when chocolate served as a distraction and even when the rats would then have to share the chocolate with their freed companion. (Rats, it turns out, love chocolate.) Rats did not open the cage when it was empty or when there was a stuffed animal inside, only when a fellow rat was trapped. We wish to use the sample to estimate the proportion of rats that show empathy in this way.

**Parameter**:

**Statistic**:

```r
library(tidyverse)

# Create a data frame from the sample data
rats <- data.frame(empathy = c(rep("Yes", 23), rep("No", 7)))

# Draw a single rat at random from the sample
sample_n(rats, size = 1)

# Draw one full bootstrap sample: 30 rats, sampled with replacement
sample_n(rats, size = 30, replace = TRUE)
```

---

### Let's Practice Generating Bootstrap Samples!

* Generate three bootstrap samples.
* For each sample, compute the bootstrap statistic and put it on the [class dotplot](https://jamboard.google.com/d/1C1TCSxYdcNMte3qlKS0tTJ_92wQXjN32VN1RR-SWzPk/edit?usp=sharing).

```r
library(tidyverse)

# Create a data frame from the sample data
rats <- data.frame(empathy = c(rep("Yes", 23), rep("No", 7)))

# Draw one full bootstrap sample: 30 rats, sampled with replacement
sample_n(rats, size = 30, replace = TRUE)
```

---

### Sampling Distribution Versus Bootstrap Distribution

* Data needed:

<br> <br> <br>

--

* Center:

<br> <br> <br>

--

* Spread:

---

### (Bootstrapped) Confidence Intervals

**95% CI Form**:

$$
\mbox{statistic} \pm 2\mbox{SE}
$$

--

We approximate `\(\mbox{SE}\)` with `\(\widehat{\mbox{SE}}\)` = the standard deviation of the bootstrapped statistics.
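--

One possible sketch of this calculation for the rat empathy example is below; the object names (`p_hat`, `boot_props`, `se_hat`), the choice of 1000 bootstrap samples, and the use of `replicate()` are illustrative, not the only way to do it:

```r
library(tidyverse)

# The original sample: 23 of 30 rats freed their trapped cage-mate
rats <- data.frame(empathy = c(rep("Yes", 23), rep("No", 7)))

# Point estimate: the sample proportion
p_hat <- mean(rats$empathy == "Yes")

# Generate 1000 bootstrap statistics
boot_props <- replicate(1000, {
  boot_sample <- sample_n(rats, size = 30, replace = TRUE)
  mean(boot_sample$empathy == "Yes")
})

# Estimated SE = standard deviation of the bootstrapped statistics
se_hat <- sd(boot_props)

# 95% confidence interval: statistic plus/minus 2 SE
c(p_hat - 2 * se_hat, p_hat + 2 * se_hat)
```

Note that a larger original sample would give a smaller `se_hat` and therefore a narrower interval, matching what we saw about standard errors and sample size.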