class: center, middle

### Two Routes for Statistical Inference

<img src="img/DAW.png" width="450px"/>

<span style="color: #91204D;">
.large[Kelly McConville | Math 141 | Week 11 | Fall 2020]
</span>

---

## Announcements/Reminders

* Still time to sign up for the Data Viz Contest!

---

## Week 11 Topics

* Practice with Statistical Inference via Probability Models
* Chi-Squared Test

*********************************************

### Goals for Today

* Two Routes to Statistical Inference
* See examples of statistical inference via probability models
    + Will explore some of the common test statistics and CIs [on the course website](https://reed-statistics.github.io/math141f20/inference_procedures.html)

---

### Recap: Hypothesis Testing

<br>
<br>

<img src="img/hyp_testing_diagram.png" width="800px"/>

---

### Recap: Confidence Intervals

<img src="img/ci_diagram.png" width="700px"/>

---

### Examples

* Saw how to use `prop.test()` to do inference on a single proportion last class.
    + Also called a one-sample z-test

--

* Let's explore how to do inference for a single mean.

---

### Inference for a Single Mean

**Example:** *Are lakes in Florida more acidic or alkaline?*

The pH of a liquid measures its acidity or alkalinity: pure water has a pH of 7, a pH greater than 7 is alkaline, and a pH less than 7 is acidic. The following dataset contains observations on 53 lakes in Florida. Use these data to answer our question.

```r
library(tidyverse)
FloridaLakes <- read_csv("/home/courses/math141f18/Data/FloridaLakes.csv")
```

* **Cases**:
* **Variable of interest**:
* **Parameter of interest**:

---

### Inference for a Single Mean

<br>
<br>

<img src="img/hyp_testing_diagram.png" width="800px"/>

---

### Inference for a Single Mean

```r
library(infer)

# Generate null distribution
null_dist <- FloridaLakes %>%
  specify(response = pH) %>%
  hypothesize(null = "point", mu = 7) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "t")

# Compute obs stat
t_obs <- FloridaLakes %>%
  specify(response = pH) %>%
  calculate(stat = "t", mu = 7)
t_obs
```

```
## # A tibble: 1 x 1
##    stat
##   <dbl>
## 1 -2.31
```

---

```r
# Graph the null distribution
null_dist %>%
  visualize(bins = 30) +
  geom_vline(xintercept = t_obs$stat, color = "deeppink", size = 2) +
  geom_vline(xintercept = abs(t_obs$stat), color = "deeppink", size = 2)
```

<img src="wk11_mon_files/figure-html/unnamed-chunk-3-1.png" width="360" style="display: block; margin: auto;" />

---

### Inference for a Single Mean

What probability function is a good approximation to the null distribution?

```r
# Graph the null distribution
null_dist %>%
  visualize(bins = 30, method = "both") +
  geom_vline(xintercept = t_obs$stat, color = "deeppink", size = 2) +
  geom_vline(xintercept = abs(t_obs$stat), color = "deeppink", size = 2)
```

<img src="wk11_mon_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />

---

#### Using the generated null distribution:

```r
# Compute p-value
pvalue <- null_dist %>%
  get_p_value(obs_stat = t_obs, direction = "both")
pvalue
```

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1   0.024
```

--

#### Using an approximate probability function:

```r
# Built-in function
t.test(FloridaLakes$pH, mu = 7, conf.level = .90, alternative = "two.sided")
```

```
## 
## 	One Sample t-test
## 
## data:  FloridaLakes$pH
## t = -2.3, df = 52, p-value = 0.02
## alternative hypothesis: true mean is not equal to 7
## 90 percent confidence interval:
##  6.294 6.887
## sample estimates:
## mean of x 
##     6.591 
```

---
class: inverse, center, middle

### Let's go through some more examples
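
---

### Inference for a Single Mean

As a rough check on the `t.test()` output, the sketch below recomputes the two-sided p-value and the 90% confidence interval directly from the t distribution with 52 degrees of freedom. It assumes the rounded values printed on the earlier slides (mean 6.591, t = -2.31, n = 53) and back-computes the standard error from them, so the results are only approximate. The names `x_bar`, `mu_0`, `t_stat`, `se`, and `t_star` are illustrative and not part of the original code.

```r
# Values copied (rounded) from the output on the earlier slides
x_bar  <- 6.591   # sample mean of pH
mu_0   <- 7       # null value for the mean
t_stat <- -2.31   # observed t statistic
n      <- 53      # number of lakes
df     <- n - 1   # degrees of freedom for a one-sample t-test

# Two-sided p-value: area in both tails of the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# Standard error implied by the reported statistic (assumption: back-computed
# from rounded output rather than from the raw data)
se <- (x_bar - mu_0) / t_stat

# 90% confidence interval: estimate +/- critical value * standard error
t_star <- qt(0.95, df = df)
ci <- x_bar + c(-1, 1) * t_star * se

p_value
ci
```

The p-value should land near the 0.02 reported by `t.test()`, and the interval should agree with 6.294 to 6.887 up to rounding.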