Upon completing today’s lab activity, students should be able to do the following using R and RStudio:
library(tidyverse)
library(openintro)
library(dplyr)
library(infer)
The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics, Section 16.4.
Consider the research study described below.
A Kaiser Family Foundation poll of a random sample of US adults in 2019 found that 79% of Democrats, 55% of Independents, and 24% of Republicans supported a generic “National Health Plan.” There were 347 Democrats, 298 Republicans, and 617 Independents surveyed (Kaiser Family Foundation 2019).
Claim: A majority of independents support the National Health Plan (NHP).
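The Null and Alternative Hypotheses (here \(p\) is the proportion of all US independents who support the NHP):
Null Hypothesis: Exactly half of independents support the NHP, so neither a majority nor a minority supports it.
\[H_0: p = 0.50\]
Alternative Hypothesis: The proportion of independents who support the NHP is different from 0.50, so either a majority or a minority supports it. A majority means that more than 50% of independents support the NHP.
\[H_A: p \ne 0.50\]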
Point Estimate: \(\hat{p} = 0.55\)
Null Value: \(p_0 = 0.50\)
Conditions:
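Independence: The observations can be treated as independent because the poll used a “random sample of US adults in 2019,” and the 617 independents surveyed are a small fraction of all US independents.
Success-failure: Among independents, \(617(0.55) \approx 339\) support and \(617(1-0.55) \approx 278\) do not support the NHP. Both are greater than 10. For the hypothesis test, the same check with the null value gives \(617(0.50) = 308.5\) expected successes and failures, which are also greater than 10.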
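The success-failure arithmetic can be checked directly in R. The sketch below reuses this problem's values. It checks the counts with both \(\hat{p}\), which is used for the confidence interval, and \(p_0\), which is used for the hypothesis test. The variable names `counts_hat` and `counts_null` are our own illustrative choices and do not appear in the lab.

```r
# Values from the Kaiser Family Foundation problem (independents only)
n     <- 617   # number of independents surveyed
p_hat <- 0.55  # sample proportion supporting the NHP
p_0   <- 0.50  # null value

# Expected successes and failures using the point estimate (confidence interval)
counts_hat <- c(successes = n * p_hat, failures = n * (1 - p_hat))

# Expected successes and failures using the null value (hypothesis test)
counts_null <- c(successes = n * p_0, failures = n * (1 - p_0))

counts_hat   # about 339.35 and 277.65
counts_null  # 308.5 and 308.5

# The condition holds when every expected count is at least 10
all(c(counts_hat, counts_null) >= 10)
```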
n <- 617      # sample size (number of independents surveyed)
N <- 1000     # number of simulated samples
p_hat <- 0.55 # point estimate
p_0 <- 0.50   # null value
Perform the Bootstrapping Procedure (simulate sample proportions assuming the null hypothesis is true).
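set.seed(42)
# simulate N sample proportions assuming H0 is true (p = p_0)
ind_sim_dist <- tibble(stat = rbinom(N, n, p_0)/n)
# count simulated proportions at least as large as the observed p_hat
ind_n_sim <- ind_sim_dist %>%
  filter(stat >= p_hat) %>%
  nrow()
Compute the p-value using the Bootstrapped Simulations.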
# two-sided p-value: double the upper-tail proportion
ind_p_val <- 2 * ind_n_sim / N
ind_p_val
## [1] 0.01
Compute a 95% confidence interval using the Bootstrapped Simulations.
set.seed(42)
# simulate N sample proportions centered at the point estimate p_hat
ind_sim_dist_phat <- tibble(stat = rbinom(N, n, p_hat)/n)
ci_level <- 0.95 # confidence level
Way 1: Percentile Method
\[95\%\ \text{CI} = (\text{2.5th quantile}, \text{97.5th quantile})\]
left_quantile <- (1 - ci_level)/2          # 0.025
right_quantile <- ci_level + left_quantile  # 0.975
ci_interval_perc_boot <- quantile(ind_sim_dist_phat$stat, c(left_quantile, right_quantile))
ci_interval_perc_boot
##      2.5%     97.5% 
## 0.5121556 0.5867099
Using the percentile method, we are 95% confident that the true proportion of US independents who support the NHP is between 0.5122 and 0.5867.
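The simulation above draws each bootstrap proportion from a binomial distribution with probability \(\hat{p}\), which is a parametric bootstrap. A closely related approach resamples individual responses with replacement, and the infer section below uses that approach. The sketch below is a minimal base-R version of it. It assumes the same reconstructed counts as the infer example, 340 "yes" and 277 "no". The resampled proportions are therefore centered near 340/617 ≈ 0.551 rather than exactly 0.55, and the interval will differ slightly from the one above.

```r
set.seed(42)

# Reconstructed individual responses: 1 = supports the NHP, 0 = does not
# (assumed counts: 340 yes and 277 no, as in the infer example)
responses <- c(rep(1, 340), rep(0, 277))
n_boot    <- 1000   # number of bootstrap samples
ci_level  <- 0.95

# Each bootstrap sample resamples all 617 responses with replacement;
# the mean of a 0/1 vector is the sample proportion
boot_props <- replicate(n_boot, mean(sample(responses, replace = TRUE)))

# Percentile method: keep the middle 95% of the bootstrap proportions
alpha   <- 1 - ci_level
ci_perc <- quantile(boot_props, c(alpha / 2, 1 - alpha / 2))
ci_perc
```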
Way 2: Standard Error Method
\[95\%\ \text{CI} = (\hat{p}-z^* SE, \hat{p}+z^* SE)\] where \(SE\) is the standard error, estimated here by the standard deviation of the simulated proportions, and \(z^* \approx 1.96\) is the critical value from the standard normal distribution for a 95% confidence level.
standard_deviation <- sd(ind_sim_dist_phat$stat) # bootstrap estimate of the SE
z <- qnorm(0.975, 0, 1)                          # z* for 95% confidence
ci_interval_se_boot <- c(p_hat - z*standard_deviation, p_hat + z*standard_deviation)
ci_interval_se_boot
## [1] 0.5117514 0.5882486
Using the standard error method, we are 95% confident that the true proportion of US independents who support the NHP is between 0.5118 and 0.5882.
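The standard error of \(\hat{p}\) can also be computed from the formula \(SE = \sqrt{\hat{p}(1-\hat{p})/n}\) instead of from the simulated proportions. The sketch below computes that normal-approximation interval from the problem's values. The names `se_formula`, `z_star` and `ci_formula` are our own. The result should land close to the simulation-based interval above, but not exactly on it.

```r
n        <- 617
p_hat    <- 0.55
ci_level <- 0.95

# Standard error of the sample proportion from the formula
se_formula <- sqrt(p_hat * (1 - p_hat) / n)

# Critical value z* for a two-sided 95% interval (about 1.96)
z_star <- qnorm(1 - (1 - ci_level) / 2)

ci_formula <- c(lower = p_hat - z_star * se_formula,
                upper = p_hat + z_star * se_formula)
round(se_formula, 4)  # about 0.02
round(ci_formula, 4)
```

The formula-based standard error (about 0.0200) is close to the standard deviation of the simulated proportions. This is why the two standard error intervals nearly agree.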
prop.test Function
The prop.test function, part of R's built-in stats package, performs inference for proportions using a normal (chi-squared) approximation.
prop.test(n*p_hat, n, p = p_0, conf.level = ci_level, alternative = "two.sided")
## 
##  1-sample proportions test with continuity correction
## 
## data:  n * p_hat out of n, null probability p_0
## X-squared = 5.9716, df = 1, p-value = 0.01454
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.5097445 0.5896240
## sample estimates:
##    p 
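## 0.55
The output shows that the p-value is 0.01454 and the 95% CI is (0.5097, 0.5896). At the 5% significance level we reject \(H_0\). The entire interval also lies above 0.50, so the data support the claim that a majority of independents support the NHP.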
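The prop.test result is an `htest` object, so its parts can be pulled out with `$` instead of being read off the printed output. The sketch below does that. It also links prop.test to the normal-approximation (z) test for one proportion: with the continuity correction turned off (`correct = FALSE`), the reported X-squared statistic equals \(Z^2\), where \(Z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}\). The variable names are our own. The numbers will differ slightly from the output above, because that call used the default continuity correction.

```r
n     <- 617
p_hat <- 0.55
p_0   <- 0.50

# One-proportion z-test: the standard error uses the null value p_0
se_null <- sqrt(p_0 * (1 - p_0) / n)
z_stat  <- (p_hat - p_0) / se_null
p_val_z <- 2 * pnorm(-abs(z_stat))   # two-sided p-value

# Same test via prop.test, without the continuity correction
pt_nocc <- prop.test(n * p_hat, n, p = p_0, alternative = "two.sided",
                     correct = FALSE)

# Components of the htest object
pt_nocc$statistic   # X-squared, equal to z_stat^2
pt_nocc$p.value     # matches p_val_z
pt_nocc$conf.int    # score (Wilson) interval without continuity correction

c(z = z_stat, z_squared = z_stat^2, p_value = p_val_z)
```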
infer Package
# approximate individual-level responses reconstructed from p_hat = 0.55 (340 yes, 277 no), for demonstration purposes
data_example <- tibble(response = c(rep("yes", 340), rep("no", 277)))
boot_dist_yes <- data_example %>%
  # specify the response and the "success"
  specify(response = response, success = "yes") %>%
  # define the null hypothesis and the null value
  hypothesise(null = "point", p = p_0) %>%
  # generate samples for the null distribution
  generate(reps = 1000, type = "draw") %>%
  # calculate the proportions for the null distribution
  calculate(stat = "prop")
# compute the p-value using a two-sided alternative hypothesis
p_value <- boot_dist_yes %>% get_pvalue(obs_stat = p_hat, direction = "two-sided")
p_value
## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1   0.004
boot_dist_pe <- data_example %>%
  # specify the response and the "success"
  specify(response = response, success = "yes") %>%
  # generate bootstrap samples
  generate(reps = 1000, type = "bootstrap") %>%
  # calculate the proportions for the point-estimate distribution
  calculate(stat = "prop")
# compute 95% confidence interval using the percentile method
ci_interval_perc_boot <- boot_dist_pe %>% get_confidence_interval(level = 0.95, type = "percentile")
ci_interval_perc_boot
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.509    0.590
# compute 95% confidence interval using the standard error method
ci_interval_se_boot <- boot_dist_pe %>% get_confidence_interval(level = 0.95, type = "se", point_estimate = p_hat)
ci_interval_se_boot
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.510    0.590
Is college worth it?
Among a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school (Pew Research Center 2011).