Upon completing today’s lab activity, students should be able to do the following using R and RStudio:
library(tidyverse)
library(openintro)
library(dplyr)
library(infer)
The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 16.4.
Consider the research study described below.
A Kaiser Family Foundation poll for a random sample of US adults in 2019 found that 79% of Democrats, 55% of Independents, and 24% of Republicans supported a generic “National Health Plan.” There were 347 Democrats, 298 Republicans, and 617 Independents surveyed. (Kaiser Family Foundation 2019)
Claim: A majority of independents support the National Health Plan (NHP).
The Null and Alternative Hypotheses:
Null Hypothesis: Neither a majority nor a minority of independents support the NHP.
\[H_0: p = 0.50\]
Alternative Hypothesis: Either a majority or a minority of independents support the NHP. A majority means that more than 50% of independents support the NHP.
\[H_A: p \ne 0.50\]
Point-Estimate: \(\hat{p} = 0.55\)
Null Value: \(p_0 = 0.50\)
Conditions:
Independence: The poll is described as a “random sample of US adults in 2019”, and the 617 independents surveyed are only a small fraction of all US adults, so the observations can reasonably be treated as independent.
Success-failure: Among independents, \(617(0.55) \approx 339\) support and \(617(1-0.55) \approx 278\) don’t support. Both counts are at least 10.
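The success-failure check can also be done in R. The following is a minimal sketch that uses only the values from the problem statement; the name `meets_success_failure` is hypothetical and introduced only for illustration. The unrounded products are 339.35 and 277.65, since 55% is itself a rounded figure.

```r
# Values taken from the problem statement
n     <- 617   # number of independents surveyed
p_hat <- 0.55  # sample proportion supporting the NHP

# Expected counts of "support" and "do not support"
expected_success <- n * p_hat
expected_failure <- n * (1 - p_hat)

# Hypothetical helper flag: TRUE when both expected counts are at least 10
meets_success_failure <- expected_success >= 10 && expected_failure >= 10

c(success = expected_success, failure = expected_failure)
meets_success_failure
```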
n <- 617 # number of trials/samples
N <- 1000 # number of bootstrapped simulations
p_hat <- 0.55 # point estimate
p_0 <- 0.50 # null value p_0
Perform Bootstrapping Procedure.
set.seed(42)
ind_sim_dist <- tibble(stat = rbinom(N, n, p_0)/n)
ind_n_sim <- ind_sim_dist %>%
  filter(stat > p_hat) %>%
  nrow()
Compute p-value using the Bootstrapped Simulations.
ind_p_val <- 2*(round(ind_n_sim / N, 4))
ind_p_val
## [1] 0.01
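The simulated p-value can be cross-checked against the usual normal approximation. The sketch below is not part of the original lab. It assumes the success-failure condition holds, so that under the null hypothesis the sample proportion is roughly normal with standard error \(\sqrt{p_0(1-p_0)/n}\). The names `se_null`, `z_stat`, and `p_val_normal` are illustrative.

```r
# Values taken from the problem statement
n     <- 617   # number of independents surveyed
p_hat <- 0.55  # point estimate
p_0   <- 0.50  # null value

# Standard error of the sample proportion under the null hypothesis
se_null <- sqrt(p_0 * (1 - p_0) / n)

# Z statistic: how many null standard errors p_hat lies from p_0
z_stat <- (p_hat - p_0) / se_null

# Two-sided p-value from the standard normal distribution
p_val_normal <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

z_stat
p_val_normal
```

The two p-values will not agree exactly, because the simulation uses only 1000 draws, but they should be of similar size.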
Compute 95% confidence interval using the Bootstrapped Simulations.
set.seed(42)
ind_sim_dist_phat <- tibble(stat = rbinom(N, n, p_hat)/n)
ci_level <- 0.95 # confidence level
Way 1: Percentile Method
\[\text{95% CI} = (2.5th \text{ quantile}, 97.5th \text{ quantile}) \]
left_quantile <- (1-ci_level)/2
right_quantile <- ci_level + left_quantile
ci_interval_perc_boot <- quantile(ind_sim_dist_phat$stat, c(left_quantile,right_quantile))
ci_interval_perc_boot
## 2.5% 97.5%
## 0.5121556 0.5867099
Using the percentile method, we are 95% confident that the true proportion of independents who support the NHP is between 0.5122 and 0.5867.
Way 2: Standard Error Method
\[\text{95% CI} = (\hat{p}-z^* SE, \hat{p}+z^* SE)\] where SE is the standard error, estimated here by the standard deviation of the bootstrap distribution, and \(z^*\) is the critical value of the standard normal distribution corresponding to the 95% confidence level.
standard_deviation <- sd(ind_sim_dist_phat$stat)
z <- qnorm(0.975,0,1)
ci_interval_se_boot <- c(p_hat-z*standard_deviation,p_hat+z*standard_deviation)
ci_interval_se_boot
## [1] 0.5117514 0.5882486
Using the standard error method, we are 95% confident that the true proportion of independents who support the NHP is between 0.5118 and 0.5882.
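For comparison, the standard error can also come from the formula \(SE = \sqrt{\hat{p}(1-\hat{p})/n}\) instead of from the bootstrap distribution. The sketch below is not part of the original lab and assumes the normal approximation is adequate. The names `se_formula`, `z_star`, and `ci_formula` are illustrative.

```r
# Values taken from the problem statement
n        <- 617   # number of independents surveyed
p_hat    <- 0.55  # point estimate
ci_level <- 0.95  # confidence level

# Formula-based standard error of the sample proportion
se_formula <- sqrt(p_hat * (1 - p_hat) / n)

# Critical value leaving (1 - ci_level)/2 in each tail
z_star <- qnorm(1 - (1 - ci_level) / 2)

# Interval: point estimate plus or minus z* times SE
ci_formula <- c(lower = p_hat - z_star * se_formula,
                upper = p_hat + z_star * se_formula)
ci_formula
```

This interval should be close to the bootstrap-based one, because the bootstrap standard deviation approximates the same quantity.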
prop.test Function
The prop.test function is an R built-in function for inference for proportions.
prop.test(n*p_hat, n, p = p_0, conf.level = ci_level, alternative = c("two.sided"))
##
## 1-sample proportions test with continuity correction
##
## data: n * p_hat out of n, null probability p_0
## X-squared = 5.9716, df = 1, p-value = 0.01454
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5097445 0.5896240
## sample estimates:
## p
## 0.55
The output shows that the p-value is 0.01454 and the 95% CI is (0.5097,0.5896).
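The values in the printout can also be pulled out of the object that prop.test returns, which is handy when they are reused in later code. A minimal sketch follows, with the inputs restated so that it runs on its own; the name `test_result` is illustrative.

```r
# Values taken from the problem statement
n     <- 617   # number of independents surveyed
p_hat <- 0.55  # point estimate
p_0   <- 0.50  # null value

# Store the test result instead of only printing it
test_result <- prop.test(n * p_hat, n, p = p_0,
                         conf.level = 0.95, alternative = "two.sided")

test_result$p.value   # p-value of the two-sided test
test_result$conf.int  # lower and upper confidence limits
test_result$estimate  # sample proportion
```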
infer Package
# actual data for demonstration purposes
data_example <- tibble(response = c(rep("yes", 340), rep("no", 277)))
boot_dist_yes <- data_example %>%
  # specify the response and the "success"
  specify(response = response, success = "yes") %>%
  # define the null hypothesis and the null value
  hypothesise(null = "point", p = p_0) %>%
  # generate samples for the null distribution
  generate(reps = 1000, type = "draw") %>%
  # calculate the proportions for the null distribution
  calculate(stat = "prop")
# compute p-value using a two-sided alternative hypothesis
p_value <- boot_dist_yes %>% get_pvalue(obs_stat = p_hat, direction = "two-sided")
p_value
## # A tibble: 1 × 1
## p_value
## <dbl>
## 1 0.004
boot_dist_pe <- data_example %>%
  # specify the response and the "success"
  specify(response = response, success = "yes") %>%
  # generate bootstrap samples
  generate(reps = 1000, type = "bootstrap") %>%
  # calculate the proportions for the point-estimate distribution
  calculate(stat = "prop")
# compute 95% confidence interval using the percentile method
ci_interval_perc_boot <- boot_dist_pe %>% get_confidence_interval(level = 0.95, type = "percentile")
ci_interval_perc_boot
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.509 0.590
# compute 95% confidence interval using the standard error method
ci_interval_se_boot <- boot_dist_pe %>% get_confidence_interval(level = 0.95, type = "se", point_estimate = p_hat)
ci_interval_se_boot
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.510 0.590
Is college worth it?
Among a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school. Pew Research Center 2011