Alex John Quijano
10/06/2021
In the previous lectures, we learned about the following:
Using bootstrapping to estimate a confidence interval of proportions.
The connection of confidence intervals and hypothesis testing.
In this lecture, we will learn about:
Using bootstrapping to estimate a confidence interval of population mean.
More conceptual ideas of the confidence intervals.
A light introduction to the normal distribution.
Housing prices in Ames, Iowa
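library(openintro)  # provides the ames data set
library(dplyr)      # provides %>% and select()
prices <- ames %>% select(price)
x_bar <- mean(prices$price)  # sample mean
s <- sd(prices$price)        # sample standard deviation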
The data set has a sample size of \(n = 2930\) houses.
Sample Statistics
\[\bar{x} = 180796.1 \longrightarrow \text{mean}\] \[s = 79886.69 \longrightarrow \text{standard deviation}\]
The mean is a measure of center.
The standard deviation is a measure of spread.
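For reference, these are the standard textbook definitions of the two statistics, which is what `mean()` and `sd()` compute in R. Here \(x_i\) denotes the price of the \(i\)-th house in the sample; that index notation is introduced only for this illustration.

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\] \[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}\]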
Let \(\mu\) be the unknown population mean, which is the true mean of house prices.
Objective: Estimate \(\mu\) using a 95% confidence interval.
Method: Use the bootstrapping method to calculate the 95% confidence interval.
The confidence interval can be computed in two ways: the percentile method or the standard error method.
We can use the infer package in R to automate this procedure.
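library(infer)  # provides specify(), generate(), calculate(), get_ci()
set.seed(5)     # make the bootstrap resamples reproducible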
ci_95 <- prices %>%
specify(response = price) %>%
generate(reps = 10000, type = "bootstrap") %>%
calculate(stat = "mean") %>%
get_ci(level = 0.95, type="percentile")
print(ci_95)
#> # A tibble: 1 × 2
#> lower_ci upper_ci
#> <dbl> <dbl>
#> 1 177953. 183752.
The 95% CI for the mean house price is \((177953, 183752)\). We are 95% confident that the true mean house sale price is between $177,953 and $183,752. Here, we are using the percentile method.
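To see what a bootstrap interval involves step by step, here is a minimal base R sketch of both the percentile method and the standard error method. It uses simulated, right-skewed stand-in data (`sim_prices`, a hypothetical name) rather than the Ames prices, so the numbers it prints will not match the interval above. The multiplier `qnorm(0.975)` (about 1.96) is an assumption based on the bootstrap means being roughly normal.

```r
# Hypothetical stand-in data: 200 simulated right-skewed "prices".
# These are NOT the Ames data; swap in prices$price to use the real sample.
set.seed(5)
sim_prices <- rlnorm(200, meanlog = 12, sdlog = 0.4)

# Step 1: resample with replacement (same size as the original sample)
# and record the mean of each bootstrap sample.
boot_means <- replicate(5000, mean(sample(sim_prices, replace = TRUE)))

# Step 2a, percentile method: take the middle 95% of the bootstrap means.
ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975))

# Step 2b, standard error method: point estimate plus or minus
# about 1.96 bootstrap standard errors.
x_bar_sim <- mean(sim_prices)
se_boot <- sd(boot_means)
z <- qnorm(0.975)
ci_se <- c(lower = x_bar_sim - z * se_boot, upper = x_bar_sim + z * se_boot)

print(ci_percentile)
print(ci_se)
```

When the bootstrap distribution is close to symmetric, the two intervals should be nearly the same; a noticeable gap between them is a hint that the bootstrap distribution is skewed.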
If you take large random samples from the same population repeatedly and calculate 95 percent confidence intervals for the population mean, approximately 95 percent of the intervals should contain the true population mean of house prices.
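This repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a made-up normal population whose mean we know only because we simulate it, and, to keep it fast, it uses the formula-based interval \(\bar{x} \pm 1.96\, s/\sqrt{n}\) instead of bootstrapping each sample. All names (`true_mean`, `n_intervals`, `covers`) are illustrative.

```r
set.seed(1)
true_mean <- 50      # assumed population mean (known only in a simulation)
n <- 100             # size of each random sample
n_intervals <- 2000  # number of repeated samples, one interval each

# For each repeated sample, build a 95% interval and record whether
# it contains the true population mean.
covers <- replicate(n_intervals, {
  smp <- rnorm(n, mean = true_mean, sd = 10)
  half_width <- qnorm(0.975) * sd(smp) / sqrt(n)
  (mean(smp) - half_width <= true_mean) && (true_mean <= mean(smp) + half_width)
})

# Proportion of intervals that captured the true mean.
coverage <- mean(covers)
print(coverage)
```

The printed proportion should land near 0.95, though it will vary a little from seed to seed.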
\[p \longrightarrow \text{population proportion}\]
\[\hat{p} \longrightarrow \text{sample proportion}\]
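The theoretical distribution of sample proportions can be modeled using a binomial probability mass function (pmf), because the number of successes in \(n\) independent trials follows a binomial distribution.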
\[\mu \longrightarrow \text{population mean}\]
\[\bar{x} \longrightarrow \text{sample mean}\]
The theoretical distribution of sample means can be modeled using a normal probability density function (pdf) when the sample size is large.
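A quick simulation can illustrate this for sample means. The sketch below assumes a skewed exponential population with mean 10 and standard deviation 10 (a made-up choice, not the house price data) and checks that the sample means center on the population mean with spread \(\sigma/\sqrt{n}\).

```r
set.seed(2)
n <- 50
mu <- 10     # assumed population mean
sigma <- 10  # assumed population sd (equal to the mean for an exponential)

# Draw 5000 samples of size n and keep each sample mean.
sample_means <- replicate(5000, mean(rexp(n, rate = 1 / mu)))

center <- mean(sample_means)  # compare with mu
spread <- sd(sample_means)    # compare with sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of mu;
# a normal model predicts roughly 95%.
within_band <- mean(abs(sample_means - mu) <= 1.96 * sigma / sqrt(n))

print(c(center = center, spread = spread, within_band = within_band))
```

Calling `hist(sample_means)` afterwards shows the roughly bell-shaped histogram, even though the population itself is strongly skewed.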
Note that the binomial pmf can be approximated using a normal pdf.
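The approximation can be checked numerically by matching the normal curve's mean and standard deviation to those of the binomial, \(np\) and \(\sqrt{np(1-p)}\). The values `n_trials = 100` and `p = 0.5` below are arbitrary choices for illustration.

```r
n_trials <- 100  # assumed number of trials
p <- 0.5         # assumed success probability
k <- 0:n_trials  # every possible number of successes

# Exact binomial probabilities for each count k.
binom_pmf <- dbinom(k, size = n_trials, prob = p)

# Normal density with the same mean and standard deviation, evaluated at k.
normal_pdf <- dnorm(k, mean = n_trials * p, sd = sqrt(n_trials * p * (1 - p)))

# Largest pointwise gap between the two curves.
max_gap <- max(abs(binom_pmf - normal_pdf))
print(max_gap)
```

The gap is small here because \(n\) is large and \(p\) is not close to 0 or 1; the approximation gets worse for small \(n\) or extreme \(p\).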
In this lecture we talked about:
Bootstrap confidence intervals for means.
Steps of bootstrapping.
Population proportions and population means.
In the next lecture, we will talk about:
Today, we are going to watch one of these videos. Discuss the video with your group members and relate it to what we have learned this week and since the beginning of this course.
Regression to the Mean
Galton Board and the Regression to the Mean
Note: Francis Galton invented the Galton Board (quincunx) to demonstrate regression toward the mean. Although some of Galton’s statistical ideas are okay, some of his other ideas were problematic, and you can ask me why outside of class.