6 - Confidence Intervals Continued

Alex John Quijano

10/06/2021

Previously…

In the previous lectures, we learned about the following:

Confidence Intervals Continued (CIs)

In this lecture, we will learn about:

Case Study - Housing Prices

Housing prices in Ames, Iowa

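library(openintro)  # provides the ames data set
library(dplyr)      # provides %>% and select()

prices <- ames %>% select(price)
x_bar <- mean(prices$price)  # sample mean of the sale prices
s <- sd(prices$price)        # sample standard deviation of the sale prices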

The data set has a sample size of \(n = 2930\).

Sample Statistics

\[\bar{x} = 180796.1 \longrightarrow \text{mean}\] \[s = 79886.69 \longrightarrow \text{standard deviation}\]

The mean is a measure of center.

The standard deviation is a measure of spread.

Housing Prices - Population Mean


Housing Prices - Bootstrapping Steps


The two methods to compute the confidence interval are the percentile method and the standard error method.
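The two methods can be sketched by hand in base R. This is only an illustration under stated assumptions: `sample_prices` is a simulated, hypothetical stand-in for the real sale prices, and the multiplier 1.96 assumes the bootstrap distribution of the mean is roughly normal.

```r
# Hypothetical stand-in sample (simulated, NOT the Ames data);
# replace with prices$price to use the real sale prices.
set.seed(1)
sample_prices <- rlnorm(200, meanlog = 12, sdlog = 0.4)

# Resample with replacement and record the mean of each bootstrap resample.
n_reps <- 2000
boot_means <- replicate(n_reps, mean(sample(sample_prices, replace = TRUE)))

# Percentile method: take the middle 95% of the bootstrap means.
ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975))

# Standard error method: point estimate plus or minus 1.96 bootstrap standard errors.
se_boot <- sd(boot_means)
ci_se <- mean(sample_prices) + c(-1, 1) * 1.96 * se_boot

print(ci_percentile)
print(ci_se)
```

When the bootstrap distribution is roughly symmetric and bell-shaped, the two intervals should be close to each other.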

Housing Prices - Bootstrapping

We can use the infer package in R to automate this procedure.

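library(infer)  # provides specify(), generate(), calculate(), get_ci()

set.seed(5)  # make the bootstrap resamples reproducible
ci_95 <- prices %>%
  specify(response = price) %>%
  generate(reps = 10000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95, type = "percentile")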

print(ci_95)
#> # A tibble: 1 × 2
#>   lower_ci upper_ci
#>      <dbl>    <dbl>
#> 1  177953.  183752.

The 95% CI for the mean housing price is \((177953, 183752)\), computed here using the percentile method. We are 95% confident that the true mean house sale price is between $177,953 and $183,752.

Housing Prices - 95% Confidence Interval

Housing Prices - Theoretical Distribution

Housing Prices - 95% Confidence Interval Meaning

If you take large random samples from the same population repeatedly and calculate 95 percent confidence intervals for the population mean, approximately 95 percent of the intervals should contain the true population mean of house prices.


You will demonstrate the general meaning and interpretation of the confidence interval by doing an R simulation as part of your LB3 assignment.
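A minimal sketch of what such a simulation might look like, not the assignment's required approach. It assumes a hypothetical normal population with a known mean (`mu`) and standard deviation (`sigma`), and it uses the normal-based interval \(\bar{x} \pm 1.96 \cdot s/\sqrt{n}\) instead of bootstrapping to keep the code short.

```r
set.seed(42)
mu <- 180000        # hypothetical true population mean
sigma <- 80000      # hypothetical population standard deviation
n <- 500            # size of each random sample
n_intervals <- 1000 # number of repeated samples (and intervals)

# For each repeated sample, build a 95% CI and check whether it contains mu.
covers <- replicate(n_intervals, {
  x <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)
  lower <- mean(x) - 1.96 * se
  upper <- mean(x) + 1.96 * se
  lower <= mu && mu <= upper
})

# Proportion of intervals that contain the true mean.
coverage <- mean(covers)
print(coverage)
```

The printed proportion should land near 0.95, though it varies from run to run with a different seed.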

Proportion vs Mean

Proportion

\[p \longrightarrow \text{population proportion}\]

\[\hat{p} \longrightarrow \text{sample proportion}\]

The theoretical distribution of proportions can be modeled using a binomial probability mass function (pmf).

Mean

\[\mu \longrightarrow \text{population mean}\]

\[\bar{x} \longrightarrow \text{sample mean}\]

The theoretical distribution of means can be modeled using a normal probability density function (pdf).
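As a rough illustration, the normal model gives a theoretical 95% CI for the mean house price from the sample statistics reported earlier. The standard error formula \(s/\sqrt{n}\) and the variable names below are assumptions added for this sketch.

```r
# Sample statistics reported earlier in the lecture.
x_bar <- 180796.1
s <- 79886.69
n <- 2930

# Standard error of the sample mean under the normal model.
se <- s / sqrt(n)

# 95% CI: point estimate plus or minus the 97.5th normal percentile times the SE.
z <- qnorm(0.975)
ci_theoretical <- x_bar + c(-1, 1) * z * se
print(ci_theoretical)
```

The result can be compared with the bootstrap percentile interval \((177953, 183752)\); with a sample this large, the two should be similar.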

Note that, for a large enough sample size, the binomial pmf can be approximated using a normal pdf.
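A small sketch of this approximation, assuming hypothetical values \(n = 100\) and \(p = 0.5\). The normal pdf uses the matching mean \(np\) and standard deviation \(\sqrt{np(1-p)}\).

```r
n <- 100   # hypothetical number of trials
p <- 0.5   # hypothetical probability of success
k <- 0:n   # all possible numbers of successes

# Exact binomial probabilities.
binom_pmf <- dbinom(k, size = n, prob = p)

# Normal density with the same mean and standard deviation.
normal_pdf <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))

# Largest gap between the two curves over all values of k.
max_abs_diff <- max(abs(binom_pmf - normal_pdf))
print(max_abs_diff)
```

The gap tends to grow when \(n\) is small or when \(p\) is close to 0 or 1, which is why the approximation is usually reserved for large samples.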

Summary

In this lecture we talked about:

In the next lecture, we will talk about:

Today’s Activity

Today, we are going to watch one of these videos. Discuss the video with your group members and relate it to what we have learned this week and since the beginning of this course.

Regression to the Mean

Galton Board and the Regression to the Mean

The Galton Board

Note: Francis Galton invented the Galton Board (Quincunx) to demonstrate regression toward the mean. Although some of Galton's statistical ideas are sound, some of his other ideas were problematic, and you can ask me why outside of class.