6 - Introduction to Confidence Intervals

Alex John Quijano

10/04/2021

Previously…

In the previous lectures, we learned about the following:

Central Limit Theorem

Image Source: [Bootstrapping Statistics by Trist'n Joseph.](https://towardsdatascience.com/bootstrapping-statistics-what-it-is-and-why-its-used-e2fa29577307)

The Central Limit Theorem (CLT) states that, when observations are independent and the sample size is sufficiently large, the sampling distribution of a statistic such as the sample mean or sample proportion will be normal or nearly normal, regardless of the shape of the underlying population distribution.

Not all sampling distributions are normal. Many summary statistics and variables are nearly normal, but none are exactly normal. Thus, while the normal distribution is not a perfect model for any single problem, it is very useful for a wide variety of problems.
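
A minimal simulation sketch of the CLT, assuming a right-skewed exponential population with mean 1 and standard deviation 1. The sample size n = 50, the number of samples, and the seed are arbitrary choices, not from the lecture. The code draws many samples, records each sample mean, and checks that the means center near 1, have a spread close to sigma / sqrt(n), and place roughly 95% of their values within 1.96 standard errors of the center, as a normal distribution would.

```python
import random
import statistics

# Assumed population for illustration: exponential with rate 1 (mean 1, SD 1), strongly right-skewed.
rng = random.Random(2021)  # arbitrary seed
n = 50                     # size of each sample (assumption)
reps = 5_000               # number of independent samples (assumption)

# Draw many independent samples and keep each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
theory_se = 1.0 / n ** 0.5  # sigma / sqrt(n), with sigma = 1 for this population

# If the sampling distribution is nearly normal, about 95% of means lie within 1.96 SEs of the true mean.
within = sum(abs(m - 1.0) <= 1.96 * theory_se for m in sample_means) / reps

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (theory {theory_se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Even though individual exponential values are far from normal, a histogram of `sample_means` would look close to a bell curve at this sample size.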

Bootstrapping

Image Source: [Bootstrapping Statistics by Trist'n Joseph.](https://towardsdatascience.com/bootstrapping-statistics-what-it-is-and-why-its-used-e2fa29577307)

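Bootstrapping is a resampling method for estimating the sampling distribution of a statistic (e.g., a mean or proportion). Each bootstrap sample has the same size as the original sample and is drawn from it with replacement, which is why bootstrap sampling is often described as sampling with replacement.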

Bootstrapping allows us to simulate the sampling distribution of a statistic without the assumption of normality.
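
A minimal bootstrap sketch in Python, using a made-up sample of 10 values (hypothetical, not from the lecture) and an arbitrary seed. Each bootstrap sample has the same size as the data and is drawn with replacement; the standard deviation of the resulting bootstrap means estimates the standard error of the sample mean without assuming normality.

```python
import random
import statistics

# Hypothetical sample of 10 measurements, made up for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.0, 2.5]

rng = random.Random(42)  # arbitrary seed
reps = 10_000

boot_means = []
for _ in range(reps):
    # One bootstrap sample: same size as the data, drawn WITH replacement,
    # so some values can appear more than once and others not at all.
    resample = rng.choices(data, k=len(data))
    boot_means.append(statistics.fmean(resample))

# The spread of the bootstrap means estimates the standard error of the mean.
boot_se = statistics.stdev(boot_means)
print(f"observed mean: {statistics.fmean(data):.2f}")
print(f"bootstrap SE:  {boot_se:.3f}")
```

For these values, s / sqrt(n) is about 0.38, while the bootstrap estimate should land near 0.36. The bootstrap version is slightly smaller for small samples because it resamples from the observed values, whose spread is effectively measured with divisor n rather than n - 1.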

Introduction to Confidence Intervals (CIs)

In this lecture, we will learn about:

Case Study - Medical Consultant

One consultant tried to attract patients by noting that the average complication rate for liver donor surgeries in the US is about 10%, but her clients have had only 3 complications in the 62 liver donor surgeries she has facilitated.

She claims this is strong evidence that her work meaningfully contributes to reducing complications.

Medical Consultant - Observed and Null Statistic
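
In symbols, using the numbers from the case study, the observed statistic is the consultant's complication proportion and the null value is the US average rate:

$$
\hat{p} = \frac{3}{62} \approx 0.048, \qquad p_0 = 0.10
$$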

Medical Consultant - The Problem

Medical Consultant - 95% CI using the Percentile Method

The original medical consultant data were bootstrapped 10,000 times. Each simulated sample was drawn with replacement from the original data, in which the observed proportion of complications is 3/62. The 2.5th percentile of the bootstrapped proportions is 0 and the 97.5th percentile is 0.113. The result: we are 95% confident that, in the population, the true probability of a complication is between 0% and 11.3%.
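
A sketch of the percentile method for this example in plain Python. The data are rebuilt as 3 ones (complications) and 59 zeros, each of the 10,000 bootstrap samples is drawn with replacement, and the 2.5th and 97.5th percentiles of the bootstrapped proportions form the interval. The seed and the `percentile` helper (linear interpolation between order statistics) are assumptions for illustration; other percentile conventions can shift the upper endpoint slightly.

```python
import math
import random

def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

# Original data: 3 complications (1) and 59 surgeries without a complication (0).
data = [1] * 3 + [0] * 59

rng = random.Random(2021)  # arbitrary seed
reps = 10_000
# Each bootstrap proportion comes from a resample of 62 surgeries drawn with replacement.
boot_props = [sum(rng.choices(data, k=len(data))) / len(data) for _ in range(reps)]

# 95% percentile interval: cut 2.5% from each tail of the bootstrap distribution.
lower = percentile(boot_props, 2.5)
upper = percentile(boot_props, 97.5)
print(f"95% percentile CI: ({lower:.3f}, {upper:.3f})")
```

The exact endpoints depend on the seed. The lower endpoint is 0 because a resample contains no complications with probability (59/62)^62, roughly 4.6%, which is more than the 2.5% cut from the lower tail.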

Medical Consultant - Observed and Null Statistic

Medical Consultant - Connection to Hypothesis Testing (1/3)

In hypothesis testing, we assume that the null hypothesis is true and build the null distribution under that assumption.
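
Written out for this example, with p denoting the true complication probability for the consultant's clients, one natural pair of hypotheses is shown below. The one-sided alternative reflects her claim of fewer complications and matches the left-tail p-value used for this example.

$$
H_0: p = 0.10 \qquad H_A: p < 0.10
$$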

Medical Consultant - Connection to Hypothesis Testing (2/3)

The null distribution was created from 10,000 simulated samples. The left tail (simulated proportions at or below the observed 3/62), which represents the p-value for the hypothesis test, contains 0.117 (11.7%) of the simulations.
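
A sketch of how such a null distribution can be simulated, assuming that under the null hypothesis each of the 62 surgeries independently has a 10% chance of a complication; the seed is arbitrary. The p-value is the share of simulated counts at or below the observed 3 complications, and an exact binomial calculation with `math.comb` is included as a cross-check.

```python
import math
import random

n, observed = 62, 3  # surgeries and complications from the case study
p_null = 0.10        # null hypothesis: complication rate equals the US average

rng = random.Random(2021)  # arbitrary seed
reps = 10_000

# Simulate under H0: count complications among 62 surgeries, each with a 10% chance.
null_counts = [sum(rng.random() < p_null for _ in range(n)) for _ in range(reps)]

# Left-tail p-value: share of null simulations at or below the observed count.
p_value_sim = sum(c <= observed for c in null_counts) / reps

# Exact binomial left-tail probability P(X <= 3) for comparison.
p_value_exact = sum(
    math.comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(observed + 1)
)

print(f"simulated p-value: {p_value_sim:.3f}")
print(f"exact p-value:     {p_value_exact:.3f}")
```

The exact left-tail probability is about 0.121, so a simulated value such as the 0.117 reported above is within ordinary simulation error.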

Medical Consultant - Connection to Hypothesis Testing (3/3)

Medical Consultant - CIs and Hypothesis Testing

Confidence Intervals (1/2)

Confidence Intervals (2/2)

Summary

In this lecture, we talked about:

In the next lectures, we will talk about:

Today’s Activity

Within your group, discuss the answers for the following problem.

Twitter users and news. A poll conducted in 2013 found that 52% of all US adult Twitter users get at least some news on Twitter. However, this value was based on a sample, so on its own it may not be a perfect estimate of the population parameter of interest. The study was based on a sample of 736 adults. Below is a distribution of 1000 bootstrapped sample proportions from the Pew dataset. (OpenIntro: IMS Section 12.5)

Using the distribution of 1000 bootstrapped proportions, approximate a 98% confidence interval for the true proportion of US adult Twitter users (in 2013) who get at least some of their news from Twitter. Interpret the interval in the context of the problem.
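
The 1000 bootstrapped proportions from the Pew data are not reproduced here, so this sketch rebuilds a sample under an explicit assumption: 383 of 736 respondents (52% of 736, rounded) answered yes. It then draws 1000 bootstrap samples and, because a 98% interval leaves 1% in each tail, reads off the 1st and 99th percentiles. The seed and the `percentile` helper are illustrative choices; the histogram in the activity remains the intended source for the approximation.

```python
import math
import random

def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

# ASSUMPTION: rebuild the sample as 383 "yes" answers out of 736 (52% of 736, rounded).
n = 736
yes = round(0.52 * n)
sample = [1] * yes + [0] * (n - yes)

rng = random.Random(2013)  # arbitrary seed
# 1000 bootstrap proportions, each from a resample of 736 drawn with replacement.
boot_props = [sum(rng.choices(sample, k=n)) / n for _ in range(1000)]

# A 98% interval leaves 1% in each tail: use the 1st and 99th percentiles.
lower, upper = percentile(boot_props, 1), percentile(boot_props, 99)
print(f"98% CI: ({lower:.3f}, {upper:.3f})")
```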