Alex John Quijano
11/15/2021
Hypothesis and confidence intervals for one proportion
Hypothesis and confidence intervals for comparing two proportions
Bootstrapping and theoretical methods of inference for proportions
Today, we will discuss the following:
Consider the following problem description.
Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?
Grade | Grades | Popular | Sports |
---|---|---|---|
4th | 63 | 31 | 25 |
5th | 88 | 55 | 33 |
6th | 96 | 55 | 32 |
Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | \(\color{blue}{63}\) | \(\color{orange}{31}\) | 25 | 119 |
5th | 88 | 55 | 33 | 176 |
6th | 96 | 55 | \(\color{red}{32}\) | 183 |
Total | 247 | 141 | 90 | 478 |
Note: Each color matches a highlighted cell in the table, and the expected frequencies are rounded to the nearest integer.
\[\color{blue}{E_{4th,Grades} = \frac{(119)(247)}{478} \approx 61}\] \[\color{orange}{E_{4th,Popular} = \frac{(119)(141)}{478} \approx 35}\] \[\vdots\] \[\color{red}{E_{6th,Sports} = \frac{(183)(90)}{478} \approx 34}\]
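The same expected counts can be computed for every cell at once. The sketch below is one way to do it in R, not necessarily how the slides were produced. The object names `observed` and `expected` are illustrative, and the 4th-grade Sports count is assumed to be 25, the value implied by the row total (119) and the column total (90).

```r
# Observed counts from the two-way table.
# Assumption: the 4th-grade "Sports" count is 25, the value implied by
# the row total (119) and the column total (90).
observed <- matrix(
  c(63, 31, 25,
    88, 55, 33,
    96, 55, 32),
  nrow = 3, byrow = TRUE,
  dimnames = list(grade = c("4th", "5th", "6th"),
                  goal  = c("Grades", "Popular", "Sports"))
)

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Rounded to the nearest integer, as on the slide
round(expected)
```

The first row of `round(expected)` should reproduce the blue and orange values above (61 and 35), and the last cell the red value (34).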
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | 63 | 31 | 25 | 119 |
5th | 88 | \(\color{green}{55}\) | 33 | 176 |
6th | 96 | 55 | 32 | 183 |
Total | 247 | 141 | 90 | 478 |
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | 63 \(\color{blue}{[61]}\) | 31 \(\color{blue}{[35]}\) | 25 \(\color{blue}{[22]}\) | 119 |
5th | 88 \(\color{blue}{[91]}\) | 55 \(\color{blue}{[52]}\) | 33 \(\color{blue}{[33]}\) | 176 |
6th | 96 \(\color{blue}{[95]}\) | 55 \(\color{blue}{[54]}\) | 32 \(\color{blue}{[34]}\) | 183 |
Total | 247 | 141 | 90 | 478 |
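The \(\chi^2\) statistic. \[\chi^2_{k} = \frac{(63-61)^2}{61} + \frac{(31-35)^2}{35} + \cdots + \frac{(32-34)^2}{34} \approx 1.3121\] The terms are written with rounded expected counts for readability; the value 1.3121 comes from the unrounded expected counts.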
Degrees of freedom. \[k = (\text{rows}-1)\times(\text{columns}-1) = (3-1)(3-1) = 4\]
\(\chi^2_{k} = 1.3121\) and \(k = 4\)
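As a check on the arithmetic, the statistic and degrees of freedom can be computed from the unrounded expected counts. This is a minimal sketch; the variable names are illustrative, and the 4th-grade Sports count is again assumed to be 25, as implied by the table totals.

```r
# Observed counts (rows: 4th, 5th, 6th; columns: Grades, Popular, Sports).
# Assumption: the 4th-grade Sports count is 25, as implied by the totals.
observed <- matrix(
  c(63, 31, 25,
    88, 55, 33,
    96, 55, 32),
  nrow = 3, byrow = TRUE
)

# Unrounded expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

round(chi_sq, 4)  # about 1.3121
deg_free          # 4
```

Rounding the expected counts before summing gives a slightly different statistic, which is why the unrounded counts are used here.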
We can use the `pchisq` function in R. Use `1-pchisq(1.3121,4)`, which yields a p-value of 0.8593.
Note that in a chi-square analysis, the p-value is the probability of obtaining a chi-square statistic as large as or larger than the one observed in the current experiment, assuming the null hypothesis is true.
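Because the p-value is an upper-tail probability, `pchisq` can also return it directly through its `lower.tail` argument. The sketch below compares the two forms; the variable names are illustrative.

```r
chi_sq   <- 1.3121  # test statistic from the slide
deg_free <- 4       # degrees of freedom

# Upper-tail area: P(chi-square >= chi_sq) under the null hypothesis
p_upper <- pchisq(chi_sq, deg_free, lower.tail = FALSE)

# Equivalent complement form used on the slide
p_complement <- 1 - pchisq(chi_sq, deg_free)

round(p_upper, 4)       # about 0.8593
round(p_complement, 4)  # same value
```

The two forms agree here. For very large statistics, `lower.tail = FALSE` is usually the safer choice, since the complement form can lose precision when the tail area is tiny.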
Do these data provide evidence to suggest that goals vary by grade? \[H_0: \text{Grade and goals are independent. Goals do not vary by grade.}\] \[H_A: \text{Grade and goals are dependent. Goals vary by grade}\]
Since the p-value is large, we fail to reject \(H_0\). The data do not provide convincing evidence that grade and goals are dependent. It doesn’t appear that goals vary by grade.
The problem shown below is adapted from your textbook, OpenIntro: Introduction to Modern Statistics, Section 18.4. Consider the research study described below.
Coffee and Depression.
Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption (Lucas et al. 2011).
Columns give the level of caffeinated coffee consumption.
Clinical depression | 1 cup / week or fewer | 2-6 cups / week | 1 cup / day | 2-3 cups / day | 4 cups / day or more | Total |
---|---|---|---|---|---|---|
Yes | 670 | ___ | 905 | 564 | 95 | 2,607 |
No | 11,545 | 6,244 | 16,329 | 11,726 | 2,288 | 48,132 |
Total | 12,215 | 6,617 | 17,234 | 12,290 | 2,383 | 50,739 |
Compute the test statistic. What is the p-value?
What is the conclusion of the hypothesis test?
\(\chi^2 = 20.93\) and the degrees of freedom are \(k = (2-1)(5-1) = 4\)
p-value: `1-pchisq(20.93,4)` \(= 0.0003\)
The p-value is small, so we reject \(H_0\). The data provide convincing evidence that caffeinated coffee consumption and depression in women are associated.
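These numbers can be reproduced in R. The sketch below uses `chisq.test` from base R rather than the cell-by-cell calculation, so treat it as a cross-check and not as the method used on the slides. The "Yes" counts are derived by subtracting the "No" counts from the column totals, which also supplies the blank cell; the variable names are illustrative.

```r
# Column totals and "No" counts copied from the table
col_totals <- c(12215, 6617, 17234, 12290, 2383)
no_counts  <- c(11545, 6244, 16329, 11726, 2288)

# "Yes" counts by subtraction; this also fills the blank cell in the table
yes_counts <- col_totals - no_counts

coffee <- rbind(Yes = yes_counts, No = no_counts)
colnames(coffee) <- c("<=1/week", "2-6/week", "1/day", "2-3/day", ">=4/day")

# Chi-square test of independence
# (the continuity correction only applies to 2x2 tables)
result <- chisq.test(coffee)

round(unname(result$statistic), 2)  # about 20.93
unname(result$parameter)            # 4 degrees of freedom
round(result$p.value, 4)            # about 0.0003
```

The derived "Yes" counts sum to 2,607, which matches the row total in the table.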
Today, we discussed the following:
Computing the chi-square statistic
Computing p-values using the chi-square distribution
Performing inference (hypothesis testing) using the chi-square test for independence
Next, we will discuss:
In lab, we will work on: