Previously on Inference…

Hypothesis tests and confidence intervals for one proportion
Hypothesis tests and confidence intervals for comparing two proportions
Bootstrapping and theoretical methods of inference for proportions

Inference on Two-Way Tables

Today, we will discuss the following:

The Chi-Square test for independence

Popular Kids

Consider the following problem description.

Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?

	Grades	Popular	Sports
4th	63	31	25
5th	88	55	33
6th	96	55	32

Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.

The Chi-Squared Test for Independence (1/2)

The null and alternative hypotheses: \[H_0: \text{Grade and goals are independent. Goals do not vary by grade.}\] \[H_A: \text{Grade and goals are dependent. Goals vary by grade.}\]

The Chi-Squared Test for Independence (2/2)

The Chi-Squared test statistic \[\chi^2_{k} = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}\]
- \(O_i\) is the observed count in cell \(i\)
- \(E_i\) is the expected count in cell \(i\) if \(H_0\) is true
- \(n\) is the number of cells in the table
- \(k = (R-1)(C-1)\) is the degrees of freedom, where \(R\) is the number of rows and \(C\) is the number of columns

Computing the \(\chi^2\) statistic - Expected Frequency (1/3)

Start with the expected frequency of cell \(i\): \[E_i = \frac{(\text{row total})(\text{column total})}{\text{table total}}\]

	Grades	Popular	Sports	Total
4th	\(\color{blue}{63}\)	\(\color{orange}{31}\)	25	119
5th	88	55	33	176
6th	96	55	\(\color{red}{32}\)	183
Total	247	141	90	478

Note: Each color matches a cell in the table. Expected frequencies are rounded to the nearest integer.

\[\color{blue}{E_{4th,Grades} = \frac{(119)(247)}{478} \approx 61}\] \[\color{orange}{E_{4th,Popular} = \frac{(119)(141)}{478} \approx 35}\] \[\vdots\] \[\color{red}{E_{6th,Sports} = \frac{(183)(90)}{478} \approx 34}\]
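The same calculation can be sketched in R for every cell at once. This is a minimal sketch rather than part of the original slides: the object names `observed` and `expected` are illustrative, and the 4th-grade Sports count is entered as 25, the value that agrees with the printed row total (119) and column total (90).

```r
# Observed counts; rows are grade levels, columns are the goal chosen.
# Assumption: the 4th-grade Sports count is 25, which matches the
# printed row total (119) and column total (90).
observed <- matrix(
  c(63, 31, 25,
    88, 55, 33,
    96, 55, 32),
  nrow = 3, byrow = TRUE,
  dimnames = list(Grade = c("4th", "5th", "6th"),
                  Goal  = c("Grades", "Popular", "Sports"))
)

# Expected count for each cell: (row total * column total) / table total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

round(expected)      # whole counts, as displayed on the slides
round(expected, 2)   # unrounded values, which the test statistic should use
```

Using `outer()` builds the full table of row-total by column-total products in one step, so no cell has to be computed by hand.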

Computing the \(\chi^2\) statistic - Expected Frequency (2/3)

Question - What is the expected count for the highlighted cell?

	Grades	Popular	Sports	Total
4th	63	31	25	119
5th	88	\(\color{green}{55}\)	33	176
6th	96	55	32	183
Total	247	141	90	478

\[\color{green}{E_{5th,Popular} = \frac{(176)(141)}{478} \approx 52}\]

Computing the \(\chi^2\) statistic - Expected Frequency (3/3)

The expected frequency for each \(\color{blue}{[cell]}\).

	Grades	Popular	Sports	Total
4th	63 \(\color{blue}{[61]}\)	31 \(\color{blue}{[35]}\)	25 \(\color{blue}{[22]}\)	119
5th	88 \(\color{blue}{[91]}\)	55 \(\color{blue}{[52]}\)	33 \(\color{blue}{[33]}\)	176
6th	96 \(\color{blue}{[95]}\)	55 \(\color{blue}{[54]}\)	32 \(\color{blue}{[34]}\)	183
Total	247	141	90	478

Computing the \(\chi^2\) statistic

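The \(\chi^2\) statistic. \[\chi^2_{k} = \frac{(63-61)^2}{61} + \frac{(31-35)^2}{35} + \cdots + \frac{(32-34)^2}{34} = 1.3121\]
Note: the terms are displayed with rounded expected counts, but the value 1.3121 is obtained with the unrounded expected counts (for example, 61.49 rather than 61).
Degrees of freedom. \[k = (3-1)(3-1) = (2)(2) = 4\]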
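As a rough check, the statistic and the degrees of freedom can be computed directly in R. This is a sketch under stated assumptions: the object names are illustrative, and the 4th-grade Sports count is entered as 25 so that the table matches its printed totals. It uses unrounded expected counts.

```r
# Observed counts (rows: 4th, 5th, 6th; columns: Grades, Popular, Sports).
# Assumption: 4th-grade Sports is 25, consistent with the printed totals.
observed <- matrix(c(63, 31, 25,
                     88, 55, 33,
                     96, 55, 32), nrow = 3, byrow = TRUE)

# Unrounded expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of (O - E)^2 / E over all nine cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
k <- (nrow(observed) - 1) * (ncol(observed) - 1)

round(chi_sq, 4)
k
```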

Computing the p-value

\(\chi^2_{k} = 1.3121\) and \(k = 4\)
We can use the pchisq function in R.

Use 1-pchisq(1.3121,4), which yields a p-value of 0.8593.
Note that in a chi-squared analysis, the p-value is the probability, assuming \(H_0\) is true, of obtaining a chi-square statistic as large as or larger than the one observed.
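A minimal sketch of the same p-value calculation follows. The variable names are illustrative. `lower.tail = FALSE` is a standard argument of `pchisq` that returns the upper-tail area directly instead of subtracting from 1.

```r
chi_sq <- 1.3121   # test statistic from the previous slide
k <- 4             # degrees of freedom

# Upper-tail area of the chi-squared distribution with k degrees of freedom
p_value <- pchisq(chi_sq, df = k, lower.tail = FALSE)
round(p_value, 4)

# Agrees with the complement form 1 - pchisq(...)
all.equal(p_value, 1 - pchisq(chi_sq, k))
```

The two forms agree here. The upper-tail form is generally preferable when the p-value is very small, because it avoids subtracting a number close to 1 from 1.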

Conclusion

Do these data provide evidence to suggest that goals vary by grade? \[H_0: \text{Grade and goals are independent. Goals do not vary by grade.}\] \[H_A: \text{Grade and goals are dependent. Goals vary by grade}\]
Since the p-value is large, we fail to reject \(H_0\). The data do not provide convincing evidence that grade and goals are dependent. It doesn’t appear that goals vary by grade.

10-Minute Activity (1/2)

The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 18.4. Consider the research study described below.

Coffee and Depression.

Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption (Lucas et al., 2011).

	Caffeinated coffee consumption
Clinical depression	1 cup / week or fewer	2-6 cups / week	1 cup / day	2-3 cups / day	4 cups / day or more	Total
Yes	670	___	905	564	95	2,607
No	11,545	6,244	16,329	11,726	2,288	48,132
Total	12,215	6,617	17,234	12,290	2,383	50,739

Compute the test statistic. What is the p-value?
What is the conclusion of the hypothesis test?

10-Minute Activity (2/2)

\(\chi^2 = 20.93\) and the degrees of freedom are \(k = 4\)
p-value: 1-pchisq(20.93,4) \(= 0.0003\)
Since the p-value is small, we reject \(H_0\). The data provide convincing evidence that caffeinated coffee consumption and depression in women are associated.
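These numbers can be checked with R's built-in `chisq.test`. This is a sketch rather than the intended hand calculation: the object names `coffee` and `result` are illustrative, and the blank "Yes" cell is not taken from the slides but derived from the column margin (6,617 - 6,244).

```r
# Blank "Yes" cell for 2-6 cups / week, derived from the column total
yes_2_6 <- 6617 - 6244

# Observed counts; rows are depression status, columns are coffee intake
coffee <- matrix(
  c(670,   yes_2_6, 905,   564,   95,
    11545, 6244,    16329, 11726, 2288),
  nrow = 2, byrow = TRUE,
  dimnames = list(Depression = c("Yes", "No"),
                  Coffee = c("<=1/week", "2-6/week", "1/day",
                             "2-3/day", ">=4/day"))
)

result <- chisq.test(coffee)
result$statistic   # chi-squared statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
```

`chisq.test` computes the expected counts, the statistic, and the p-value in one call, and `result$expected` holds the expected counts for comparison with a hand calculation.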

Summary

Today, we discussed the following:

Computing the Chi-Square statistic
Computing p-values using the Chi-squared distribution
Performing inference (hypothesis testing) using the Chi-Squared test for independence

Next, we will discuss:

Inference on a single mean

In lab, we will work on:

Performing the Chi-square test for independence using R

12 - Inference for Two-Way Tables
