12 - Inference for Comparing Paired Means

Alex John Quijano

11/29/2021

Previously on Statistics…

Hypothesis testing and confidence interval for one mean
Hypothesis testing and confidence interval for comparing two means

Inference on Single Mean

Today, we will discuss the following:

Hypothesis testing and confidence intervals for paired means.

Global Warming

The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 21.5. Consider the research study described below.

Let’s consider a limited set of climate data, examining temperature differences in 1948 vs 2018. We sampled 197 locations from the National Oceanic and Atmospheric Administration’s (NOAA) historical data, where data were available for both years of interest. We want to know: were there more days with temperatures exceeding 90F in 2018 or in 1948? (NOAA 2018) The difference in number of days exceeding 90F (number of days in 2018 - number of days in 1948) was calculated for each of the 197 locations. The average of these differences was 2.9 days with a standard deviation of 17.2 days.

The climate70 data used in this exercise can be found in the openintro R package.

We are interested in determining whether these data provide strong evidence that there were more days in 2018 that exceeded 90F from NOAA’s weather stations.

The Data Visualized

For each observation in one dataset, there is exactly one specially corresponding observation in the other dataset for the same geographic location. The data are paired.
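
As a minimal sketch of what "paired" means in practice, the snippet below matches two years of counts by station and reduces them to a single list of differences (2018 minus 1948). The station names and counts are made up for illustration only; they are not the NOAA data or the `climate70` data set.

```python
from statistics import mean, stdev

# HYPOTHETICAL counts of days above 90F for five made-up stations.
# These are illustrative values, not NOAA data.
days_1948 = {"A": 10, "B": 25, "C": 0, "D": 40, "E": 12}
days_2018 = {"A": 14, "B": 22, "C": 3, "D": 47, "E": 12}

# Pair by station: one difference per location (2018 minus 1948).
diffs = [days_2018[station] - days_1948[station] for station in days_1948]

# All later inference uses only these three summaries of the differences.
n_diff = len(diffs)
x_bar_diff = mean(diffs)
s_diff = stdev(diffs)  # sample standard deviation (divides by n - 1)

print(diffs, n_diff, round(x_bar_diff, 2), round(s_diff, 2))
```

Once the differences are formed, the problem becomes a one-sample problem on the differences, which is why the formulas that follow look like those for a single mean.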

The Null and Alternative Hypotheses

Null: There is no difference in the average number of days exceeding 90F in 1948 and 2018 for NOAA stations. \[H_0: \mu_{diff} = 0\]
Alternative: There is a difference. \[H_A: \mu_{diff} \ne 0\]
Conditions: Locations were randomly sampled, so independence is reasonable. The sample size is at least 30, so we’re just looking for particularly extreme outliers: none are present (the observation off to the left in the histogram would be considered a clear outlier, but not a particularly extreme one). Therefore, the conditions are satisfied.

Hypothesis Testing

Compute standard error. \[SE = \frac{s_{diff}}{\sqrt{n_{diff}}} = \frac{17.2}{\sqrt{197}} = 1.23\]
Compute the T statistic.

\[T = \frac{\bar{x}_{diff} - \mu_{diff,0}}{SE} = \frac{2.9 - 0}{1.23} = 2.36\] with degrees of freedom \(df = 197 - 1 = 196.\) This leads to a one-tail area of \(0.0096\) and a p-value of about \(0.019\) for a two-tailed test.
Since the p-value is less than \(0.05\), we reject \(H_0\). The data provide strong evidence that NOAA stations observed more 90F days in 2018 than in 1948.
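
The test above can be reproduced from the three summary statistics alone. The sketch below uses only the Python standard library, so the t-distribution tail area is approximated by numerical integration (Simpson's rule). The helper names `t_pdf` and `t_upper_tail` are hypothetical, written for this illustration. The standard error is not rounded here, so the values may differ slightly from the hand calculation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the mass lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3

# Summary statistics of the paired differences, taken from the slides.
n, mean_diff, sd_diff = 197, 2.9, 17.2
null_value = 0

se = sd_diff / math.sqrt(n)
t_stat = (mean_diff - null_value) / se
df = n - 1
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)

print(f"SE = {se:.4f}, T = {t_stat:.3f}, df = {df}, p = {p_two_sided:.4f}")
```

With the unrounded standard error, T comes out near 2.37 rather than 2.36, and the two-sided p-value stays close to 0.019, so the conclusion is unchanged.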

95% Confidence Interval

The point-estimate. \[\bar{x}_{diff} = 2.9\]
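Margin of error. Because the test uses the t-distribution, the critical value is \(t^*_{196}\) rather than \(z^*\). \[ME = t^*_{196} \times SE = t^*_{196} \frac{s_{diff}}{\sqrt{n_{diff}}} = 1.97 (1.23) = 2.4231\]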
95% CI. \[ \begin{aligned} \bar{x}_{diff} & \pm ME \\ 2.9 & \pm 2.4231 \end{aligned} \] \[(0.4769, 5.3231)\]
We are 95% confident that the true average difference in the number of days exceeding 90F (2018 minus 1948) is between 0.4769 and 5.3231 days. The null value of 0 is outside this interval. Thus, the result is statistically significant, consistent with the conclusion of our hypothesis test.
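
The interval can be checked with a few lines of arithmetic. This sketch takes the critical value 1.97 from the slide as an input instead of computing it, and it keeps the standard error unrounded, so the endpoints may differ from the hand calculation in the second decimal place.

```python
import math

# Summary statistics of the paired differences, taken from the slides.
n, mean_diff, sd_diff = 197, 2.9, 17.2

# Critical value for 95% confidence with df = 196, as quoted on the slide.
t_star = 1.97

se = sd_diff / math.sqrt(n)
margin = t_star * se
lower, upper = mean_diff - margin, mean_diff + margin

# The null value 0 is rejected at the 5% level when it falls outside the interval.
null_value = 0
null_inside = lower <= null_value <= upper

print(f"ME = {margin:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
print("Null value inside interval:", null_inside)
```

Without rounding the standard error to 1.23, the interval is roughly 0.49 to 5.31, slightly narrower than the one computed by hand, and it still excludes 0.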

10.10-Minute Activity

Consider the following statement.

Each textbook has two corresponding prices in the data set: one for the UCLA bookstore and one for Amazon. Therefore, each textbook price from the UCLA bookstore has a natural correspondence with a textbook price from Amazon. When two sets of observations have this special correspondence, they are said to be paired.

Are textbooks actually cheaper online? Here we compare the price of textbooks at UCLA’s bookstore and prices at Amazon.com. Seventy-three UCLA courses were randomly sampled in Spring 2010, representing less than 10% of all UCLA courses. Source: AHS

The summary statistics are given here.

\[n_{diff} = 73, \hspace{10px} \bar{x}_{diff} = 12.76, \hspace{10px} s_{diff} = 14.26\]

Are the conditions satisfied? How is this different from our previous topic of comparing two independent means?
What are the null and alternative hypotheses?
Perform a hypothesis test and compute the confidence interval. What are your conclusions in terms of the problem?

10.10-Minute Activity

TBA

Summary

Today, we discussed the following:

Inference for comparing paired means

Next, we will discuss:

Inference for comparing multiple means