12 - Inference for Two Means
Confidence Intervals

Alex John Quijano

11/19/2021

Previously on Statistics…

Inference on Single Mean

Today, we will discuss the following:

Fuel Efficiency in the City

The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 20.6. Consider the research study described below.

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2021. Do these data provide strong evidence of a difference between the average city fuel efficiency of cars with manual and automatic transmissions? US DOE EPA 2021

We will compute the 95% confidence interval for the true difference in means \(\mu_{automatic} - \mu_{manual}\).

Transmission   City mean (mpg)   SD (mpg)   n
Automatic      17.4              3.44       25
Manual         22.7              4.58       25
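The interval has the usual form "point estimate plus or minus margin of error". It uses the unpooled standard error and, as in the activity later in these slides, degrees of freedom taken from the smaller sample:

\[ \begin{aligned} SE & = \sqrt{\frac{s_{automatic}^2}{n_{automatic}} + \frac{s_{manual}^2}{n_{manual}}} \\ CI & = (\bar{x}_{automatic} - \bar{x}_{manual}) \pm t^*_{df} \, SE, \quad df = \min(n_{automatic}, n_{manual}) - 1 \end{aligned} \]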

Conditions

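In the box plots, we see two outliers in the manual group. However, both groups show reasonably well-behaved distributions, and the outliers are not extreme enough to be a concern with 25 cars in each group. So in this case we can “ignore” the outliers and assume that the sampling distribution of each mean is approximately normal. Because the cars were randomly sampled, we can also assume the observations are independent.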

The Margin of Error (1/3)

The Margin of Error (2/3)

The Margin of Error (3/3)
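The margin-of-error steps can be reconstructed from the summary table. This worked sketch assumes, as in the activity later in these slides, that df comes from the smaller sample (here both samples have 25 cars, so \(df = 24\)), with the t-table value \(t^*_{24} = 2.0639\) for 95% confidence. The result matches the interval reported on the next slide up to rounding.

\[ \begin{aligned} SE & = \sqrt{\frac{3.44^2}{25} + \frac{4.58^2}{25}} \approx 1.1456 \\ ME & = t^*_{24} SE = 2.0639 (1.1456) \approx 2.3644 \\ \bar{x}_{automatic} - \bar{x}_{manual} & = 17.4 - 22.7 = -5.3 \\ CI & = -5.3 \pm 2.3644 = (-7.6644, -2.9356) \end{aligned} \]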

95% Confidence Interval

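Therefore, we are 95% confident that the true difference in mean city fuel efficiency, \(\mu_{automatic} - \mu_{manual}\), is between \(-7.664\) and \(-2.9356\) miles/gallon. Since 0 is not in this interval, the data provide evidence of a difference between the two groups.

The endpoints are negative because of the order of subtraction (automatic minus manual): a negative difference indicates that cars with manual transmissions have higher average city fuel efficiency than cars with automatic transmissions. In absolute value, manual cars get between 2.9356 and 7.664 more miles/gallon in the city, on average.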

10-Minute Activity (1/5)

The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 20.6. Consider the research study described below.

Chicken diet: horsebean vs. linseed.

An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into groups, and each group was given a different feed supplement. We consider chicks that were fed horsebean and linseed. Below are some summary statistics from this dataset along with box plots showing the distribution of weights by feed type. McNeil 1977

Feed type   Mean weight   SD      n
horsebean   160.20        38.63   10
linseed     218.75        52.24   12

  1. Describe the distributions of weights of chickens that were fed horsebean and linseed.
  2. Compute the 90% confidence interval for the difference in means.
  3. What is the conclusion? Comment on what happens if the sample size increases or decreases.

10-Minute Activity (2/5)


    • A conservative choice is to use the smaller sample size to compute the degrees of freedom: \(df = \min(10, 12) - 1 = 9\).

    • Since the sample sizes are not very imbalanced, using \(df = 11\) from the larger sample is also an “okay” choice. It gives a slightly smaller \(t^*\) and therefore a narrower interval; with extremely imbalanced data, however, this choice can make the interval too narrow.

    • Note that choosing the smaller sample size for the degrees of freedom does
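To see what the df choice does, compare the 90% margins of error using the standard error computed in the next step, \(SE = 19.4074\). The value \(t^*_{11} = 1.7959\) is a standard t-table value that does not appear in the original slides.

\[ \begin{aligned} df = 9: & \quad ME = t^*_{9} SE = 1.8331 (19.4074) \approx 35.5757 \\ df = 11: & \quad ME = t^*_{11} SE = 1.7959 (19.4074) \approx 34.8537 \end{aligned} \]

The smaller df gives the larger \(t^*\), so basing df on the smaller sample produces a slightly wider, more conservative interval.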

10-Minute Activity (3/5)

    • Compute the standard error. \[ \begin{aligned} SE & = \sqrt{\frac{s_{horsebean}^2}{n_{horsebean}} + \frac{s_{linseed}^2}{n_{linseed}}} \\ & = \sqrt{\frac{38.63^2}{10} + \frac{52.24^2}{12}} \\ SE & = 19.4074 \end{aligned} \]

    • Compute the margin of error. For a confidence level of 90%, we have \(t^*_{9} = 1.8331\). \[ \begin{aligned} ME & = t^*_{9} SE \\ & = 1.8331 (19.4074) \\ ME & = 35.5757 \end{aligned} \]
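The same steps can be scripted from the summary statistics. This is a minimal sketch: `two_sample_ci` is a hypothetical helper, the critical value is passed in from a t-table (as in the slides) so only the standard library is needed, and the order of subtraction (horsebean minus linseed) is an assumption, since the activity does not fix it.

```python
import math


def two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Return (difference, SE, ME, (lower, upper)) for mu1 - mu2.

    Uses the unpooled standard error sqrt(s1^2/n1 + s2^2/n2).
    t_star is a critical value read from a t-table; it is passed in so
    this sketch needs only the standard library.
    """
    diff = mean1 - mean2                       # point estimate
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    me = t_star * se                           # margin of error
    return diff, se, me, (diff - me, diff + me)


# Chicken diet, horsebean minus linseed (assumed order),
# 90% level with df = min(10, 12) - 1 = 9, so t* = 1.8331.
diff, se, me, (lower, upper) = two_sample_ci(
    160.20, 38.63, 10, 218.75, 52.24, 12, t_star=1.8331
)
print(f"difference = {diff:.2f}, SE = {se:.4f}, ME = {me:.4f}")
print(f"90% CI = ({lower:.4f}, {upper:.4f})")
```

Both endpoints (about -94.13 and -22.97) are negative, so 0 lies outside the interval; reversing the order (linseed minus horsebean) only flips the signs. If SciPy is installed, `scipy.stats.t.ppf(0.95, 9)` returns the same critical value (about 1.8331) without a table lookup.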

10-Minute Activity (4/5)

10-Minute Activity (5/5)

    • Since the null value of 0 (no difference in means) is not inside the 90% confidence interval, we conclude that there is a significant difference, at the 10% significance level, between the mean weights of chicks fed horsebean and those fed linseed.

    • If the sample sizes increase while the means and standard deviations stay the same, the 90% confidence interval becomes narrower: the standard error shrinks (each \(s^2\) is divided by a larger \(n\)) and \(t^*_{df}\) gets smaller, so the margin of error decreases.

    • If the sample sizes get smaller, the standard error and \(t^*_{df}\) both get larger, so the margin of error increases and the interval widens. Thus, there is more uncertainty in the conclusion.
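The sample-size argument can be checked numerically. This sketch holds the chicken-diet standard deviations fixed (the same assumption as the bullets above) and scales both sample sizes by a common factor; it shows only the standard error, since the accompanying change in \(t^*_{df}\) would need a t-table or SciPy. `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sd1, n1, sd2, n2):
    """Unpooled standard error of the difference between two sample means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


# Hypothetical what-if: keep the SDs (38.63, 52.24) fixed and scale both
# sample sizes (10, 12) by the same factor.
for factor in (0.5, 1, 2, 4):
    n1, n2 = round(10 * factor), round(12 * factor)
    se = standard_error(38.63, n1, 52.24, n2)
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: SE = {se:.4f}")
```

Quadrupling both sample sizes halves the standard error, because SE scales like \(1/\sqrt{n}\) when both groups grow by the same factor; halving them makes it larger, in line with the bullet above.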

Summary

Today, we discussed the following:

Next, we will discuss: