Fuel Efficiency in the City

The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 20.6. Consider the research study described below.

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2021. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? US DOE EPA 2021

We will compute the 95% confidence interval for the true difference in means \(\mu_{automatic} - \mu_{manual}\).

CITY	Mean	SD	n
Automatic	17.4	3.44	25
Manual	22.7	4.58	25

Conditions

- Independence (extended). The data are independent within and between the two groups, e.g., the data come from independent random samples or from a randomized experiment.
- Normality. We need a large enough sample size in each group (at least 30). We also check each group separately for extreme outliers.

Here, we see two outliers in the manual group. However, both groups show reasonably well-behaved distributions with balanced outliers, so in this case we can “ignore” the outliers and assume normality of the sampling distribution of the means.

The Margin of Error (1/3)

The standard error may be computed as \[SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\] In practice the population standard deviations \(\sigma_1\) and \(\sigma_2\) are unknown, so we estimate them with the sample standard deviations \(s_1\) and \(s_2\). The official formula for the degrees of freedom is quite complex and is generally computed using software. If software isn’t readily available, you may instead use the smaller of \(n_1 - 1\) and \(n_2 - 1\) for the degrees of freedom.
Margin of error for \(\bar{x}_1 - \bar{x}_2.\) The margin of error is \(t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) where \(t^*_{df}\) is calculated from a specified percentile on the t-distribution with df degrees of freedom.
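The “official” degrees-of-freedom formula is not written out on the slide. A common choice in software is the Welch-Satterthwaite approximation, and the sketch below assumes that is the formula meant. The function name `welch_df` is illustrative only; the inputs are the city mileage summary statistics from the table above.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed formula)."""
    v1 = s1**2 / n1  # variance of the first sample mean
    v2 = s2**2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Automatic: s = 3.44, n = 25; Manual: s = 4.58, n = 25
df_software = welch_df(3.44, 25, 4.58, 25)

# Shortcut from the slide: the smaller of n1 - 1 and n2 - 1
df_conservative = min(25, 25) - 1

print(round(df_software, 1), df_conservative)
```

Under this assumption the software-style value comes out at roughly 44.5, compared with 24 from the shortcut. The shortcut therefore gives a larger \(t^*\) and a slightly wider, more cautious interval.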

The Margin of Error (2/3)

Standard Error \[ \begin{aligned} SE & = \sqrt{\frac{s_{automatic}^2}{n_{automatic}} + \frac{s_{manual}^2}{n_{manual}}} \\ & = \sqrt{\frac{3.44^2}{25} + \frac{4.58^2}{25}} \\ SE & = 1.1456 \end{aligned} \]
Margin of Error \[ \begin{aligned} ME & = t^*_{24} SE \\ & = 2.0639 (1.1456) \\ ME & = 2.3644 \end{aligned} \]
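The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the critical value \(t^*_{24} = 2.0639\) is copied from the slide rather than computed.

```python
import math

# Summary statistics from the CITY table
s_auto, n_auto = 3.44, 25
s_manual, n_manual = 4.58, 25

# Critical value t*_{24} for 95% confidence, taken from the slide
t_star = 2.0639

# Standard error of the difference in sample means
se = math.sqrt(s_auto**2 / n_auto + s_manual**2 / n_manual)

# Margin of error
me = t_star * se

print(round(se, 4), round(me, 4))
```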

The Margin of Error (3/3)

A note on the degrees of freedom: our example has equal sample sizes in the two groups, so the degrees of freedom are \(25 - 1 = 24\). When the sample sizes differ, we normally base the degrees of freedom on the smaller sample size. This doesn’t bias your CI. It just means the power you have is based on the smaller sample.
We talked about power when we learned about decision errors a few weeks ago, and we will talk more about the concept of power later.

95% Confidence Interval

The 95% confidence interval is computed as follows. \[ \begin{aligned} \bar{x}_{automatic} - \bar{x}_{manual} \pm ME & = 17.4 - 22.7 \pm 2.3644 \\ & = -5.3 \pm 2.3644 \end{aligned} \] \[(-7.6644,-2.9356)\]

Therefore, we are 95% confident that the true difference in mean fuel efficiency (miles/gallon) between automatic and manual cars is between 2.9356 and 7.6644 in absolute value.

Note that the values are originally negative because of the order in which the difference is computed (automatic minus manual). A negative difference indicates that cars with manual transmissions are more fuel efficient than cars with automatic transmissions.
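To make the point about ordering concrete, the sketch below builds the interval in both orders. It reuses the margin of error 2.3644 from the earlier slide; the variable names are illustrative.

```python
mean_auto, mean_manual = 17.4, 22.7
me = 2.3644  # margin of error from the previous slide

# Automatic minus manual (the order used in the slides)
diff = mean_auto - mean_manual
ci_auto_minus_manual = (diff - me, diff + me)

# Manual minus automatic: same width, opposite sign
ci_manual_minus_auto = (-diff - me, -diff + me)

print([round(x, 4) for x in ci_auto_minus_manual])
print([round(x, 4) for x in ci_manual_minus_auto])
```

Both intervals carry the same information. Only the sign changes, so the interpretation has to state which group was subtracted from which.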

10.10-Minute Activity (1/5)

The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 20.6. Consider the research study described below.

Chicken diet: horsebean vs. linseed.

An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into groups, and each group was given a different feed supplement. We consider chicks that were fed horsebean and linseed. Below are some summary statistics from this dataset along with box plots showing the distribution of weights by feed type. McNeil 1977

Feed type	Mean	SD	n
horsebean	160.20	38.63	10
linseed	218.75	52.24	12

Describe the distributions of weights of chickens that were fed horsebean and linseed.
Compute the 90% confidence interval for the difference in means.
What is the conclusion? Comment on what happens if the sample size increases or decreases.

10.10-Minute Activity (2/5)

The sample sizes are slightly imbalanced, but not by much: the difference is just two. However, both groups have fewer than 30 observations. We can relax the second condition (normality) by arguing that neither group shows any extreme outliers.

10.10-Minute Activity (2/5)

- We can use the smaller sample to set the degrees of freedom: \(df = 10 - 1 = 9\).
- Since the sample sizes are not too imbalanced, choosing \(df = 11\) from the larger sample is also an “okay” choice. It just means that the power you have is based on the larger sample size, but this may bias the CI towards the larger sample if the data are extremely imbalanced.
- Note that choosing the smaller sample size for the degrees of freedom does not bias the CI. It just means the power you have is based on the smaller sample.

10.10-Minute Activity (3/5)

- Compute the standard error. \[ \begin{aligned} SE & = \sqrt{\frac{s_{horsebean}^2}{n_{horsebean}} + \frac{s_{linseed}^2}{n_{linseed}}} \\ & = \sqrt{\frac{38.63^2}{10} + \frac{52.24^2}{12}} \\ SE & = 19.4074 \end{aligned} \]
- Compute the margin of error. For a confidence level of 90%, we have \(t^*_{9} = 1.8331\) \[ \begin{aligned} ME & = t^*_{9} SE \\ & = 1.8331 (19.4074) \\ ME & = 35.5757 \end{aligned} \]
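To see how much the choice between \(df = 9\) and \(df = 11\) matters here, the sketch below recomputes the margin of error under both. The value 1.8331 is from the slide. The value 1.7959 for \(df = 11\) is not in the slides; it is taken from a standard t-table for 90% confidence.

```python
import math

# Summary statistics for the chick weights
s_horsebean, n_horsebean = 38.63, 10
s_linseed, n_linseed = 52.24, 12

se = math.sqrt(s_horsebean**2 / n_horsebean + s_linseed**2 / n_linseed)

# 90% critical values: df = 9 from the slide, df = 11 from a standard t-table
t_star = {9: 1.8331, 11: 1.7959}

# Margin of error under each choice of degrees of freedom
margin = {df: t * se for df, t in t_star.items()}

for df, me in margin.items():
    print(df, round(me, 2))
```

The smaller degrees of freedom give the larger margin of error, which is why \(df = 9\) is the more cautious choice.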

10.10-Minute Activity (4/5)

The 90% confidence interval is computed as follows. \[ \begin{aligned} \bar{x}_{horsebean} - \bar{x}_{linseed} \pm ME & = 160.20 - 218.75 \pm 35.5757 \\ & = -58.55 \pm 35.5757 \\ \end{aligned} \] \[(-94.1257,-22.9743)\]

Therefore, we are 90% confident that the true difference in means is between 22.9743 and 94.1257 grams in absolute value. Here, the negative sign indicates that linseed has higher mean weight than horsebean.

10.10-Minute Activity (5/5)

- Since the null value of 0 (no difference in means) is not inside the 90% confidence interval, we conclude that there is a significant difference between the mean weights of chickens in the two groups.
- If the sample sizes increase while the means and standard deviations stay the same, the 90% confidence interval becomes narrower because the margin of error decreases as the sample size increases.
- If the sample sizes get smaller, both the standard error and \(t^*_{df}\) get larger, so the margin of error increases. Thus, there is more uncertainty when drawing a conclusion.
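The effect of sample size can be illustrated numerically. The sketch below keeps the two standard deviations fixed and varies the sample sizes. Apart from the observed (10, 12), the sample sizes are hypothetical, and the 90% critical values for \(df = 4, 19, 39\) come from a standard t-table rather than from the slides.

```python
import math

s_horsebean, s_linseed = 38.63, 52.24

# ((n_horsebean, n_linseed), 90% t* for df = n_horsebean - 1)
# Only (10, 12) is the observed design; the others are hypothetical.
scenarios = [
    ((5, 6), 2.1318),
    ((10, 12), 1.8331),
    ((20, 24), 1.7291),
    ((40, 48), 1.6849),
]

margins = []
for (n_h, n_l), t_star in scenarios:
    # Standard error shrinks as either sample size grows
    se = math.sqrt(s_horsebean**2 / n_h + s_linseed**2 / n_l)
    me = t_star * se
    margins.append(me)
    print(n_h, n_l, round(se, 2), round(me, 2))
```

Both factors move in the same direction: larger samples reduce the standard error and also reduce \(t^*_{df}\), so the interval narrows on both counts.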

Summary

Today, we discussed the following:

Computing confidence intervals for two independent means using two sample t-intervals

Next, we will discuss:

Hypothesis testing for one or more means
Analysis of Variance (ANOVA)

12 - Inference for Two Means
Confidence Intervals
