Alex John Quijano
11/19/2021
Today, we will discuss the following:
The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 20.6. Consider the research study described below.
Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on city fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2021. Do these data provide strong evidence of a difference between the average city mileage of cars with manual and automatic transmissions? (US DOE EPA 2021)
We will compute the 95% confidence interval for the true difference in means \(\mu_{automatic} - \mu_{manual}\).
Transmission (city mpg) | Mean | SD | n |
---|---|---|---|
Automatic | 17.4 | 3.44 | 25 |
Manual | 22.7 | 4.58 | 25 |
Here, we see two outliers in the manual group. However, both groups show reasonably symmetric distributions with only a few outliers, so in this case we can “ignore” the outliers and assume that the sampling distribution of the difference in means is approximately normal.
The standard error may be computed as \[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\] where \(s_1, s_2\) are the sample standard deviations and \(n_1, n_2\) are the sample sizes. The official formula for the degrees of freedom is quite complex and is generally computed using software; if software isn’t readily available, you may instead use the smaller of \(n_1 - 1\) and \(n_2 - 1\) for the degrees of freedom.
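The “official” degrees-of-freedom formula mentioned above is, we assume, the Welch-Satterthwaite approximation that most software uses for two-sample t procedures with unequal variances. The sketch below (plain Python; the function names `se_diff`, `conservative_df`, and `welch_df` are hypothetical) computes the standard error and compares the Welch df with the conservative rule \(\min(n_1 - 1, n_2 - 1)\).

```python
import math

def se_diff(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of x̄1 - x̄2 using the sample standard deviations."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def conservative_df(n1: int, n2: int) -> int:
    """Rule of thumb from the notes: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation (assumed to be the 'software' formula)."""
    a, b = s1**2 / n1, s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# Fuel economy example: automatic vs manual, city mpg
print(round(se_diff(3.44, 25, 4.58, 25), 4))   # 1.1456
print(conservative_df(25, 25))                 # 24
print(round(welch_df(3.44, 25, 4.58, 25), 1))
```

Here the Welch df (about 44.5) is larger than 24, so the conservative rule gives a slightly larger \(t^*\) and a slightly wider, more cautious interval.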
Margin of error for \(\bar{x}_1 - \bar{x}_2\). The margin of error is \(t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = t^*_{df} \times SE\), where \(t^*_{df}\) is the critical value at the percentile of the t-distribution with df degrees of freedom that matches the confidence level (for example, the 97.5th percentile for a 95% interval).
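Statistical software returns \(t^*_{df}\) directly (for example, R's `qt(0.975, df = 24)`). As a dependency-free sketch, the block below approximates the same percentile using only Python's standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection. This numerical method is an illustrative choice made here, not how the textbook or any particular package computes it; `t_pdf`, `t_cdf`, and `t_star` are hypothetical names.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0: 0.5 (symmetry) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(confidence: float, df: float) -> float:
    """Two-sided critical value t*_df for the given confidence level."""
    target = (1 + confidence) / 2  # upper percentile, e.g. 0.975 for 95%
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection works because t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_star(0.95, 24), 4))  # 2.0639 (fuel example)
print(round(t_star(0.90, 9), 4))   # 1.8331 (chicken example)
```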
Standard Error \[ \begin{aligned} SE & = \sqrt{\frac{s_{automatic}^2}{n_{automatic}} + \frac{s_{manual}^2}{n_{manual}}} \\ & = \sqrt{\frac{3.44^2}{25} + \frac{4.58^2}{25}} \\ SE & = 1.1456 \end{aligned} \]
Margin of Error \[ \begin{aligned} ME & = t^*_{24} SE \\ & = 2.0639 (1.1456) \\ ME & = 2.3644 \end{aligned} \]
A note on the degrees of freedom: in our example, both groups have the same sample size, so the degrees of freedom are \(25 - 1 = 24\). When the sample sizes differ, we normally pick the smaller sample size. This doesn’t bias your CI: the interval is still centered at \(\bar{x}_1 - \bar{x}_2\), and it is, if anything, slightly wider (more conservative) than with the software-computed degrees of freedom. It just means the power you have is based on the smaller sample.
We talked about power when we learned about decision errors a few weeks ago, and we will discuss the power concept in more detail later.
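The 95% confidence interval is computed as follows. \[ \begin{aligned} \bar{x}_{automatic} - \bar{x}_{manual} \pm ME & = 17.4 - 22.7 \pm 2.3644 \\ & = -5.3 \pm 2.3644 \end{aligned} \] \[(-7.6644,-2.9356)\]
Therefore, we are 95% confident that the true difference in mean city fuel efficiency (miles/gallon) between automatic and manual cars is between 2.9356 and 7.6644 in absolute value.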
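Note that the endpoints are negative because of the order of subtraction (automatic minus manual): a negative difference indicates that cars with manual transmissions have higher average city fuel efficiency than cars with automatic transmissions. Since 0 is not inside the interval, these data provide strong evidence of a difference in average city mileage between the two groups.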
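The full fuel economy calculation can be reproduced in a few lines of plain Python. The critical value 2.0639 is taken from the text rather than computed, and the helper name `diff_means_ci` is hypothetical.

```python
import math

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """CI for mu1 - mu2 from summary statistics and a given critical value t*."""
    point = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    me = t_star * se
    return point - me, point + me

# Automatic minus manual, city mpg; t*_24 = 2.0639 for 95% confidence
lower, upper = diff_means_ci(17.4, 3.44, 25, 22.7, 4.58, 25, t_star=2.0639)
print(round(lower, 4), round(upper, 4))      # -7.6644 -2.9356
print("0 inside CI:", lower <= 0 <= upper)   # 0 inside CI: False
```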
The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 20.6. Consider the research study described below.
Chicken diet: horsebean vs. linseed.
An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into groups, and each group was given a different feed supplement. We consider chicks that were fed horsebean and linseed. Below are some summary statistics from this dataset along with box plots showing the distribution of weights by feed type (McNeil 1977).
Feed type | Mean (g) | SD (g) | n |
---|---|---|---|
horsebean | 160.20 | 38.63 | 10 |
linseed | 218.75 | 52.24 | 12 |
Using the smaller sample (horsebean, \(n = 10\)), the degrees of freedom are \(df = 10 - 1 = 9\).
Also, since the sample sizes are not very imbalanced, choosing \(df = 11\) from the larger sample is an “okay” choice. It just means that the power you have is based on the larger sample size. However, a larger df gives a smaller \(t^*\) and hence a narrower interval, so with extremely imbalanced data this choice can make the CI too narrow.
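To see what the choice of df does in practice, the sketch below compares the 90% margin of error for the chicken data under \(df = 9\) and \(df = 11\). The critical values are standard t-table values (95th percentile): \(t^*_9 = 1.8331\) as in the text, and \(t^*_{11} \approx 1.7959\), hard-coded here for illustration.

```python
import math

# Chicken data: horsebean vs linseed (weights in grams)
se = math.sqrt(38.63**2 / 10 + 52.24**2 / 12)

# 90% two-sided critical values (95th percentile of the t-distribution)
t_star = {9: 1.8331, 11: 1.7959}

for df, t in t_star.items():
    me = t * se
    print(f"df = {df}: t* = {t}, ME = {me:.4f}, CI width = {2 * me:.4f}")
```

The larger df shrinks the margin of error by roughly 0.7 g here, which is why the smaller-sample df is the more cautious choice.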
Note that choosing the smaller sample size for the degrees of freedom does not bias the CI; it is the more conservative choice.
Compute the standard error. \[ \begin{aligned} SE & = \sqrt{\frac{s_{horsebean}^2}{n_{horsebean}} + \frac{s_{linseed}^2}{n_{linseed}}} \\ & = \sqrt{\frac{38.63^2}{10} + \frac{52.24^2}{12}} \\ SE & = 19.4074 \end{aligned} \]
Compute the margin of error. For a confidence level of 90%, we have \(t^*_{9} = 1.8331\). \[ \begin{aligned} ME & = t^*_{9} SE \\ & = 1.8331 (19.4074) \\ ME & = 35.5757 \end{aligned} \]
The 90% confidence interval is computed as follows. \[ \begin{aligned} \bar{x}_{horsebean} - \bar{x}_{linseed} \pm ME & = 160.20 - 218.75 \pm 35.5757 \\ & = -58.55 \pm 35.5757 \\ \end{aligned} \] \[(-94.1257,-22.9743)\]
Therefore, we are 90% confident that the true difference in mean weight (horsebean minus linseed) is between 22.9743 and 94.1257 grams in absolute value. Here, the negative sign indicates that chicks fed linseed have a higher mean weight than chicks fed horsebean.
Since the null value of 0 (no difference in means) is not inside the 90% confidence interval, we conclude that there is a significant difference in mean chick weight between the two feed groups (at the 10% significance level).
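The “is 0 inside the interval?” check can be written out directly. The sketch below rebuilds the 90% interval from the summary statistics (using \(t^*_9 = 1.8331\) from the text) and also shows an algebraically equivalent form: 0 lies outside \(\bar{x}_1 - \bar{x}_2 \pm t^* \cdot SE\) exactly when \(|\bar{x}_1 - \bar{x}_2| / SE > t^*\). The variable names are illustrative.

```python
import math

# Horsebean minus linseed, weights in grams
diff = 160.20 - 218.75
se = math.sqrt(38.63**2 / 10 + 52.24**2 / 12)
t_star_9 = 1.8331  # 90% critical value with df = 9, from the text

lower, upper = diff - t_star_9 * se, diff + t_star_9 * se
contains_zero = lower <= 0 <= upper

# Equivalent check: 0 is outside diff ± t*·SE exactly when |diff| / SE > t*
ratio = abs(diff) / se

print(f"CI: ({lower:.4f}, {upper:.4f})")
print(f"contains 0: {contains_zero}; |diff|/SE = {ratio:.3f} vs t* = {t_star_9}")
```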
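If the sample sizes increase while the means and standard deviations stay the same, the 90% confidence interval becomes narrower: the standard error shrinks because \(n_1\) and \(n_2\) appear in the denominators, and \(t^*_{df}\) also decreases slightly as the degrees of freedom grow, so the margin of error decreases.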
If the sample sizes get smaller, the standard error increases and \(t^*_{df}\) gets larger, so the margin of error increases and the interval widens. Thus, there is more uncertainty when drawing a conclusion.
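A quick numerical illustration of the sample-size effect on the standard error, assuming (hypothetically) equal group sizes \(n\) and keeping the chicken-data standard deviations fixed at 38.63 g and 52.24 g. The sample sizes and the helper name `se_equal_n` are illustrative choices, not part of the original study.

```python
import math

def se_equal_n(s1: float, s2: float, n: int) -> float:
    """SE of the difference in means when both groups have n observations."""
    return math.sqrt(s1**2 / n + s2**2 / n)

# Hypothetical sample sizes; the SDs are held fixed
for n in (10, 40, 160):
    print(n, round(se_equal_n(38.63, 52.24, n), 2))
```

Each fourfold increase in \(n\) halves the standard error; because \(t^*_{df}\) also shrinks as df grows, the margin of error falls by slightly more than half.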
Today, we discussed the following:
Next, we will discuss:
- Hypothesis testing for one or more means
- Analysis of Variance (ANOVA)