14 - Inference for Linear Regression
Confidence Interval

Alex John Quijano

12/03/2021

Previously on Statistics…

Hypothesis testing for linear regression.

Inference on Linear Regression

Today, we will discuss the following:

Linear regression with a single predictor.
Confidence intervals for linear regression.

Baby Weights (1/2)

Original data: weight of baby as a linear model of mother’s age. Notice that the relationship between mage and weight is not as strong as the relationship we saw previously between weeks and weight.

Baby Weights (2/2)

The least squares estimates of the intercept and slope are given in the estimate column. The observed slope is approximately 0.036.
term	estimate	std.error	statistic	p.value
(Intercept)	6.2295	0.708	8.79	<0.0001
mage	0.0361	0.024	1.50	0.1362

The population model is: \(y_{weight} = \beta_0 + \beta_1 x_{age} + e\) where \(y\) is the response, \(x\) is the predictor, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(e\) is the error term.

The least squares regression model uses the data to find a sample linear fit: \(\hat{y}_{weight} = b_0 + b_1 x_{age}\), where \(b_0 = 6.2295\) and \(b_1 = 0.0361\).
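To make the fitted line concrete, here is a minimal Python sketch that plugs the estimates from the table into \(\hat{y}_{weight} = b_0 + b_1 x_{age}\). The function name `predict_weight` and the example ages are illustrative choices, not part of the original analysis.

```python
# Coefficients copied from the regression table above.
b0 = 6.2295  # intercept (pounds)
b1 = 0.0361  # slope (pounds per additional year of mother's age)


def predict_weight(mage):
    """Predicted average baby weight (pounds) for a mother aged `mage` years."""
    return b0 + b1 * mage


# Ages picked only for illustration.
for age in (20, 30, 40):
    print(age, round(predict_weight(age), 4))
```

Each additional year of mother's age moves the prediction by exactly \(b_1\) pounds, which is the quantity the confidence intervals in the following slides are about.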

CI by Bootstrapping (1/3)

CI by Bootstrapping (2/3)

Repeated bootstrap resamples of size 100 are taken from the original data. Each bootstrapped linear model is slightly different.

CI by Bootstrapping (3/3)

Standard error of the slopes is approximately \(0.0225\).
For a 95% confidence interval, \(z^* = 1.96\). Why \(z^*\)? We are assuming the sample is large enough (each bootstrap resample contains 100 observations). You can still use \(t^* = 1.98\) from the t-distribution, which gives a slightly wider CI that accounts for the “low” sample size.
The confidence interval is then given by \[b_1 \pm 1.96 \cdot SE \rightarrow 0.036 \pm 1.96 \cdot 0.0225 \rightarrow (-0.0081, 0.0801).\]
We are 95% confident that, for the model describing the population of births by mother’s age and baby’s weight, a one-unit increase in mage (in years) is associated with a change in predicted average baby weight of between \(-0.0081\) and \(0.0801\) pounds.
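The bootstrap procedure on these slides can be sketched in plain Python. This is a minimal illustration under stated assumptions: the data below are synthetic (generated only to resemble the setting, not the actual births data), and the helper names `ls_slope` and `bootstrap_slope_ci`, the number of resamples `reps`, and the seeds are all hypothetical choices.

```python
import random
import statistics


def ls_slope(xs, ys):
    """Least squares slope: b1 = Sxy / Sxx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, reps=1000, z_star=1.96, seed=0):
    """Return (b1, SE, (lower, upper)) using the bootstrap SE of the slope."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(reps):
        # Resample (x, y) pairs with replacement, keeping the sample size n.
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ls_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    se = statistics.stdev(slopes)  # spread of the bootstrapped slopes
    b1 = ls_slope(xs, ys)          # slope observed in the original sample
    return b1, se, (b1 - z_star * se, b1 + z_star * se)


# Synthetic stand-in data (an assumption, NOT the real births data).
data_rng = random.Random(1)
mage = [data_rng.uniform(18, 40) for _ in range(100)]
weight = [6.2 + 0.036 * x + data_rng.gauss(0, 1.3) for x in mage]

b1, se, ci = bootstrap_slope_ci(mage, weight)
print(f"slope = {b1:.4f}, SE = {se:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the data are simulated, the printed numbers will not match the slide values; the point is the structure: resample pairs, refit, take the standard deviation of the slopes as the SE, then form \(b_1 \pm z^* \cdot SE\).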

CI by Theoretical Model

The point estimate is \(b_1 = 0.0361\) and the standard error is \(SE = 0.024\).
The degrees of freedom are \(df = n - 2 = 100 - 2 = 98\), with \(t_{98}^* = 1.98\).

We can now construct the confidence interval in the usual way:

\[ \begin{aligned} b_1 &\pm t_{98}^* \times SE \\ 0.0361 &\pm 1.98 \times 0.024 \\ &(-0.0114, 0.0836) \end{aligned} \]

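We are 95% confident that a one-unit increase in mage (in years) is associated with a change in predicted average baby weight of between \(-0.0114\) and \(0.0836\) pounds. Because the interval contains 0, the data are consistent with no linear relationship between mage and weight, in agreement with the p-value of 0.1362.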
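The arithmetic behind the theoretical interval can be checked with a few lines of Python. This is a small sketch: the function name `slope_ci` is hypothetical, and the critical value 1.98 is taken from the slide rather than computed from the t-distribution.

```python
def slope_ci(b1, se, t_star):
    """Confidence interval for a slope: b1 +/- t_star * SE."""
    margin = t_star * se
    return b1 - margin, b1 + margin


# Point estimate and standard error from the regression table;
# critical value as quoted on the slide.
lower, upper = slope_ci(b1=0.0361, se=0.024, t_star=1.98)
print(f"({lower:.4f}, {upper:.4f})")  # (-0.0114, 0.0836)

# The interval straddles zero.
print(lower < 0 < upper)  # True
```

The same function applies to any slope estimate once the standard error and the appropriate critical value are known.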

10-Minute Activity

Consider the following least squares output.

Summary of least squares fit for the Elmhurst College data, where we are predicting the gift aid by the university based on the family income of students (n = 50).
term	estimate	std.error	statistic	p.value
(Intercept)	24319.33	1291.45	18.83	<0.0001
family_income	-0.04	0.01	-3.98	2e-04

The sample linear fit is \(\hat{y}_{aid} = b_0 + b_1 x_{income}\) where \(b_0 = 24319.33\), \(b_1 = -0.04\).

Construct a 90% confidence interval for the true slope of the linear model and interpret it in context.
Considering the p-value shown in the table, what is the hypothesis testing conclusion? What significance value did you use, and does it matter?
Does your 90% confidence interval conclusion agree with your hypothesis testing conclusion?
