Alex John Quijano
12/03/2021
Today, we will discuss the following:
Linear regression with a single predictor.
Confidence intervals for linear regression.
Original data: `weight` of baby as a linear model of mother’s age. Notice that the relationship between `mage` and `weight` is not as strong as the relationship we saw previously between `weeks` and `weight`.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 6.2295 | 0.708 | 8.79 | <0.0001 |
mage | 0.0361 | 0.024 | 1.50 | 0.1362 |
The population model is: \(y_{weight} = \beta_0 + \beta_1 x_{age} + e\) where \(y\) is the response, \(x\) is the predictor, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(e\) is the error term.
The least squares regression model uses the data to find a sample linear fit: \(\hat{y}_{weight} = b_0 + b_1 x_{age}\), where \(b_0 = 6.2295\) and \(b_1 = 0.0361\).
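As a quick illustration of how the fitted line is used, the sketch below plugs a few mother's ages into \(\hat{y}_{weight} = b_0 + b_1 x_{age}\). The coefficients are copied from the table above. The function name `predict_weight` and the ages 20, 30, and 40 are assumptions chosen only for illustration.

```python
# Coefficients copied from the regression table above.
B0 = 6.2295   # intercept, in pounds
B1 = 0.0361   # slope, in pounds per year of mother's age


def predict_weight(mage):
    """Predicted average baby weight (pounds) for a mother aged `mage` years."""
    return B0 + B1 * mage


# Hypothetical ages, chosen only for illustration.
for age in (20, 30, 40):
    print(age, round(predict_weight(age), 4))
```

Two predictions that are ten years apart differ by \(10 \times b_1 = 0.361\) pounds, which is what the slope describes.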
Repeated bootstrap resamples of size 100 are taken from the original data. Each bootstrapped linear model is slightly different.
The standard error of the bootstrapped slopes is approximately \(0.0225\).
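The resampling procedure can be sketched in a few lines of Python. This is a minimal sketch, not the code used to produce the numbers above. The births data are not reproduced in these notes, so the example generates synthetic stand-in data. The helper names `ls_slope` and `bootstrap_slopes`, the choice of 1000 resamples, and the synthetic coefficients are all assumptions.

```python
import random
import statistics


def ls_slope(xs, ys):
    """Least squares slope: b1 = Sxy / Sxx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples, rng):
    """Resample (x, y) pairs with replacement and refit the slope each time."""
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ls_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Synthetic stand-in data (NOT the births data): 100 ages and weights.
rng = random.Random(1)
mage = [rng.uniform(15, 45) for _ in range(100)]
weight = [6.2 + 0.04 * a + rng.gauss(0, 1.5) for a in mage]

slopes = bootstrap_slopes(mage, weight, n_resamples=1000, rng=rng)
se_boot = statistics.stdev(slopes)  # bootstrap standard error of the slope

print("slope on original sample:", round(ls_slope(mage, weight), 4))
print("bootstrap SE of slope:   ", round(se_boot, 4))
```

The standard deviation of the resampled slopes plays the role of the standard error quoted above. With the real data in place of the synthetic values, the same steps would apply.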
For a 95% confidence interval, \(z^* = 1.96\). Why \(z^*\)? We are assuming the sample is large enough (bootstrap resamples of size 100). You can still use \(t^*_{99} = 1.98\) for \(df = 99\), which gives a slightly wider CI that accounts for the “low” sample size.
The confidence interval is then given by \[b_1 \pm 1.96 \cdot SE \rightarrow 0.036 \pm 1.96 \cdot 0.0225 \rightarrow (-0.0081, 0.0801).\]
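The interval arithmetic above can be checked with a small helper. The function name `slope_ci` is an assumption made for this sketch. The inputs are the rounded slope, bootstrap standard error, and \(z^*\) quoted in the text.

```python
def slope_ci(b1, se, crit):
    """Return (lower, upper) for the interval b1 ± crit * se."""
    margin = crit * se
    return b1 - margin, b1 + margin


# Values quoted in the text: slope 0.036, bootstrap SE 0.0225, z* = 1.96.
lower, upper = slope_ci(b1=0.036, se=0.0225, crit=1.96)
print(round(lower, 4), round(upper, 4))  # -0.0081 0.0801
```

Passing `crit=1.98` instead gives the slightly wider \(t^*_{99}\) version of the same interval.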
We are 95% confident that, for the model describing the population of births by mother’s age and baby `weight`, a one-unit increase in `mage` (in years) is associated with a change in predicted average baby `weight` of between \(-0.0081\) and \(0.0801\) pounds.
The point estimate is \(b_1 = 0.0361\) and the standard error is \(SE = 0.024\).
The degrees of freedom are \(df = 99\), with \(t_{99}^* = 1.98\).
We can now construct the confidence interval in the usual way:
\[ \begin{aligned} b_1 &\pm t_{99}^* \times SE \\ 0.0361 &\pm 1.98 \times 0.024 \\ &(-0.0114, 0.0836) \end{aligned} \]
We are 95% confident that a one-unit increase in `mage` (in years) is associated with a change in predicted average baby `weight` of between \(-0.0114\) and \(0.0836\) pounds.
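One way to connect this interval to the regression table is to recompute the test statistic and check whether the interval contains zero. The sketch below uses only the values quoted in the text, and the variable names are illustrative assumptions.

```python
# Values quoted in the text: point estimate, standard error, t* for df = 99.
b1, se, t_star = 0.0361, 0.024, 1.98

t_stat = b1 / se  # test statistic for H0: beta_1 = 0
lower, upper = b1 - t_star * se, b1 + t_star * se
contains_zero = lower <= 0 <= upper

print(round(t_stat, 2))  # 1.5, matching the "statistic" column of the table
print(round(lower, 4), round(upper, 4), contains_zero)  # -0.0114 0.0836 True
```

The interval contains zero, which is consistent with the p-value of 0.1362 in the table being larger than 0.05.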
Consider the following least squares output.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 24319.33 | 1291.45 | 18.83 | <0.0001 |
family_income | -0.04 | 0.01 | -3.98 | 0.0002 |
The sample linear fit is \(\hat{y}_{aid} = b_0 + b_1 x_{income}\) where \(b_0 = 24319.33\), \(b_1 = -0.04\).
Construct a 90% confidence interval for the true slope of the linear model and interpret it in context.
Considering the p-value shown in the table, what is the hypothesis testing conclusion? What significance level did you use, and does it matter?
Does your 90% confidence interval conclusion agree with your hypothesis testing conclusion?