Alex John Quijano
12/01/2021
Today, we will discuss the following:
Linear regression with a single predictor.
Hypothesis testing for linear regression.
The population model is: \[y_{revenue} = \beta_0 + \beta_1 x_{advertising} + e\] where \(y\) is the response, \(x\) is the predictor, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(e\) is the error term.
The least squares regression model uses the data to find a sample linear fit: \[\hat{y}_{revenue} = b_0 + b_1 x_{advertising},\] where \(b_0 = 11.23\) and \(b_1 = 4.8\).
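To make "uses the data to find a sample linear fit" concrete, here is a minimal sketch of the usual least squares formulas, \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1 \bar{x}\). It is written in Python rather than the R used in this lecture, and the data are made-up toy values, not the advertising data; the function name least_squares is chosen here for illustration only.

```python
from statistics import mean


def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # sample slope
    b0 = y_bar - b1 * x_bar   # sample intercept
    return b0, b1


# Made-up toy data (an assumption for illustration, NOT the lecture's data).
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.9, 5.2, 6.8, 9.1, 11.0]

b0, b1 = least_squares(toy_x, toy_y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

A different sample from the same population would generally give slightly different values of \(b_0\) and \(b_1\), which is why inference about \(\beta_1\) is needed.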
A second sample of size 20 also shows a positive trend!
Consider the births data originally gathered by the US Department of Health and Human Services. The births14 data can be found in the openintro R package. We will work with a random sample of 100 observations from these data.
We want to predict a baby's birth weight from the length of the pregnancy in weeks. The population linear model is \[y_{weight} = \beta_0 + \beta_1 x_{weeks} + e\]
The relevant hypotheses for the linear model setting can be written in terms of the population slope parameter. Here the population refers to a larger population of births in the US.
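\(H_0: \beta_1 = 0\). There is no linear relationship between weight and weeks.
\(H_A: \beta_1 \neq 0\). There is some linear relationship between weight and weeks.
Linearity. The relationship between the explanatory and response variables, as seen in the scatterplot, must be nearly linear.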
Independent Observations. The observations must be independent of one another.
Normally Distributed Residuals. The residuals must show a nearly normal distribution.
Constant or equal variability. The variability of the residuals around the line must be roughly constant (homoscedasticity).
term | estimate | std.error |
---|---|---|
(Intercept) | -5.72 | 1.61 |
weeks | 0.34 | 0.04 |
The least squares regression model uses the data to find a sample linear fit: \[\hat{y}_{weight} = -5.72 + 0.34 x_{weeks}.\]
R code:
where the data frame births14_100 is a subset of the original births14 data.
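As a quick illustration of how the fitted line is read, take a hypothetical pregnancy of 39 weeks (a value chosen here for illustration, not one discussed in the lecture):

\[\hat{y}_{weight} = -5.72 + 0.34 \times 39 = 7.54\]

Assuming weight is recorded in pounds, as the openintro documentation for births14 describes, the model predicts a birth weight of about 7.54 pounds. The slope says that each additional week of pregnancy is associated with an average increase of about 0.34 pounds in this sample.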
Two different permutations of the weight variable with slightly different least squares regression lines.
Histogram of slopes given different permutations of the weight variable. The vertical red line is at the observed value of the slope, 0.335.
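The permutation idea behind the histogram can be sketched as follows: shuffle the response values to break any link with the predictor, refit the slope, repeat many times, and see how often a shuffled slope is as extreme as the observed one. This is a minimal Python sketch (not the R code used in the lecture) on made-up toy data; the function names, the number of permutations, and the seed are all assumptions chosen for illustration.

```python
import random


def slope(x, y):
    """Least squares slope b1 for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def permutation_slopes(x, y, n_perm=1000, seed=0):
    """Slopes obtained after repeatedly shuffling the response values."""
    rng = random.Random(seed)
    shuffled = list(y)  # copy so the original response is left untouched
    slopes = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between x and y
        slopes.append(slope(x, shuffled))
    return slopes


def permutation_p_value(observed, slopes):
    """Two-sided p-value: share of permuted slopes at least as extreme."""
    extreme = sum(1 for s in slopes if abs(s) >= abs(observed))
    return extreme / len(slopes)


# Made-up toy data (an assumption for illustration, NOT the births14 sample).
toy_weeks = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43]
toy_weight = [5.1, 5.9, 6.0, 6.8, 7.1, 7.4, 7.3, 8.0, 8.2, 8.6]

observed = slope(toy_weeks, toy_weight)
null_slopes = permutation_slopes(toy_weeks, toy_weight)
p_value = permutation_p_value(observed, null_slopes)
print(f"observed slope = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

The list null_slopes plays the role of the histogram: it approximates the distribution of the slope when the null hypothesis \(\beta_1 = 0\) is true, and the observed slope is judged against it.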
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -5.716 | 1.6137 | -3.54 | 6e-04 |
weeks | 0.335 | 0.0416 | 8.07 | <0.0001 |
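\[T = \frac{b_1 - \text{null value}}{SE} = \frac{0.335 - 0}{0.0416} = 8.0529\] The table reports 8.07; the small difference presumably arises because the software divides the unrounded estimate by the unrounded standard error.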
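The arithmetic for the test statistic can be checked directly from the rounded table values. The helper below is a minimal Python sketch (the function name t_statistic is an assumption for illustration, and Python is used here rather than the lecture's R); it only computes the ratio and does not compute a p-value.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """Test statistic T = (estimate - null value) / SE."""
    return (estimate - null_value) / std_error


# Rounded values copied from the regression table for the births sample.
t_slope = t_statistic(0.335, 0.0416)
t_intercept = t_statistic(-5.716, 1.6137)

print(f"slope:     T = {t_slope:.4f}")
print(f"intercept: T = {t_intercept:.2f}")
```

The slope row reproduces the 8.0529 shown in the formula, and the intercept row agrees with the -3.54 in the table.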
Consider the following least squares output.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -29.90 | 7.79 | -3.84 | 0.0012 |
perc_pov | 2.56 | 0.39 | 6.56 | <0.0001 |
Here, we model the annual murder rate (murders per million) based on the poverty percentage.
1. Write the linear equation for the population model and the estimated linear model.
2. What are the hypotheses for evaluating whether the slope of the model predicting annual murder rate from poverty percentage is different from 0?
3. State the conclusion of the hypothesis test from part (2) in the context of the data. What does this say about whether poverty percentage is a useful predictor of annual murder rate?