XKCD comic


Exercise

Today we’re going to use the built-in mtcars dataset to practice simple linear regression. Note this is a built-in dataset provided as part of the datasets package in R.

Background

Run ?mtcars in the console (do NOT add it to this Rmd file) and briefly read the help page. Specifically, take note of the following:

  1. What is the source of this data?
  2. What is this dataset measuring? (i.e. what is the response variable?)
  3. What predictors are available and what do they mean?

Feel free to also run head(mtcars, 10) or View(mtcars) to inspect the data frame briefly before moving on.
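If you prefer a few non-interactive checks alongside head and View, the sketch below uses only base R functions. It is one possible way to look over the data, not a required step.

```r
# mtcars ships with base R (datasets package), so nothing needs to be loaded.

# Compact overview: one line per column with its type and first few values.
str(mtcars)

# Number of rows (cars) and columns (variables).
dim(mtcars)

# First 10 rows, as suggested above.
head(mtcars, 10)

# Five-number summary plus mean of the response variable.
summary(mtcars$mpg)
```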

Fitting

Uncomment the line below and finish it. Specifically, use lm to regress mpg on one predictor of your choice. The easiest way is the formula mpg ~ var, where var is the name of the predictor you’re using. Make sure to also include data = mtcars as an argument; otherwise lm won’t know where to find the variables named in the formula.

# lm.mtcars = lm(...)

View a summary of the regression by uncommenting and running the line below.

# summary(lm.mtcars)
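To see the same pattern without giving away the exercise, here is a minimal sketch on a different built-in dataset, cars (stopping distance dist versus speed). The object name fit_cars is just an illustrative choice; your own call should use mpg, your chosen predictor, and data = mtcars.

```r
# Illustration only: a different built-in dataset, not the mtcars exercise.
# cars has two columns: speed and dist (stopping distance).
fit_cars <- lm(dist ~ speed, data = cars)

# Coefficient table, residual standard error, R-squared, and F-statistic.
summary(fit_cars)
```

The formula reads "response ~ predictor", and data tells lm which data frame holds those columns.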

Briefly inspect the diagnostic plots by running plot(lm.mtcars, which = 1:2), which draws the residuals vs. fitted plot and the normal Q-Q plot of the residuals. What do you observe, and what does it mean?

REPLACE TEXT WITH RESPONSE
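As a rough guide to what the two diagnostic plots are, the sketch below draws them side by side for an illustrative fit on the built-in cars dataset (fit_cars is a hypothetical name, not part of the exercise). In the first panel you would typically look for curvature or a funnel shape; in the second, for points straying from the reference line.

```r
# Illustration only: fit on the built-in cars dataset.
fit_cars <- lm(dist ~ speed, data = cars)

# Show both plots in one row, remembering the old graphics settings.
op <- par(mfrow = c(1, 2))

# which = 1: residuals vs. fitted values; which = 2: normal Q-Q plot.
plot(fit_cars, which = 1:2)

# Restore the previous graphics settings.
par(op)
```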

Interpretation

Uncomment the line below to get the estimated coefficients along with their standard errors.

# summary(lm.mtcars)$coefficients[,1:2]

Give an interpretation of the estimate and standard error for your predictor. Be careful in your wording of the interpretation.

REPLACE TEXT WITH RESPONSE
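For reference on the wording, the slope estimate is the estimated change in the mean response for a one-unit increase in the predictor, and its standard error describes how much that estimate would vary from sample to sample. The sketch below pulls both numbers out of an illustrative fit on the cars dataset; the names fit_cars and est_se are assumptions made for this example.

```r
# Illustration only: fit on the built-in cars dataset.
fit_cars <- lm(dist ~ speed, data = cars)

# Keep the first two columns of the coefficient table:
# "Estimate" and "Std. Error".
est_se <- summary(fit_cars)$coefficients[, 1:2]
est_se

# Pull out the slope and its standard error by row and column name.
slope    <- est_se["speed", "Estimate"]
slope_se <- est_se["speed", "Std. Error"]
```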

What does the intercept here mean? (Except for special situations, we generally don’t care much about the intercept, but you should still understand what it means.)

REPLACE TEXT WITH RESPONSE

What is the R² for this model? (Hint: look at the output of summary.) Give an interpretation of this value.

REPLACE TEXT WITH RESPONSE
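If it helps the interpretation, R² can be rebuilt by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for an illustrative fit on the cars dataset and checks it against the value stored in the summary; all object names are hypothetical.

```r
# Illustration only: fit on the built-in cars dataset.
fit_cars <- lm(dist ~ speed, data = cars)

# Residual sum of squares: variation left over after the fit.
rss <- sum(residuals(fit_cars)^2)

# Total sum of squares: variation of the response around its mean.
tss <- sum((cars$dist - mean(cars$dist))^2)

# Proportion of the variation in the response explained by the model.
r2_manual <- 1 - rss / tss

# Should agree with the "Multiple R-squared" line of summary().
r2_summary <- summary(fit_cars)$r.squared
all.equal(r2_manual, r2_summary)
```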

Briefly read about the adjusted R² here. What is the adjusted R² of this model, and how does it differ from the ordinary (unadjusted) R² value? (Hint: again, look at the output of summary.)

REPLACE TEXT WITH RESPONSE
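One way to see the difference is to compute the adjustment yourself: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. The sketch below applies this to an illustrative fit on the cars dataset (object names are assumptions for this example).

```r
# Illustration only: fit on the built-in cars dataset.
fit_cars <- lm(dist ~ speed, data = cars)

n  <- nrow(cars)                    # number of observations
p  <- 1                             # number of predictors (speed only)
r2 <- summary(fit_cars)$r.squared   # ordinary R-squared

# Penalize R-squared for the number of predictors used.
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Should agree with the "Adjusted R-squared" line of summary().
adj_r2_summary <- summary(fit_cars)$adj.r.squared
all.equal(adj_r2_manual, adj_r2_summary)
```

With a single predictor the two values are usually close; the penalty grows as more predictors are added.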

Generate \(95\%\) confidence intervals for the coefficients using the confint function. Give an interpretation of these confidence intervals.

# confint(...)

REPLACE TEXT WITH RESPONSE
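As a sketch of what confint computes for a linear model, each interval is the estimate plus or minus a t critical value times the standard error. The example below uses an illustrative fit on the cars dataset and compares the manual intervals to the output of confint; the object names are hypothetical.

```r
# Illustration only: fit on the built-in cars dataset.
fit_cars <- lm(dist ~ speed, data = cars)

# 95% confidence intervals for the intercept and slope.
ci <- confint(fit_cars, level = 0.95)
ci

# The same intervals by hand: estimate +/- t critical value * standard error.
est   <- coef(fit_cars)
se    <- sqrt(diag(vcov(fit_cars)))
tcrit <- qt(0.975, df = df.residual(fit_cars))
ci_manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)

all.equal(unname(ci_manual), unname(ci))
```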

Try with others!

Repeat the steps above for at least one other predictor. Which of these two predictors seems to offer better “predictive” ability for mpg? How do you know?
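One possible way to line up two single-predictor models is to put their R² and residual standard error side by side. The sketch below does this on a different built-in dataset, trees (Volume predicted by Girth or by Height), so it illustrates the comparison without answering the question for mtcars. The object names are assumptions for this example.

```r
# Illustration only: the built-in trees dataset, not mtcars.
fit_girth  <- lm(Volume ~ Girth,  data = trees)
fit_height <- lm(Volume ~ Height, data = trees)

# Collect R-squared and residual standard error for each model.
comparison <- data.frame(
  predictor = c("Girth", "Height"),
  r_squared = c(summary(fit_girth)$r.squared,
                summary(fit_height)$r.squared),
  resid_se  = c(summary(fit_girth)$sigma,
                summary(fit_height)$sigma)
)
comparison
```

A higher R² together with a lower residual standard error suggests the predictor explains more of the variation in the response, though the diagnostic plots are still worth checking for each model.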