Multiple Linear Regression Models
Multiple Linear Regression Models - Residual Diagnostics, Unusual Observations
STAT3022
Applied linear models
Regression Diagnostics
Background
Recall the MLR model
$y = X\beta + \varepsilon, \quad E(y) = X\beta, \quad \mathrm{Var}(y) = \mathrm{Var}(\varepsilon) = \sigma^2 I_n.$
Assume the design matrix $X$ has full column rank, so that the OLS estimate is
$\hat{\beta} = (X^\top X)^{-1} X^\top y.$
The vectors of fitted values and residuals are
$\hat{y} = X\hat{\beta} = X(X^\top X)^{-1} X^\top y = Hy,$
$e = y - \hat{y} = y - Hy = (I_n - H)y,$
where $H = X(X^\top X)^{-1} X^\top$ is the $n \times n$ hat matrix.
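The formulas above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not part of the original slides: the simulated data, the seed, and the names `beta_hat`, `H`, `y_hat` and `e` are all assumptions chosen for the example. It uses an explicit inverse only to mirror the formula; in practice a least-squares solver is usually preferred for numerical stability.

```python
import numpy as np

# Illustrative simulated data (assumption: not from the course material).
rng = np.random.default_rng(0)
n, p = 20, 3
# Design matrix: an intercept column plus two simulated covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

# OLS estimate: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Hat matrix H = X (X'X)^{-1} X', fitted values and residuals.
H = X @ XtX_inv @ X.T
y_hat = H @ y
e = y - y_hat

print("beta_hat:", beta_hat)
print("H shape:", H.shape)
```

The hat matrix "puts the hat on y": multiplying `y` by `H` gives the same fitted values as `X @ beta_hat`.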
Similar to model diagnostics for SLR, diagnostics for MLR are based
on the residuals, which depend critically on the hat matrix $H$.
The hat matrix has the following properties:
• $H$ is symmetric, i.e. $H^\top = H$. As a result, the matrix $I_n - H$
is also symmetric.
• Next, $HX = X$. As a result, $(I_n - H)X = X - X = 0$.
• Third, $H^2 = H$, so we say $H$ is idempotent. As a result, the
matrix $I_n - H$ is also idempotent, since
$(I_n - H)(I_n - H) = I_n I_n - H I_n - I_n H + HH = I_n - H - H + H = I_n - H.$
• Finally, as proved in Tutorial 4, $\mathrm{trace}(H) = \sum_{i=1}^{n} h_{ii} = p$.
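These four properties can be checked numerically on any full-rank design matrix. The sketch below is an illustration only, in Python with NumPy: the simulated `X` is an assumption, and `p` is taken here to be the number of columns of `X` including the intercept, which is assumed to match the course's notation.

```python
import numpy as np

# Illustrative full-rank design matrix (assumption: simulated, not course data).
rng = np.random.default_rng(1)
n, p = 15, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# H = X (X'X)^{-1} X', computed with a linear solve instead of an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H  # the matrix I_n - H

# Each entry checks one property up to floating-point tolerance.
checks = {
    "H symmetric": np.allclose(H, H.T),
    "HX = X": np.allclose(H @ X, X),
    "(I - H)X = 0": np.allclose(M @ X, 0),
    "H idempotent": np.allclose(H @ H, H),
    "I - H idempotent": np.allclose(M @ M, M),
    "trace(H) = p": np.isclose(np.trace(H), p),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```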
Residual vector
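• First, let’s compute its expectation:
$E(e) = E\{(I_n - H)y\} = (I_n - H)E(y) = (I_n - H)X\beta = 0.$
• Second, let’s compute the variance-covariance matrix:
$\mathrm{Var}(e) = \mathrm{Var}\{(I_n - H)y\} = (I_n - H)\,\mathrm{Var}(y)\,(I_n - H)^\top = (I_n - H)\,\sigma^2 I_n\,(I_n - H) = \sigma^2 (I_n - H)(I_n - H) = \sigma^2 (I_n - H),$
i.e. $\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$ and $\mathrm{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \neq j$.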
These computations tell us that (1) each residual $e_i$ has a
smaller variance than the true error $\varepsilon_i$ (whenever $h_{ii} > 0$),
and (2) the residuals are correlated, even though the true errors are not.
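To make the two conclusions concrete, the sketch below computes the theoretical covariance matrix of the residuals and compares it with a Monte Carlo estimate. It is an illustration in Python with NumPy under assumed values: the design matrix, `sigma`, `beta`, the number of replications `reps` and the normal errors are all choices made for the example, not part of the slides.

```python
import numpy as np

# Illustrative setup (assumption: simulated design, sigma chosen arbitrarily).
rng = np.random.default_rng(2)
n, p, sigma = 10, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

# Theoretical result: Var(e) = sigma^2 (I_n - H).
cov_e = sigma**2 * M
h = np.diag(H)          # leverages h_ii
var_e = np.diag(cov_e)  # Var(e_i) = sigma^2 (1 - h_ii)

# Monte Carlo check: simulate many response vectors and compute residuals.
beta = np.array([0.5, 1.0])
reps = 20000
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + eps   # each row is one simulated response vector
E = Y @ M            # each row is (I_n - H) y, using the symmetry of I_n - H
emp_cov = np.cov(E, rowvar=False)

print("largest Var(e_i):", var_e.max(), "vs sigma^2:", sigma**2)
print("max |empirical - theoretical|:", np.abs(emp_cov - cov_e).max())
```

Since $\mathrm{trace}(H) = p$, the residual variances sum to $\sigma^2 (n - p)$, which is another way to see that the residuals are, on average, less variable than the errors.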
Residual plots
For model diagnostics, we can use residual plots similar to those
used in simple linear regression. Specifically,
• To check the constant variance assumption: Use the plot of
residuals $e_i$ vs. fitted values $\hat{y}_i$, or the plot of residuals vs. each
covariate. No news is good news: a plot with no systematic pattern
supports the assumption.
• To check the normality assumption: Use a normal quantile-quantile
(Q-Q) plot, or a normality test.
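As a rough sketch of what goes into these two plots, the code below computes the point coordinates for a residuals-vs-fitted plot and a normal Q-Q plot. It is an illustration in Python with NumPy and the standard library: the simulated data are an assumption, and the plotting positions $(i - 0.5)/n$ are one common convention among several. The resulting pairs can be passed to any plotting library.

```python
import numpy as np
from statistics import NormalDist

# Illustrative simulated data (assumption: not from the course material).
rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=1.0, size=n)

# Fit by least squares and form fitted values and residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
resid = y - fitted

# Residuals vs. fitted values: plot the points (fitted[i], resid[i]).
resid_vs_fitted = np.column_stack([fitted, resid])

# Normal Q-Q plot: sorted residuals against standard normal quantiles
# at the plotting positions (i - 0.5) / n (an assumed convention).
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = np.array([NormalDist().inv_cdf(q) for q in probs])
sample_q = np.sort(resid)

# A correlation near 1 means the Q-Q points lie close to a straight line.
qq_corr = np.corrcoef(theo_q, sample_q)[0, 1]
print("Q-Q correlation:", qq_corr)
```

In the residuals-vs-fitted plot, a fan or curve shape would suggest non-constant variance or a misspecified mean. In the Q-Q plot, systematic departures from a straight line would suggest non-normal errors.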