STA 4234/STA 5236 Regression Analysis
Homework 3
Instructions:
a. Show all necessary work to receive full credit.
b. Present each problem in order.
c. Attach the relevant intermediate calculations, output, figures, and/or tables to each corresponding question.
d. You may use any software for all parts of this homework unless stated otherwise. Attach any program code at the end of the homework (code only; attach output to the corresponding question).
Problem #1. Show that the hat matrix $H = X(X'X)^{-1}X'$ is symmetric (i.e., $H' = H$) and idempotent (i.e., $H^2 = H$). If you cannot show this mathematically, verify it numerically using the first three observations of any data set used in this homework. Note that $X$ must include a column of 1s for the intercept. (Statistics graduate students are required to do both.)
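A minimal sketch of the numerical check, assuming Python with NumPy (the instructions allow any software). The helper `hat_matrix` and the three regressor values are hypothetical placeholders, not rows from Table B.1 or B.8. With only three rows, a design with three or more columns makes $X'X$ either singular or $X$ square, in which case $H = I$ trivially, so the sketch uses an intercept plus one regressor.

```python
import numpy as np


def hat_matrix(X: np.ndarray) -> np.ndarray:
    """Return H = X (X'X)^{-1} X' for a design matrix X of full column rank."""
    return X @ np.linalg.inv(X.T @ X) @ X.T


# Hypothetical regressor values for three observations (NOT from Table B.1
# or B.8); replace them with the first three rows of the data you use.
x = np.array([2.0, 3.5, 5.0])
X = np.column_stack([np.ones_like(x), x])  # column of 1s for the intercept

H = hat_matrix(X)
print(np.round(H, 4))
print("symmetric: ", np.allclose(H.T, H))   # checks H' = H
print("idempotent:", np.allclose(H @ H, H))  # checks H^2 = H
```

`np.allclose` is used instead of exact equality because floating-point inversion introduces tiny rounding differences.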
Problem #2. Problem 4.2 on page 165, using the data in Table B.1 on page 554. Consider the multiple linear regression model relating the number of games won to the team's passing yardage (x2), the percentage of rushing plays (x7), and the opponents' yards rushing (x8).
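A sketch of fitting this model and computing common residual diagnostics, assuming Python with NumPy and assuming Problem 4.2 calls for residual analysis (such as studentized and R-student residuals). The function `residual_diagnostics` is hypothetical, and the data below are synthetic placeholders; load the x2, x7, x8, and games-won columns of Table B.1 yourself.

```python
import numpy as np


def residual_diagnostics(X_reg, y):
    """Fit y = b0 + b1*x_1 + ... + bk*x_k by least squares and return
    residual diagnostics used in model adequacy checking.

    X_reg: (n, k) array of regressors WITHOUT the column of 1s.
    """
    X_reg = np.asarray(X_reg, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X_reg.shape
    p = k + 1                                    # parameters incl. intercept
    X = np.column_stack([np.ones(n), X_reg])     # add the intercept column
    beta = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
    fitted = X @ beta
    e = y - fitted
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages h_ii
    ms_res = e @ e / (n - p)
    # S_(i)^2: residual mean square with observation i deleted
    s2_del = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)
    return {
        "beta": beta,
        "fitted": fitted,
        "residuals": e,
        "leverage": h,
        "standardized": e / np.sqrt(ms_res),
        "studentized": e / np.sqrt(ms_res * (1 - h)),
        "r_student": e / np.sqrt(s2_del * (1 - h)),
    }


# Synthetic placeholder data standing in for x2, x7, x8 and games won.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 5 + X_demo @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=20)
out = residual_diagnostics(X_demo, y_demo)
print(np.round(out["beta"], 3))
```

The R-student values use the closed-form deleted variance rather than refitting the model $n$ times; both approaches give the same result.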
Problem #3. Consider the clathrate formation data in Table B.8 on page 560. Fit two different models:
(i) based on x1 and x2;
(ii) based on x2 only.
For each model,
(a) Construct a normal probability plot of the residuals and a plot of the residuals versus the fitted values. Comment on any model adequacy issues.
(b) Perform the appropriate lack-of-fit test and state your conclusion. [For multiple regression, the ANOVA-type model needs one level for each distinct (x1, x2) combination; in R it can be fit with the formula y ~ interaction(x1, x2, drop = TRUE).] If a lack-of-fit test cannot be performed, explain why.
(c) Compute the PRESS statistic and $R^2$ for prediction. Based on these results, which model is more likely to provide better predictions of future data?
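For part (a), a normal probability plot pairs the ordered residuals with standard normal quantiles. The sketch below assumes Python with NumPy and the standard-library `statistics.NormalDist`, uses the plotting positions $P_i = (i - 1/2)/n$, and relies on a hypothetical helper `normal_plot_points` with made-up residuals.

```python
from statistics import NormalDist

import numpy as np


def normal_plot_points(residuals):
    """Coordinates for a normal probability plot of residuals.

    Sorts the residuals and pairs the i-th smallest with the standard normal
    quantile at the plotting position P_i = (i - 1/2) / n.
    """
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    z = np.array([NormalDist().inv_cdf(float(pr)) for pr in probs])
    return z, r


# Hypothetical residuals for illustration only.
z, r = normal_plot_points([0.8, -1.2, 0.1, 2.5, -0.4])
for zi, ri in zip(z, r):
    print(f"{zi:7.3f}  {ri:6.2f}")
```

Plotting `r` against `z` with any graphics tool should give points near a straight line if the errors are roughly normal. For the second plot, residuals are plotted against the fitted values; a funnel shape or curvature suggests nonconstant variance or a missing term.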
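For part (b), the lack-of-fit test splits $SS_{Res}$ into pure error (variation among replicates at the same regressor values) and lack of fit, then forms $F_0 = \dfrac{SS_{LOF}/(m-p)}{SS_{PE}/(n-m)}$, where $m$ is the number of distinct regressor combinations. This is a minimal sketch assuming Python with NumPy; `lack_of_fit_test` and the data are hypothetical. It returns `None` when there are no replicates, which is exactly the case where the test cannot be performed. Its $F_0$ should match an R comparison such as `anova(reduced, full)` between the regression fit and the ANOVA-type fit.

```python
import numpy as np


def lack_of_fit_test(X_reg, y):
    """Lack-of-fit F test for an OLS model with intercept.

    Observations sharing the same regressor values (same row of X_reg) are
    treated as replicates. Returns None when the test cannot be performed.
    """
    X_reg = np.asarray(X_reg, dtype=float)
    if X_reg.ndim == 1:
        X_reg = X_reg[:, None]                   # single regressor
    y = np.asarray(y, dtype=float)
    n, k = X_reg.shape
    p = k + 1
    X = np.column_stack([np.ones(n), X_reg])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ss_res = float(np.sum((y - X @ beta) ** 2))

    # Group responses by distinct regressor combination
    groups = {}
    for row, yi in zip(map(tuple, X_reg), y):
        groups.setdefault(row, []).append(yi)
    m = len(groups)                              # number of distinct x levels
    if m == n or m <= p:
        # No replicates (no pure-error df) or no df left for lack of fit
        return None
    ss_pe = float(sum(np.sum((np.array(v) - np.mean(v)) ** 2) for v in groups.values()))
    ss_lof = ss_res - ss_pe
    df_lof, df_pe = m - p, n - m
    f0 = (ss_lof / df_lof) / (ss_pe / df_pe)
    return {"SS_LOF": ss_lof, "df_LOF": df_lof, "SS_PE": ss_pe, "df_PE": df_pe, "F0": f0}


# Hypothetical single-regressor data with replicates at x = 1, 2, 3.
x_demo = [1, 1, 2, 2, 3, 3, 4, 5]
y_demo = [2.1, 2.5, 3.9, 4.3, 6.8, 6.1, 8.9, 9.5]
res = lack_of_fit_test(x_demo, y_demo)
print(res)
```

Compare `F0` with $F_{\alpha,\,m-p,\,n-m}$; a large value indicates lack of fit.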
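For part (c), $\text{PRESS} = \sum_{i=1}^{n} \left(\dfrac{e_i}{1 - h_{ii}}\right)^2$ and $R^2_{\text{prediction}} = 1 - \text{PRESS}/SS_T$. The sketch below assumes Python with NumPy; `press_statistic` is a hypothetical helper, and the data are simulated stand-ins for Table B.8, not the real values.

```python
import numpy as np


def press_statistic(X_reg, y):
    """PRESS and R^2 for prediction for an OLS model with intercept.

    Uses the leave-one-out identity e_(i) = e_i / (1 - h_ii), so the model is
    fitted only once.
    """
    X_reg = np.asarray(X_reg, dtype=float)
    if X_reg.ndim == 1:
        X_reg = X_reg[:, None]
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), X_reg])
    H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
    e = y - H @ y                            # ordinary residuals
    h = np.diag(H)                           # leverages
    press = float(np.sum((e / (1 - h)) ** 2))
    ss_t = float(np.sum((y - y.mean()) ** 2))
    return press, 1.0 - press / ss_t


# Simulated data comparing a two-regressor model with a one-regressor model
rng = np.random.default_rng(42)
x1 = rng.uniform(0, 1, size=15)
x2 = rng.uniform(10, 60, size=15)
y = 3 + 2 * x1 + 0.1 * x2 + rng.normal(scale=0.3, size=15)
print("model (i): ", press_statistic(np.column_stack([x1, x2]), y))
print("model (ii):", press_statistic(x2, y))
```

The model with the smaller PRESS (equivalently, the larger $R^2_{\text{prediction}}$) is expected to predict new observations better.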