Midterm Exam
THE EXAM CONTAINS 4 QUESTIONS.
You will have 75 minutes to complete the exam.
You will need to have your CAMERA ON in Zoom for the whole duration of the exam. You will
need to have your MIC OFF for the whole duration of the exam.
During this Exam, you can review the Seber & Lee, Kutner, and Faraway texts, your course notes, the
lecture videos and lecture notes, and your homework and homework solutions. You are permitted to
use a calculator and, where appropriate, R. You are not allowed to consult with other people,
share or discuss Exam topics or Exam questions during the examination period, nor to
use any educational resources other than the ones listed above.
If you have technical issues when submitting the exam to Gradescope, email me the
screenshots/images of all your work before the deadline. Then submit the exam once you sort
out the technical issues.
If you have questions during the exam, you can send a private message to the TA monitoring you in
the Zoom chat. Do not unmute yourself to speak.
By submitting this Exam, you certify that 1) you understand the rules and agree to abide by them;
and 2) you understand that violating these rules constitutes academic dishonesty.
Problem   Max Points   Points
1         30
2         25
3         30
4         20
Total     105
Question 1 (30 points). THE QUESTION HAS 4 PARTS: (a)-(d)
Consider the linear model
Y = Xβ + ε,
where Y is an n-by-1 vector of response variables, X is an n-by-p design matrix, β is a p-by-1 vector of
coefficients, and ε is multivariate normal, $\varepsilon \sim N_n(0, \sigma^2 I_n)$.
(a) (9pts) Write down the distributions with the corresponding means and variance-covariance matrices of
the following random vectors. You do not need to justify what you wrote.
(i) Y
(ii) $\hat{Y}$
(iii) e
(b) (12pts) Find $\mathrm{Cov}(e, \hat{Y})$. Show all the steps and simplify as much as possible.
(c) (3pts) What are the dimensions of the matrix $\mathrm{Cov}(e, \hat{Y})$?
(d) (6pts) Does your answer to (b) change if only the Gauss-Markov conditions are met? If yes, state the
modified results; if no, explain why they do not change.
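For intuition on parts (b)-(d), the matrix identity behind $\mathrm{Cov}(e, \hat{Y})$ can be checked numerically. The sketch below is not part of the exam: it assumes a full-rank simulated design (n = 8, p = 3, and the seed are arbitrary illustrative choices) and uses the standard hat matrix $H = X(X^T X)^{-1}X^T$, so that $\hat{Y} = HY$, $e = (I_n - H)Y$, and $\mathrm{Cov}(e, \hat{Y}) = \sigma^2 (I_n - H)H$.

```r
# Simulated design; n, p, and the seed are arbitrary choices for illustration.
set.seed(1)
n <- 8
p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))  # n-by-p design with intercept

# Hat matrix H = X (X'X)^{-1} X' and residual-maker matrix M = I - H
H <- X %*% solve(crossprod(X)) %*% t(X)
M <- diag(n) - H

# H is symmetric and idempotent, so (I - H) H vanishes up to rounding error.
max(abs(H - t(H)))
max(abs(H %*% H - H))
max(abs(M %*% H))

# Cov(e, Y-hat) = sigma^2 * (I - H) H is an n-by-n matrix.
dim(M %*% H)
```

Only $\mathrm{Cov}(Y) = \sigma^2 I_n$ enters this calculation; the normality of ε is not used anywhere in it.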
Question 2 (25 points). THE QUESTION HAS 2 PARTS: (a)-(b)
A beverage company is currently interested in finding the effect of milkshakes and other drinks on weight
gain. The company performs a designed study in which X is the amount of milkshakes (in pints) consumed
per week, Z is the amount of soda (in pints) consumed per week, and W is the amount of coffee (in pints)
consumed per week. The response variable Y is the weight gain in kilograms after one month. The number
of data points is n = 50. The company then runs a linear regression according to the model
$$Y_i = \beta_0 + \beta_1 W_i + \beta_2 X_i + \beta_3 Z_i + \epsilon_i,$$
where the error terms $\epsilon_i$ are independent normal random variables with mean 0 and constant variance $\sigma^2$
($\sigma^2$ is unknown).
The design matrix
$$X = \begin{pmatrix} 1 & W_1 & X_1 & Z_1 \\ 1 & W_2 & X_2 & Z_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & W_n & X_n & Z_n \end{pmatrix}$$
satisfies
$$(X^T X)^{-1} = \begin{pmatrix} 0.9 & 0.05 & -0.14 & -0.18 \\ 0.05 & 0.04 & -0.02 & -0.05 \\ -0.14 & -0.02 & 0.03 & 0.04 \\ -0.18 & -0.05 & 0.04 & 0.08 \end{pmatrix}$$
The following estimates were computed from the data:
$\hat{\beta}_0 = 1.29$; $\hat{\beta}_1 = 0.82$; $\hat{\beta}_2 = 1.17$; $\hat{\beta}_3 = 1.43$
$\sqrt{\mathrm{MSE}} = 0.22$, $\mathrm{SSTO} = \sum_i (Y_i - \bar{Y})^2 = 307.56$.
From the above output, answer the following questions.
(a) (10pts) A confidence interval for the mean response $E[Y_j]$ can be written as
$$a_j^T \hat{\beta} \pm d_j.$$
Specify the vectors $a_j$ and scalars $d_j$ for simultaneous 95% confidence intervals for $E[Y_j]$ for all choices
of the predictor variables $W_j = 0.5$, $X_j \in \{1, 2, 3\}$, and $Z_j \in \{3, 5\}$. That is, your 95% confidence
intervals for $E[Y_j]$ should cover the cases $(W_j, X_j, Z_j) = (0.5, 1, 3), (0.5, 1, 5), \ldots, (0.5, 3, 5)$
simultaneously. You DO NOT need to evaluate $d_j$ fully; just show the formula
and specify all the values involved.
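As an illustration of how the pieces in part (a) fit together, the following R sketch builds the six vectors $a_j$ and the quadratic forms $a_j^T (X^T X)^{-1} a_j$ from the numbers given above. The choice of simultaneous multiplier is an assumption: the sketch computes both a Bonferroni multiplier (six intervals) and a Working-Hotelling/Scheffé multiplier and takes the smaller; the course may expect one specific method. All variable names are hypothetical.

```r
# Quantities stated in the problem (column order: intercept, W, X, Z)
XtX_inv <- matrix(c( 0.90,  0.05, -0.14, -0.18,
                     0.05,  0.04, -0.02, -0.05,
                    -0.14, -0.02,  0.03,  0.04,
                    -0.18, -0.05,  0.04,  0.08),
                  nrow = 4, byrow = TRUE)
beta_hat <- c(1.29, 0.82, 1.17, 1.43)
root_mse <- 0.22
n <- 50
p <- 4

# All six predictor settings; each row of A is one a_j^T = (1, W_j, X_j, Z_j)
grid <- expand.grid(W = 0.5, X = 1:3, Z = c(3, 5))
A <- cbind(1, grid$W, grid$X, grid$Z)

fit  <- as.vector(A %*% beta_hat)          # a_j^T beta-hat
quad <- rowSums((A %*% XtX_inv) * A)       # a_j^T (X^T X)^{-1} a_j

# Two candidate simultaneous multipliers (assumed methods, see lead-in)
g <- nrow(A)
mult_bonf <- qt(1 - 0.05 / (2 * g), df = n - p)
mult_wh   <- sqrt(p * qf(0.95, p, n - p))
mult <- min(mult_bonf, mult_wh)

d <- mult * root_mse * sqrt(quad)          # half-widths d_j
cbind(grid, fit = fit, quad = quad, d = d)
```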
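(b) (15pts) To test the hypothesis
$H_0: \beta_2 = \beta_3 = 0$ against $H_A: \beta_2 \neq 0$ or $\beta_3 \neq 0$
at a significance level of α = 0.01, we can use the following statistic:
$$F = \frac{(C\hat{\beta} - \gamma)^\top \left(C(X^\top X)^{-1}C^\top\right)^{-1}(C\hat{\beta} - \gamma)}{\mathrm{SSE}} \times \text{[dashed boxes]}$$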
What is the distribution of your test statistic (specify the degrees of freedom if appropriate)? Specify
explicitly the numerical values for $C$, $\gamma$, and SSE, and fill in the dashed boxes with appropriate numerical
values. You don’t have to conduct the test.
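The general linear hypothesis statistic in part (b) can be sketched in R using the quantities given in this question. This is a minimal sketch rather than a model answer: it assumes the standard form $F = \frac{(C\hat{\beta} - \gamma)^\top (C(X^\top X)^{-1}C^\top)^{-1}(C\hat{\beta} - \gamma)/q}{\mathrm{SSE}/(n - p)}$, where q is the number of rows of C, and it recovers SSE from $\sqrt{\mathrm{MSE}}$ via $\mathrm{SSE} = \mathrm{MSE} \cdot (n - p)$. The names `C_mat` and `gamma0` are hypothetical.

```r
# Quantities stated in the problem (column order: intercept, W, X, Z)
XtX_inv <- matrix(c( 0.90,  0.05, -0.14, -0.18,
                     0.05,  0.04, -0.02, -0.05,
                    -0.14, -0.02,  0.03,  0.04,
                    -0.18, -0.05,  0.04,  0.08),
                  nrow = 4, byrow = TRUE)
beta_hat <- c(1.29, 0.82, 1.17, 1.43)
root_mse <- 0.22
n <- 50
p <- 4

# H0: beta_2 = beta_3 = 0 written as C beta = gamma
C_mat  <- rbind(c(0, 0, 1, 0),
                c(0, 0, 0, 1))
gamma0 <- c(0, 0)
q <- nrow(C_mat)

sse <- root_mse^2 * (n - p)                # SSE = MSE * (n - p)

diff <- C_mat %*% beta_hat - gamma0
num  <- as.numeric(t(diff) %*% solve(C_mat %*% XtX_inv %*% t(C_mat)) %*% diff)

F_stat <- (num / q) / (sse / (n - p))      # compare with F(q, n - p)
F_crit <- qf(0.99, q, n - p)               # critical value at alpha = 0.01
c(F = F_stat, critical = F_crit)
```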
Question 3 (30 points). THE QUESTION HAS 5 PARTS: (a)-(e)
The R output is given below. Answer the following questions.
> lmod=lm(y~x1+x2+x3+x4+x5,data=df)
> summary(lmod)
Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5, data = df)
Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.91518   10.70604   6.250 1.91e-07 ***
x1          -0.17211    0.07030  -2.448  0.01873 *
x2          -0.25801    0.25388  -1.016  0.31546
x3          -0.87094    0.18303  -4.758 2.43e-05 ***
x4           0.10412    0.03526   2.953  0.00519 **
x5           1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared:  0.7067,  Adjusted R-squared:  0.671
F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10
(a) (4 points) How many data points are there (what is n, the sample size)? What is p, the number of unknown
parameters?
(b) (5 points) What is the result of the overall model fit test, at level α = 0.05? State the null and alternative
hypotheses and the result of the test.
(c) (5 points) Based on all of the R output, do you reject the null hypothesis that β2 = 0 at significance
level α = 0.05? Should you conclude that the model y = β0 + β2x2 is not an appropriate model for the
data? Explain why you should or why you shouldn’t.
(d) (6 points) Compute SSE from the R output.
(e) (10 points) Based on the R output provided, find $[(X^T X)^{-1}]_{11}$; in other words, find the first diagonal
element of the inverse of the matrix $X^T X$.
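The relationships needed for parts (a), (d), and (e) can be sketched in R from the printed summary alone. This is an illustrative sketch, not an official solution; it relies on three standard facts about `summary.lm` output: the residual degrees of freedom equal n − p, the residual standard error is $\hat{\sigma} = \sqrt{\mathrm{SSE}/(n - p)}$, and each reported standard error is $\hat{\sigma}\sqrt{[(X^T X)^{-1}]_{jj}}$. Variable names are hypothetical, and results inherit the rounding of the printed values.

```r
# Values read off the R output above
sigma_hat    <- 7.165      # residual standard error
df_resid     <- 41         # residual degrees of freedom, n - p
se_intercept <- 10.70604   # Std. Error of (Intercept)

# p counts the intercept plus the five slopes
p <- 6
n <- df_resid + p

# SSE = sigma_hat^2 * (n - p)
sse <- sigma_hat^2 * df_resid

# SE(beta_0-hat) = sigma_hat * sqrt([(X'X)^{-1}]_11), solved for the matrix entry
xtx_inv_11 <- (se_intercept / sigma_hat)^2

c(n = n, p = p, SSE = sse, XtX_inv_11 = xtx_inv_11)
```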
Question 4 (20 points). THE QUESTION HAS 3 PARTS: (a)-(c)
(a) (12pts) Write the formulas for internally studentized residuals and externally studentized residuals.
Explain each term being used.
(b) (4pts) What do we use internally studentized residuals for? What do we use externally studentized
residuals for?
(c) (4pts) Can we compute externally studentized residuals based on the output provided in Question 3?
Explain why yes, or why no. You do not need to compute anything.
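To make the quantities in this question concrete, the sketch below computes both kinds of studentized residuals by hand and compares them with R's built-in `rstandard()` and `rstudent()`. It uses R's bundled `cars` data purely as a stand-in, since the data frame behind Question 3 is not provided. The formulas assumed are the usual ones: $r_i = e_i / (s\sqrt{1 - h_{ii}})$ for internal studentization and $t_i = e_i / (s_{(i)}\sqrt{1 - h_{ii}})$ for external studentization, where $s_{(i)}$ is the residual standard error with observation i deleted.

```r
# Stand-in example on a built-in dataset (not the Question 3 data)
fit <- lm(dist ~ speed, data = cars)

e <- resid(fit)            # ordinary residuals e_i
h <- hatvalues(fit)        # leverages h_ii (diagonal of the hat matrix)
n <- nrow(cars)
p <- length(coef(fit))

# Internally studentized: scale by s estimated from all n observations
s <- sqrt(sum(e^2) / (n - p))
r_int <- e / (s * sqrt(1 - h))

# Externally studentized: scale by s_(i), estimated with observation i deleted
s_del <- sqrt((sum(e^2) - e^2 / (1 - h)) / (n - p - 1))
r_ext <- e / (s_del * sqrt(1 - h))

# Agreement with the built-in functions and with the closed-form link
all.equal(r_int, rstandard(fit))
all.equal(r_ext, rstudent(fit))
all.equal(r_ext, r_int * sqrt((n - p - 1) / (n - p - r_int^2)))
```

Every step uses the per-observation residuals $e_i$ and leverages $h_{ii}$, which a `summary()` printout does not list individually.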