MAT 3375 – Regression Analysis – Questions
1. (a) Let $U_i \sim \chi^2(r_i)$ be independent random variables with $r_1 = 5$, $r_2 = 10$. Set
$$X = \frac{U_1/r_1}{U_2/r_2}.$$
Using R, find $s$ and $t$ such that
$P(X \le s) = 0.95$ and $P(X \le t) = 0.99$.
(b) Let $Z \sim \mathcal{N}(0, 1)$ and $U \sim \chi^2(10)$ be two independent random variables. Let
$$V = \frac{Z}{\sqrt{U/10}}.$$
Using R, find $w$ such that $P(V \le w) = 0.95$.
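2. Let $f : \mathbb{R}^n \to \mathbb{R}$, $\mathbf{v} \in \mathbb{R}^n$, and $a \in \mathbb{R}$. Define $f(\mathbf{Y}) = \mathbf{Y}^\top \mathbf{v} + a$. Find the gradient of $f$ with
respect to $\mathbf{Y}$. Write a function in R that computes $f(\mathbf{Y})$ given $\mathbf{v}$ and $a$. Evaluate the function at
$\mathbf{Y} = (1, 0, -1)$, for $\mathbf{v} = (1, 2, -3)$ and $a = -2$.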
Note: in the course, we will write vectors either as columns or as rows, in a more or
less arbitrary way. It is up to you to determine which orientation makes the dimensions compatible.
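Since $f(\mathbf{Y}) = \sum_i Y_i v_i + a$, each partial derivative is $\partial f / \partial Y_i = v_i$, so the gradient is $\nabla_{\mathbf{Y}} f = \mathbf{v}$. The sketch below is written in Python rather than R, and make_f and numerical_gradient are hypothetical helper names. It evaluates $f$ at the given point and compares a central-difference gradient with $\mathbf{v}$.

```python
def make_f(v, a):
    """Return the function f(Y) = Y^T v + a for a fixed vector v and scalar a."""
    def f(y):
        if len(y) != len(v):
            raise ValueError("Y and v must have the same length")
        return sum(yi * vi for yi, vi in zip(y, v)) + a
    return f

def numerical_gradient(func, y, h=1e-6):
    """Central-difference approximation of the gradient of func at the point y."""
    grad = []
    for i in range(len(y)):
        up = list(y)
        up[i] += h
        down = list(y)
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad

v = [1, 2, -3]
a = -2
f = make_f(v, a)
Y = [1, 0, -1]
value = f(Y)                     # 1*1 + 0*2 + (-1)*(-3) - 2
grad = numerical_gradient(f, Y)  # should match v, the exact gradient
print(value, grad)
```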
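3. Let
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & -1 \end{pmatrix}, \quad \boldsymbol{\mu} = (1, 0, 1), \quad \Sigma = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
and $\mathbf{Y} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$.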
Let $\mathbf{W} = A\mathbf{Y}$. What distribution does the random vector $\mathbf{W}$ follow? Using R, draw a sample of size
100 from this random vector and plot it. Note: you may use the function
mvrnorm() from the MASS package to help (but you do not have to).
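A standard fact about multivariate normal vectors is that a linear map preserves normality: $\mathbf{W} = A\mathbf{Y} \sim \mathcal{N}(A\boldsymbol{\mu}, A\Sigma A^\top)$. The Python sketch below uses only the standard library, in place of R's MASS::mvrnorm(). It computes these parameters with hypothetical helpers matmul, transpose and cholesky. It then draws 100 observations as $\mathbf{W} = A\boldsymbol{\mu} + L\mathbf{z}$, where $LL^\top = A\Sigma A^\top$ and $\mathbf{z}$ is standard normal. The seed is arbitrary and the plot is omitted. In R, calling plot() on the two-column sample matrix would be enough.

```python
import math
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*P)]

def cholesky(S):
    """Lower-triangular L with L L^T = S, for S symmetric positive definite."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - partial)
            else:
                L[i][j] = (S[i][j] - partial) / L[j][j]
    return L

A = [[1, 1, 0], [0, 1, -1]]
mu = [[1], [0], [1]]  # mu as a column vector
Sigma = [[2, -1, 0], [-1, 1, 0], [0, 0, 1]]

# Parameters of W = AY: mean A mu and covariance A Sigma A^T.
mean_W = [row[0] for row in matmul(A, mu)]
cov_W = matmul(matmul(A, Sigma), transpose(A))

# Draw 100 observations: W = mean_W + L z with z ~ N(0, I).
random.seed(3375)
L = cholesky(cov_W)
sample = []
for _ in range(100):
    z = [random.gauss(0.0, 1.0) for _ in range(len(mean_W))]
    sample.append([mean_W[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                   for i in range(len(mean_W))])
print(mean_W, cov_W)
```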
4. Let $\mathbf{Y} = (Y_1, Y_2, Y_3, Y_4) \sim \mathcal{N}(\mathbf{0}, 9I_4)$ and set $\bar{Y} = \frac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4)$. Using R, draw 1000 observations from:
(a) $Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2$
(b) $4\bar{Y}^2$
(c) $(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + (Y_3 - \bar{Y})^2 + (Y_4 - \bar{Y})^2$
In each case, plot a histogram of the observations.
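The histograms can be checked against the theory. The identity $\sum_i Y_i^2 = 4\bar{Y}^2 + \sum_i (Y_i - \bar{Y})^2$ holds for every sample. After dividing by the variance 9, the three statistics follow $\chi^2(4)$, $\chi^2(1)$ and $\chi^2(3)$ distributions, with means 36, 9 and 27 before scaling. Below is a minimal Python simulation sketch; the exercise asks for R, where rnorm() and hist() play the same roles. The seed is an arbitrary choice and the histograms themselves are left out.

```python
import random

random.seed(2024)
n_obs = 1000
sigma = 3.0  # Var(Y_i) = 9, so each Y_i has standard deviation 3

stat_a, stat_b, stat_c = [], [], []
for _ in range(n_obs):
    y = [random.gauss(0.0, sigma) for _ in range(4)]
    y_bar = sum(y) / 4
    stat_a.append(sum(yi ** 2 for yi in y))           # (a) sum of squares
    stat_b.append(4 * y_bar ** 2)                      # (b) 4 * Ybar^2
    stat_c.append(sum((yi - y_bar) ** 2 for yi in y))  # (c) squared deviations from Ybar

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

print(mean(stat_a), mean(stat_b), mean(stat_c))
```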
5. Consider the function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by
$$f(\mathbf{Y}) = Y_1^2 + \tfrac{1}{2}Y_2^2 + \tfrac{1}{2}Y_3^2 - Y_1Y_2 + Y_1 + 2Y_2 - 3Y_3 - 2.$$
Using R, find the critical point(s) of $f$. If there is a unique critical point, is it a global maximum
of $f$? A global minimum? A saddle point?
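Because $f$ is quadratic, it can be written as $f(\mathbf{Y}) = \tfrac{1}{2}\mathbf{Y}^\top H \mathbf{Y} + \mathbf{b}^\top \mathbf{Y} - 2$ with a constant Hessian $H$ and $\mathbf{b} = (1, 2, -3)$. The critical points therefore solve $H\mathbf{Y} = -\mathbf{b}$, and their nature follows from the definiteness of $H$. By Sylvester's criterion, $H$ is positive definite when all leading principal minors are positive. The Python sketch below uses exact fractions and the hypothetical helpers solve and det. In R, solve() would handle the linear system.

```python
from fractions import Fraction

def f(y):
    """The quadratic function of question 5."""
    y1, y2, y3 = y
    return y1 ** 2 + y2 ** 2 / 2 + y3 ** 2 / 2 - y1 * y2 + y1 + 2 * y2 - 3 * y3 - 2

# Gradient: (2*y1 - y2 + 1, -y1 + y2 + 2, y3 - 3); setting it to zero gives H y = -b.
H = [[2, -1, 0], [-1, 1, 0], [0, 0, 1]]  # Hessian of f (constant because f is quadratic)
b = [1, 2, -3]                          # coefficients of the linear terms

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

critical = solve(H, [-bi for bi in b])
# Leading principal minors of H; all positive means H is positive definite.
minors = [det([row[:k] for row in H[:k]]) for k in range(1, 4)]
print(critical, f(critical), minors)
```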
6. (a) Identify the response variable Y and the predictor variable X in each of the examples
shown on slides 4 and 5 of the course notes (Chapter 2). Is there a linear relationship
between X and Y? Draw the approximate line of best fit (and give its equation).
Hint: use screenshots and software (Paint, PowerPoint, GIMP, etc.) to overlay the line.
(b) Consider the 4 examples shown on page 9 of the course notes (Chapter 2). Is the variance
of the error terms constant? Are the error terms independent of each other?
7. Consider the dataset Autos.xlsx found on Brightspace. The predictor variable is VKM.q (X,
the average daily distance driven, in km); the response variable is CC.q (Y , the average daily
fuel consumption, in L). Use R to:
(a) display the scatterplot of Y versus X;
(b) determine the number of observations n in the dataset;
(c) compute the quantities $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum X_i Y_i$, and $\sum Y_i^2$;
(d) find the normal equations of the line of best fit;
(e) find the coefficients of the line of best fit (without using lm()), and
(f) overlay the line of best fit onto the scatterplot.
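Steps (b) to (e) reduce to a handful of sums. Below is a minimal Python sketch on a small hypothetical dataset. The numbers are made up for illustration and are not taken from Autos.xlsx; in R, one would first load the file, for example with readxl::read_excel(). The normal equations $n b_0 + b_1 \sum X_i = \sum Y_i$ and $b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$ are solved by Cramer's rule. For part (f), R's abline() can overlay the resulting line on the scatterplot.

```python
# Hypothetical toy data standing in for (VKM.q, CC.q); NOT the Autos.xlsx values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_y2 = sum(yi ** 2 for yi in y)

# Normal equations:
#   n * b0     + sum_x  * b1 = sum_y
#   sum_x * b0 + sum_x2 * b1 = sum_xy
# solved here by Cramer's rule.
det = n * sum_x2 - sum_x ** 2
b1 = (n * sum_xy - sum_x * sum_y) / det
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
print(f"n = {n}, sums = {sum_x}, {sum_y}, {sum_x2}, {sum_xy:.2f}, {sum_y2:.2f}")
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```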
8. (continuation of the previous question) Use the R function lm() to obtain the coefficients of
the line of best fit and the residuals. Show, by calculating the required quantities directly,
that the first five properties of the residuals (p. 25 of the course notes for Chapter 2) are satisfied.
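Each property can be checked numerically, assuming the list in the course notes resembles the usual textbook one. That list says the residuals sum to zero, $\sum Y_i = \sum \hat{Y}_i$, $\sum X_i e_i = 0$, $\sum \hat{Y}_i e_i = 0$, and the least-squares fit minimizes $\sum e_i^2$. This Python sketch uses hypothetical toy data, not Autos.xlsx, and fits the line with the closed-form least-squares formulas instead of lm(). The minimality check only compares the SSE with nearby coefficient values, so it is a sanity check rather than a proof.

```python
# Hypothetical toy data (not Autos.xlsx).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Quantities that should all be (numerically) zero.
checks = {
    "sum of residuals": sum(resid),
    "sum Y_i - sum fitted_i": sum(y) - sum(fitted),
    "sum X_i e_i": sum(xi * ei for xi, ei in zip(x, resid)),
    "sum fitted_i e_i": sum(fi * ei for fi, ei in zip(fitted, resid)),
}

def sse(c0, c1):
    """Sum of squared residuals for the line c0 + c1 * x."""
    return sum((yi - c0 - c1 * xi) ** 2 for xi, yi in zip(x, y))

# Least squares: nudging either coefficient should not decrease the SSE.
minimal = all(sse(b0 + d0, b1 + d1) >= sse(b0, b1)
              for d0 in (-0.01, 0.0, 0.01) for d1 in (-0.01, 0.0, 0.01))

for name, value in checks.items():
    print(f"{name}: {value:.2e}")
print("SSE locally minimal:", minimal)
```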
9. (continuation of the previous question) Using R, compute the Pearson and Spearman correlation
coefficients between the predictor and the response. Is there a strong or weak linear
association between these two variables? Use the correlation values and the plots to justify
your answer.
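The Spearman coefficient is the Pearson coefficient computed on the ranks, with tied values given average ranks. Below is a minimal Python sketch on hypothetical toy data; in R, cor() with method = "pearson" or method = "spearman" does this directly. The names pearson, ranks and spearman are hypothetical helpers.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    s_uu = sum((a - mu_u) ** 2 for a in u)
    s_vv = sum((b - mu_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

def ranks(values):
    """Ranks 1..n, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(u, v):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data, not Autos.xlsx
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(pearson(x, y), spearman(x, y))
```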
10. (continuation of the previous question) Using R, find the decomposition into sums of squares
for the regression.
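The decomposition is $\text{SST} = \text{SSR} + \text{SSE}$, where $\text{SST} = \sum (Y_i - \bar{Y})^2$, $\text{SSR} = \sum (\hat{Y}_i - \bar{Y})^2$ and $\text{SSE} = \sum (Y_i - \hat{Y}_i)^2$. Below is a minimal Python sketch on hypothetical toy data (not Autos.xlsx). In R, anova() applied to an lm() fit reports SSR and SSE.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy data, not Autos.xlsx
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error (residual) sum of squares
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, SSR + SSE = {ssr + sse:.4f}")
```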