QBUS2820
Predictive Analytics
Question 1 ()
What is overfitting and why is it a fundamental concern in supervised learning?
Solution:
We say that there is overfitting when an estimated model is excessively flexible,
incorporating minor variations in the training data that are likely to be noise
rather than predictive patterns.
Main issues with overfitting:
• Low training error, high generalization error
• Poor predictive performance
• Overreacts to minor fluctuations in training data
Question 2 ()
What is classification and how is it different to regression?
Solution:
In classification, the response variable Y is qualitative (categorical) and takes
values in a finite unordered set Y = {1, . . . , C}, where C is the number of classes.
Our task is to predict which class a subject belongs to based on input variables.
In regression, by contrast, the response variable Y is numerical.
Question 3 ()
1. What is a confusion matrix?
Solution: A confusion matrix counts the number of true negatives, false positives, false negatives, and true positives for the test data.
2. Present what the confusion matrix looks like for a binary classification problem.
                        Classification (Prediction)
                        Ŷ = 0                   Ŷ = 1
Actual   Y = 0          True negatives (TN)     False positives (FP)
         Y = 1          False negatives (FN)    True positives (TP)
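For illustration only (not required in the answer box), a minimal Python sketch, assuming scikit-learn is available, that builds such a matrix from hypothetical label vectors; rows index the actual class and columns the predicted class:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Hypothetical binary labels: actual classes and model predictions
    y_actual = np.array([0, 0, 0, 1, 1, 1])
    y_pred = np.array([0, 1, 0, 1, 1, 0])

    # Rows index the actual class, columns the predicted class:
    # [[TN, FP],
    #  [FN, TP]]
    cm = confusion_matrix(y_actual, y_pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    print(cm)
    print("TN, FP, FN, TP =", tn, fp, fn, tp)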
Question 4 ()
Given the table below, which contains the actual Y and the predicted Y from two models,
calculate the Sensitivity, Specificity and Precision for each of the two models.
Solution:
Note: as mentioned in the question: “Fill in your answers to all questions 1 to 3 in the answer box. You only need to present the final solution.” Therefore, you only need to type your answers as below:
The Sensitivity for model 1 is: 1. The Sensitivity for model 2 is: 0.333.
The Specificity for model 1 is: 0.143. The Specificity for model 2 is: 1.
The Precision for model 1 is: 0.333. The Precision for model 2 is: 1.
Solution:
The details of the solutions are shown below, while you do NOT have
to type such solving details into the answer box.
For Model 1: we have: TP= 3; TN= 1; FP= 6; FN= 0;
Therefore, we have Sensitivity:
P(Ŷ = 1 | Y = 1) = TP/(TP + FN) = 3/(3 + 0) = 1
Specificity is:
P(Ŷ = 0 | Y = 0) = TN/(TN + FP) = 1/(1 + 6) = 0.143
Precision is:
P(Y = 1 | Ŷ = 1) = TP/(TP + FP) = 3/(3 + 6) = 0.333
For Model 2: we have: TP= 1; TN= 7; FP= 0; FN= 2;
Therefore, we have Sensitivity:
P(Ŷ = 1 | Y = 1) = TP/(TP + FN) = 1/(1 + 2) = 0.333
Specificity is:
P(Ŷ = 0 | Y = 0) = TN/(TN + FP) = 7/(7 + 0) = 1
Precision is:
P(Y = 1 | Ŷ = 1) = TP/(TP + FP) = 1/(1 + 0) = 1
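As an optional cross-check (again, not needed in the answer box), a short Python sketch that reproduces these figures from the TP/TN/FP/FN counts stated above; the helper function name is arbitrary:

    def classification_metrics(tp, tn, fp, fn):
        # Sensitivity (recall): P(Yhat = 1 | Y = 1) = TP / (TP + FN)
        sensitivity = tp / (tp + fn)
        # Specificity: P(Yhat = 0 | Y = 0) = TN / (TN + FP)
        specificity = tn / (tn + fp)
        # Precision: P(Y = 1 | Yhat = 1) = TP / (TP + FP)
        precision = tp / (tp + fp)
        return sensitivity, specificity, precision

    # Model 1: (1.0, 0.142..., 0.333...); Model 2: (0.333..., 1.0, 1.0)
    print(classification_metrics(tp=3, tn=1, fp=6, fn=0))
    print(classification_metrics(tp=1, tn=7, fp=0, fn=2))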
Question 5 ()
What are the disadvantages of the KNN method? Present three of them.
Solution:
The KNN method has the following disadvantages (any three are fine):
(a) The estimate of the regression function can be very unstable, since it is the
average of only a few points. This is the price that we pay for flexibility.
(b) Curse of dimensionality.
(c) The predictive performance is sensitive to noisy or irrelevant predictors.
(d) Generating predictions is computationally expensive.
Question 6 ()
Consider the additive error model
Y = f(x) + ε,
where ε is i.i.d. with E(ε) = 0 and Var(ε) = σ².
We can write the expected prediction error at a new input point X = x0 as:
EPE(x0) = E_D[(Y − f̂(x0))² | X = x0]
1. Decompose the expected prediction error into two parts: irreducible error and
reducible error.
2. Further, we can show that the reducible error can be decomposed into the following
two parts: squared bias and variance. First, identify which part is the squared bias
and which part is the variance. Second, explain the meaning of each part.
E([f̂(x0) − E(f̂(x0))]²) + [E(f̂(x0)) − f(x0)]²
Solution:
Note: as mentioned in the question: “Fill in your answers to both
questions 1 to 2 in the answer box. You only need to present the final
solution.” Therefore, you only need to type your answers as below:
1. The irreducible error is: σ².
The reducible error is: E[(f(x0) − f̂(x0))² | X = x0].
2. [E(f̂(x0)) − f(x0)]² is the squared bias. Bias reflects how far the expectation of f̂(x0) is from the true function value f(x0).
E([f̂(x0) − E(f̂(x0))]²) is the variance. Variance reflects the expected squared deviation of f̂(x0) around its own mean.
Solution:
The details of question 1 solutions are shown below, while you do NOT
have to type such solving details into the answer box.
EPE(x0) = E[(Y − f̂(x0))² | X = x0]
= E[(f(x0) + ε − f̂(x0))² | X = x0]
= σ² + E[(f(x0) − f̂(x0))² | X = x0]
= Irreducible error + Reducible error
The cross term 2E[ε(f(x0) − f̂(x0)) | X = x0] vanishes because E(ε) = 0 and ε is independent of f̂(x0).
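For intuition only, a small Monte Carlo sketch that estimates the squared bias and the variance of an estimator f̂(x0) at a fixed point by repeatedly redrawing training sets; the true function, noise level and the k-nearest-neighbour style estimator are all hypothetical choices:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(3 * x)      # hypothetical true regression function
    sigma, n, x0, k = 0.5, 50, 1.0, 5

    preds = []
    for _ in range(5000):            # many independent training sets D
        x = rng.uniform(0, 2, n)
        y = f(x) + rng.normal(0, sigma, n)
        nearest = np.argsort(np.abs(x - x0))[:k]
        preds.append(y[nearest].mean())   # f_hat(x0): k-nearest-neighbour average

    preds = np.array(preds)
    squared_bias = (preds.mean() - f(x0)) ** 2   # [E f_hat(x0) - f(x0)]^2
    variance = preds.var()                       # E[(f_hat(x0) - E f_hat(x0))^2]
    print(squared_bias, variance, sigma ** 2)    # reducible parts and irreducible sigma^2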
Question 7 ()
Given the below Gaussian linear regression:
Y = β0 + ∑_{j=1}^{p} βjXj + εi,   εi ∼ N(0, σ²).
1. Show the full form of the conditional distribution of Y |X.
2. Derive the complete form of the log-likelihood L(β, σ²), which is the log-density of the N observed samples.
Solution:
Note: as mentioned in the question: “Fill in your answers to both
questions 1 to 2 in the answer box. You only need to present the final
solution.” Therefore, you only need to type your answers as below:
1. The full form of the conditional distribution of Y | X is:
Y | X = x ∼ N(β0 + ∑_{j=1}^{p} βjxj, σ²)
2. The complete log-likelihood, i.e. the log-density of the N observed samples, is:
L(β, σ²) = −(N/2) log(2π) − (N/2) log(σ²) − (1/(2σ²)) ∑_{i=1}^{N} (yi − β0 − ∑_{j=1}^{p} βjxij)²
Solution:
The details of the solutions are shown below, while you do NOT have
to type such solving details into the answer box.
1. Show the full form of the conditional distribution of Y |X.
The expectation of Y | X:
E(Y | X) = E(β0 + ∑_{j=1}^{p} βjXj + εi | X) = β0 + ∑_{j=1}^{p} βjXj + E(εi | X) = β0 + ∑_{j=1}^{p} βjXj + 0
The variance of Y | X:
Var(Y | X) = Var(β0 + ∑_{j=1}^{p} βjXj + εi | X)
= Var(β0 | X) + Var(∑_{j=1}^{p} βjXj | X) + Var(εi | X)
= 0 + 0 + σ²
There are no covariance terms in the second step above: given X, β0 and ∑_{j=1}^{p} βjXj are constants (zero variance and zero covariance), and εi is assumed independent of X.
Finally, linear transformations of Gaussian variables are still Gaussian.
Therefore, the full conditional distribution of Y | X is:
Y | X = x ∼ N(β0 + ∑_{j=1}^{p} βjxj, σ²)
2. Derive the complete form of the log-likelihood L(β, σ²), which is the log-density of the N observed samples.
For one specific sample, we have
Yi | Xi = xi ∼ N(β0 + ∑_{j=1}^{p} βjxij, σ²),
the density for an observed value yi is
p(yi | xi; β, σ²) = (1/√(2πσ²)) exp(−(yi − β0 − ∑_{j=1}^{p} βjxij)² / (2σ²)).
The likelihood function is the joint PDF of the data evaluated at the sample values. In our Gaussian linear regression model, the independence assumption implies that we can multiply the PDFs of the individual observations:
p(y; β, σ²) = ∏_{i=1}^{N} (1/√(2πσ²)) exp(−(yi − β0 − ∑_{j=1}^{p} βjxij)² / (2σ²))
Therefore, the complete log-likelihood is the log-density of the N observed samples,
L(β, σ²) = log ∏_{i=1}^{N} p(yi; β, σ²) = ∑_{i=1}^{N} log p(yi; β, σ²)
= −(N/2) log(2π) − (N/2) log(σ²) − (1/(2σ²)) ∑_{i=1}^{N} (yi − β0 − ∑_{j=1}^{p} βjxij)²
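For illustration only, a short numpy sketch that evaluates this log-likelihood on simulated data and cross-checks it against the sum of Gaussian log-densities from scipy; the sample size and parameter values are hypothetical:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    N, p = 200, 3
    X = rng.normal(size=(N, p))
    beta0, beta, sigma2 = 0.5, np.array([1.0, -2.0, 0.3]), 0.25
    y = beta0 + X @ beta + rng.normal(0.0, np.sqrt(sigma2), N)

    def log_likelihood(beta0, beta, sigma2, X, y):
        n = len(y)
        resid = y - beta0 - X @ beta
        return (-n / 2 * np.log(2 * np.pi)
                - n / 2 * np.log(sigma2)
                - resid @ resid / (2 * sigma2))

    # The two numbers printed below should agree
    print(log_likelihood(beta0, beta, sigma2, X, y))
    print(norm.logpdf(y, loc=beta0 + X @ beta, scale=np.sqrt(sigma2)).sum())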
Question 8 ()
Given the below time series model:
Yt = ℓt−1 + εt,
ℓt = αyt + (1 − α)ℓt−1,
where εt is i.i.d. ∼ N(0, σ²), 0 ≤ α ≤ 1.
1. Derive the error correction formulation of the model.
2. Derive the h-step ahead point forecasts ŷt+h.
3. Derive the h-step ahead variance forecasts Var(Yt+h|y1, . . . , yt).
Solution:
Note: as mentioned in the question: “Fill in your answers to all questions 1 to 3 in the answer box. You only need to present the final solution.” Therefore, you only need to type your answers as below:
1. We have the model in error correction form:
Yt+1 = ℓt + εt+1,
ℓt = ℓt−1 + αεt.
2. The point forecast for any horizon h is:
ŷt+h = ℓt
3. The variance forecast for any horizon h is:
Var(Yt+h | y1:t) = σ²(1 + (h − 1)α²)
Solution:
The details of the solutions are shown below, while you do NOT have
to type such solving details into the answer box.
1. Derive the error correction formulation of the model.
First, we transform the ℓt equation as:
ℓt = αYt + (1 − α)ℓt−1
= ℓt−1 + α(Yt − ℓt−1)
= ℓt−1 + αεt,
using Yt − ℓt−1 = εt from the observation equation.
Therefore, we have the model in error correction form:
Yt+1 = ℓt + εt+1,
ℓt = ℓt−1 + αεt.
2. Derive the h-step ahead point forecasts ŷt+h.
Using Yt = ℓt−1 + εt and ℓt = ℓt−1 + αεt, we have:
Yt+1 = ℓt + εt+1
Yt+2 = ℓt+1 + εt+2 = ℓt + αεt+1 + εt+2
Yt+3 = ℓt+2 + εt+3 = ℓt + αεt+1 + αεt+2 + εt+3
...
Yt+h = ℓt+h−1 + εt+h = ℓt + α ∑_{i=1}^{h−1} εt+i + εt+h
Thus the point forecast for any horizon h is
ŷt+h = E(Yt+h | y1:t) = E(ℓt + α ∑_{i=1}^{h−1} εt+i + εt+h | y1:t) = ℓt
3. Derive the h-step ahead variance forecasts Var(Yt+h|y1, . . . , yt).
Based on the derivation in the previous part, we have:
Var(Yt+h | y1:t) = Var(ℓt + α ∑_{i=1}^{h−1} εt+i + εt+h | y1:t) = σ²(1 + (h − 1)α²),
since ℓt is known given y1:t, each of the h − 1 terms αεt+i contributes α²σ², and εt+h contributes σ².
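For illustration only, a minimal Python sketch of these forecasts; the series, smoothing parameter, initial level and σ² are hypothetical, and the function name is arbitrary:

    import numpy as np

    def ses_forecast(y, alpha, h, sigma2, level0):
        # Level recursion: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
        level = level0
        for obs in y:
            level = alpha * obs + (1 - alpha) * level
        point = level                                   # y_hat_{t+h} = l_t for every horizon h
        variance = sigma2 * (1 + (h - 1) * alpha ** 2)  # Var(Y_{t+h} | y_{1:t})
        return point, variance

    y = np.array([10.2, 11.1, 10.8, 11.5, 11.0])        # hypothetical observed series
    print(ses_forecast(y, alpha=0.3, h=3, sigma2=1.0, level0=y[0]))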
Question 9 ()
Given the linear regression problem, the original centered input matrix X has N
samples and p features. Suppose we augment the centered input matrix X with
p additional rows √λ I and augment y with p zeros correspondingly. I is a p × p
identity matrix.
Now we implement OLS on this augmented data set. Present the parameter estimates.
You only need to present the final solution.
Hint: you might need to use
β̂ = (XᵀX)⁻¹Xᵀy.
Solution: Note: as mentioned in the question: “You only need to
present the final solution.” Therefore, you only need to type your
answers as below:
The parameter estimates are:
(XᵀX + λI)⁻¹Xᵀy.
Solution:
The details of the solutions are shown below, while you do NOT have
to type such solving details into the answer box.
Denote X̃ and ỹ as the augmented data sets, i.e.,
X̃_{(N+p)×p} = [ X_{N×p} ; √λ I_{p×p} ]  (stacking √λ I below X),
ỹ_{(N+p)×1} = [ y_{N×1} ; 0_{p×1} ]  (stacking p zeros below y).
The OLS solution for the regression parameter estimates with the augmented data set is:
β̂_new = (X̃ᵀX̃)⁻¹X̃ᵀỹ.
Firstly, we have:
X̃ᵀX̃ = ( Xᵀ_{p×N}  √λ I_{p×p} ) [ X_{N×p} ; √λ I_{p×p} ] = XᵀX + λI
Secondly (as ỹ has p additional 0 rows),
X̃ᵀỹ = Xᵀy
Therefore, finally we have:
β̂_new = (X̃ᵀX̃)⁻¹X̃ᵀỹ = (XᵀX + λI)⁻¹Xᵀy,
which is exactly the ridge regression estimator with penalty parameter λ.
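As a quick numerical check (not required in the answer box), a numpy sketch showing that OLS on the augmented data reproduces the closed form above; the data are simulated and the value of λ is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(2)
    N, p, lam = 50, 4, 2.0
    X = rng.normal(size=(N, p))
    X = X - X.mean(axis=0)                   # centre the inputs, as in the question
    y = rng.normal(size=N)

    # Augment: stack sqrt(lambda) * I below X, and p zeros below y
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])

    beta_aug = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_aug)
    beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    print(np.allclose(beta_aug, beta_closed))   # expect True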
END OF EXAMINATION.