ST3189
Machine Learning
Suitable for all candidates
Instructions to candidates
This paper contains four questions. Answer ALL FOUR. All questions will be given
equal weight (25%).
The marks in brackets reflect marks for each question.
Time allowed - Reading Time: None
Writing Time: 2 hours
You are supplied with: Graph paper
You may also use: No additional materials
Calculators: Calculators are allowed in this examination
1. (a) Suppose that yᵢ ∼ N(µ, 1) for i = 1, . . . , n and that the yᵢ's are independent.
i. Show that the sample mean estimator µ̂₁ =
which can be recognised as the IGamma(100.1, 20.1) distribution.
(b) Find the Jeffreys’ prior for λ. Which is the corresponding posterior distribu-
tion? [6 marks]
Hence Jeffreys' prior is π(λ) ∝ I(λ)^{1/2} ∝ (λ^{−2})^{1/2} = λ^{−1}. The posterior becomes

π(λ|x) ∝ λ^{−100} exp(−20/λ) × λ^{−1} = λ^{−100−1} exp(−20/λ),

which can be recognised as the IGamma(100, 20) distribution.
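The Fisher information used above can be verified directly; a short sketch, assuming the Exponential model f(x|λ) = λ^{−1} exp(−x/λ) with mean λ that the likelihood implies:

```latex
\log f(x\mid\lambda) = -\log\lambda - \frac{x}{\lambda},
\qquad
\frac{\partial^2}{\partial\lambda^2}\log f(x\mid\lambda)
  = \frac{1}{\lambda^2} - \frac{2x}{\lambda^3},
\]
so, using $\mathbb{E}[x\mid\lambda]=\lambda$,
\[
I(\lambda)
  = -\,\mathbb{E}\!\left[\frac{\partial^2}{\partial\lambda^2}\log f(x\mid\lambda)\right]
  = -\frac{1}{\lambda^2} + \frac{2\lambda}{\lambda^3}
  = \lambda^{-2}.
```

Taking the square root gives Jeffreys' prior π(λ) ∝ λ^{−1}, as stated.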
(c) Find a Bayes estimator for λ based on the priors of parts (a) and (b). [3 marks]
Answer: A standard Bayes estimator is the posterior mean, which is equal to (see appendix)

20.1/(100.1 − 1) = 0.203  or  20/(100 − 1) = 0.202,

depending on the chosen prior.
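As a quick numerical check of the two estimates (a Python sketch; `igamma_mean` is an illustrative helper, not from the paper), the posterior mean of an IGamma(a, b) distribution is b/(a − 1) for a > 1:

```python
def igamma_mean(a, b):
    """Posterior mean of an IGamma(a, b) distribution, valid for a > 1."""
    return b / (a - 1)

# Conjugate prior of part (a): IGamma(100.1, 20.1) posterior.
print(round(igamma_mean(100.1, 20.1), 3))  # 0.203
# Jeffreys' prior of part (b): IGamma(100, 20) posterior.
print(round(igamma_mean(100, 20), 3))      # 0.202
```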
(d) Let y represent a future observation from the same model. Find the predictive
distribution of y based either on the prior of part (a) or (b). [6 marks]
(e) Describe how you can calculate the mean of the predictive distribution in
software such as R. [5 marks]
Answer: Note that we can write the mean of the predictive distribution as

E(y|x) = ∫₀^∞ ∫₀^∞ y f(y|λ) π(λ|x) dy dλ.
Hence a Monte Carlo scheme would draw samples y^{(k)}, k = 1, . . . , N from
f(y|λ)π(λ|x) for some large N and then just take

Ê(y|x) = (1/N) ∑_{k=1}^{N} y^{(k)}.
To do that in R one can:
i. Draw, say, 10,000 samples from the Gamma(100, 20) distribution using nu=rgamma(10000, 100, 20) (arguments: number of draws, shape, rate).
ii. Invert those samples to make them samples from the IGamma(100, 20) using lambda=1/nu.
iii. Using each of the samples in lambda, sample y from the model y ∼ Exponential(lambda) using y=rexp(10000, 1/lambda) (rexp is parametrised by the rate, which is 1/mean here).
iv. Calculate the sample mean of the values in y using mean(y).
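The same four steps can be sketched outside R as well; a minimal Python version using only the standard library (variable names are illustrative):

```python
import random

random.seed(1)
N = 10_000

# Step i: Gamma(shape=100, rate=20) samples; gammavariate takes (shape, scale).
nu = [random.gammavariate(100, 1 / 20) for _ in range(N)]

# Step ii: invert to obtain IGamma(100, 20) samples.
lam = [1 / v for v in nu]

# Step iii: for each lambda, draw y ~ Exponential with mean lambda (rate 1/lambda).
y = [random.expovariate(1 / l) for l in lam]

# Step iv: the sample mean estimates E(y|x), which here is 20/99, about 0.202.
print(sum(y) / N)
```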
3. (a) i. Suppose a non-linear model that can be written as

Y = f(X) + ε,

where ε has zero mean and variance σ², and is independent of X. Show
that the expected test error, conditional on X, can be decomposed into the
following three parts:

E[(Y − f̂(X))²] = σ² + Bias[f̂(X)]² + Var[f̂(X)],

where f̂(·) is estimated from the training data. [7 marks]
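A sketch of the decomposition asked for above (writing f̂ for f̂(X), f for f(X), and suppressing the conditioning on X; ε is independent of f̂ with mean zero and variance σ²):

```latex
\mathbb{E}\big[(Y-\hat f)^2\big]
  = \mathbb{E}\big[(f+\varepsilon-\hat f)^2\big]
  = \sigma^2 + \mathbb{E}\big[(f-\hat f)^2\big],
\]
since the cross term $2\,\mathbb{E}[\varepsilon\,(f-\hat f)]$ vanishes, and then
\[
\mathbb{E}\big[(f-\hat f)^2\big]
  = \big(f-\mathbb{E}\hat f\big)^2
    + \mathbb{E}\big[(\hat f-\mathbb{E}\hat f)^2\big]
  = \mathrm{Bias}[\hat f]^2 + \mathrm{Var}[\hat f].
```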