COMP0080 - Graphical Models
LEVEL: Postgraduate
TIME: 10:00
Controlled Condition Exam: 3 Hours exam
You cannot submit your work after the date and time shown on
AssessmentUCL – you must ensure to allow sufficient time to upload and
hand in your work.
This paper is suitable for candidates who attended classes for this
module in the following academic year(s): 2021/22
Additional material: N/A
Special instructions: N/A
Exam paper word count: N/A
Graphical Models
Final Exam
COMP0080 2021–22
Suitable for prior cohorts: 2016–17, 2017–18, 2018–19, 2019–20,
2020–21
Always provide justification and show any intermediate work for your answers.
A correct but unsupported answer may not receive any marks.
Page 1 of 5 COMP0080 2021–22
1. Independence in Graphical Models
(a) Suppose that X1 and X2 are two random variables, where each can take one of
three states. Here is a probability table P describing their joint distribution,
Pij = p(X1 = i,X2 = j):
P =
  0.06 0.37 0.12
  0.01 0.07 0.02
  0.03 0.26 0.06
Are X1 and X2 independent? Support your answer. If they are not independent,
what is the minimal number of entries in the table that you need to change to
make them independent? [6 marks]
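Part (a) can be sanity-checked numerically: X1 and X2 are independent iff every entry equals the product of the corresponding row and column marginals. A minimal Python sketch (illustrative only, not a substitute for the written justification the question asks for):

```python
# Check whether P factorises as the outer product of its marginals,
# i.e. whether p(X1=i, X2=j) = p(X1=i) * p(X2=j) for all i, j.
P = [[0.06, 0.37, 0.12],
     [0.01, 0.07, 0.02],
     [0.03, 0.26, 0.06]]

p_x1 = [sum(row) for row in P]                              # marginal of X1 (row sums)
p_x2 = [sum(P[i][j] for i in range(3)) for j in range(3)]   # marginal of X2 (column sums)

independent = all(abs(P[i][j] - p_x1[i] * p_x2[j]) < 1e-12
                  for i in range(3) for j in range(3))
print(p_x1, p_x2, independent)
```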
(b) Consider three random variables A, B, C. Give an example of a distribution in
which A ⊥⊥ B, B ⊥⊥ C, C ⊥⊥ A, but A and B are not conditionally independent
when given C. [6 marks]
(c) For the distribution from the previous question, draw the graphical model with
which this distribution is compatible and which captures as many independence
statements from the previous question as possible. [6 marks]
(d) Consider the following Markov Network on the variables Xi:
[Figure: a Markov network on the variables X1, X2, X3, X4, X5, X6; the edge structure is not recoverable from the source.]
Write down the set of all the conditional and marginal independence statements
that are true for the marginal distribution P (X2, X4, X6). [6 marks]
(e) Consider the following two graphical models:
[Figure: two graphical models, each on the variables A, B, C, D, E; the edge structures are not recoverable from the source.]
Do they encode the same set of independence assumptions? If yes, explain why.
If not, list all the independence statements that are true in one of the models,
but not in the other. [6 marks]
[Total: 30 marks]
2. Inference and Learning
(a) Consider a model of diseases and symptoms. si ∈ {0, 1} is a binary random
variable indicating whether the patient is showing the i-th symptom, and
dj ∈ {0, 1} is a binary random variable indicating whether the patient has the
j-th disease. A model for this is given by
p(s, d) = (1/Z) exp(s^T W d + a^T s + b^T d),
where Z is the normalisation constant and W, a,b are the parameters of the
model.
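As background for parts iii–v, the normalisation constant Z in the model above can be evaluated by brute force when s and d are short binary vectors. The sketch below is illustrative (the helper names `log_unnorm` and `log_Z` are ours, not from the paper) and makes explicit that exact evaluation enumerates 2^(|s|+|d|) terms:

```python
import itertools
import math

def log_unnorm(s, d, W, a, b):
    """s^T W d + a^T s + b^T d for binary tuples s and d."""
    return (sum(s[i] * W[i][j] * d[j] for i in range(len(s)) for j in range(len(d)))
            + sum(a[i] * s[i] for i in range(len(s)))
            + sum(b[j] * d[j] for j in range(len(d))))

def log_Z(W, a, b):
    """log of the normalisation constant by exhaustive enumeration:
    exponential in the number of symptoms plus diseases."""
    ns, nd = len(a), len(b)
    return math.log(sum(math.exp(log_unnorm(s, d, W, a, b))
                        for s in itertools.product([0, 1], repeat=ns)
                        for d in itertools.product([0, 1], repeat=nd)))
```

For a single symptom and disease with W = [[1]], a = b = [0], the four configurations give Z = 3 + e, which the function reproduces.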
i. Draw a Markov Network for this model. [6 marks]
ii. Derive the expression for p(s|d). Is this distribution factorised? [6 marks]
iii. There is a set of N patients, each with a patient record (sn,dn). Suppose
that you want to learn the parameters of the model by Maximum Likelihood.
Derive the expression for the log-likelihood L, assuming that the
patient records are i.i.d. [6 marks]
iv. Derive the expressions for ∂L/∂Wij, ∂L/∂ai and ∂L/∂bi. [6 marks]
v. Could these derivatives be computed efficiently in the general case? Justify
your answer. [6 marks]
(b) Consider a Hidden Markov Model with hidden states h1:T = {h1, . . . , hT } and
observed states v1:T = {v1, . . . , vT }.
i. When the sequence of outcomes v1:T is observed, it induces the distribution
on the hidden states pv(h1:T ) = p(h1:T |v1:T ). What is the graphical model
of this distribution? Based on the graphical model, is h1 independent of
hT in this distribution? [6 marks]
ii. Suppose you want to sample from pv(h1:T ). Is it possible to efficiently
sample from it? If it is possible, how would you do it? If not, explain why.
[6 marks]
iii. Let vt ∈ {0, 1, 2}, ht ∈ {0, 1} and the parameters of the model are as
follows: p(h1 = 1) = 0.5, the transition matrix
T =
  0.5 0.8
  0.5 0.2
and the emission matrix
B =
  0.3 0.5
  0.6 0
  0.1 0.5
Suppose that you observe the sequence
of outcomes v1:10 = [0, 1, 0, 2, 0, 2, 1, 0, 2, 0]. Is h1 independent of h10 in
pv(h1:10) given this particular sequence? Why is that? [6 marks]
iv. For the same parameter settings and the same outcome sequence, compute
the filtering distributions p(h8|v1:8) and p(h9|v1:9). Hint: you do not need
to run the full Forward algorithm here. [6 marks]
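For checking one's working on parts iii–iv, the filtering recursion p(h_t | v_{1:t}) ∝ p(v_t | h_t) Σ_{h_{t-1}} p(h_t | h_{t-1}) p(h_{t-1} | v_{1:t-1}) can be run directly. The sketch below assumes the column-stochastic conventions T[i][j] = p(h_t = i | h_{t-1} = j) and B[v][h] = p(v_t = v | h_t = h), consistent with the columns of T and B summing to one:

```python
# Normalised Forward (filtering) recursion for the HMM of question 2(b).
T = [[0.5, 0.8],
     [0.5, 0.2]]      # column j is the distribution of h_t given h_{t-1} = j
B = [[0.3, 0.5],
     [0.6, 0.0],
     [0.1, 0.5]]      # row v, column h gives p(v_t = v | h_t = h)
prior = [0.5, 0.5]    # p(h_1)
v = [0, 1, 0, 2, 0, 2, 1, 0, 2, 0]

def filtering(v):
    """Return the list of filtering distributions p(h_t | v_{1:t})."""
    alpha = [prior[h] * B[v[0]][h] for h in range(2)]
    z = sum(alpha)
    dists = [[a / z for a in alpha]]
    for t in range(1, len(v)):
        alpha = [B[v[t]][h] * sum(T[h][g] * dists[-1][g] for g in range(2))
                 for h in range(2)]
        z = sum(alpha)
        dists.append([a / z for a in alpha])
    return dists

f = filtering(v)
print(f[7])   # p(h_8 | v_{1:8})
print(f[8])   # p(h_9 | v_{1:9})
```

Note the effect of the observation v7 = 1: the emission column B[1] is (0.6, 0), so h7 is pinned to 0, which is why the hint says the full Forward pass is unnecessary.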
[Total: 54 marks]
3. Sampling
(a) Explain what KL-divergence is. Suppose P(X) = (1/4, 1/4, 1/4, 1/4),
Q(X) = (1/2, 1/4, 1/8, 1/8), P(Y = i|X = j) = Pij and Q(Y = i|X = j) = Qij
where:
P =
  0.25 0.5 0 0.2
  0.25 0 0.5 0.3
  0.25 0 0 0.1
  0.25 0.5 0.5 0.4
Q =
  0.1 0 0 1
  0 0 0.5 0
  0.9 0.5 0 0
  0 0.5 0.5 0
Compute KL(P (Y )||Q(Y )) and KL(Q(Y )||P (Y )). [5 marks]
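To check a hand computation, P(Y) and Q(Y) are obtained by summing out X, P(Y = i) = Σ_j P(Y = i | X = j) P(X = j), after which each KL divergence is a single sum. A Python sketch (the helper name `kl` is ours):

```python
import math

pX = [0.25, 0.25, 0.25, 0.25]
qX = [0.5, 0.25, 0.125, 0.125]
P = [[0.25, 0.5, 0.0, 0.2],
     [0.25, 0.0, 0.5, 0.3],
     [0.25, 0.0, 0.0, 0.1],
     [0.25, 0.5, 0.5, 0.4]]
Q = [[0.1, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.5, 0.0],
     [0.9, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0]]

# marginalise out X: p(Y = i) = sum_j p(Y = i | X = j) p(X = j)
pY = [sum(P[i][j] * pX[j] for j in range(4)) for i in range(4)]
qY = [sum(Q[i][j] * qX[j] for j in range(4)) for i in range(4)]

def kl(p, q):
    # KL(p || q) = sum_i p_i log(p_i / q_i), with the convention 0 log 0 = 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(pY, qY)
print(kl(pY, qY), kl(qY, pY))
```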
(b) Consider a Markov network on a rectangular M-by-N lattice with each Xi a
binary random variable and
P(X) ∝ ∏_{i∼j} Pij(Xi, Xj),
where each Pij(Xi, Xj) = e^(−I(Xi ≠ Xj)) and ∏_{i∼j} indicates the product over all
the edges. Suppose you want to use rejection sampling to draw samples from this
distribution. Is this a good idea? Compute the average acceptance rates for the
following pairs of M and N.
i. M = 2, N = 1
ii. M = 2, N = 2
[5 marks]
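Since each factor e^(−I(Xi ≠ Xj)) is at most 1, rejection sampling with a uniform proposal and a constant envelope of 1 accepts a proposed configuration with probability equal to its unnormalised weight, so the average acceptance rate is Z / 2^(MN). For the small lattices above this can be checked by exhaustive enumeration; a sketch (the helper name `acceptance_rate` is ours):

```python
import itertools
import math

def acceptance_rate(M, N):
    """Average acceptance rate Z / 2^(MN) for uniform-proposal rejection
    sampling on the M-by-N lattice model, by exhaustive enumeration."""
    nodes = [(i, j) for i in range(M) for j in range(N)]
    edges = []
    for i in range(M):
        for j in range(N):
            if i + 1 < M:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < N:
                edges.append(((i, j), (i, j + 1)))
    total = 0.0
    for assignment in itertools.product([0, 1], repeat=len(nodes)):
        x = dict(zip(nodes, assignment))
        disagree = sum(x[a] != x[b] for a, b in edges)   # number of unequal neighbours
        total += math.exp(-disagree)                     # unnormalised weight of x
    return total / 2 ** len(nodes)

print(acceptance_rate(2, 1))
print(acceptance_rate(2, 2))
```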
(c) Explain the procedure of MCMC sampling and what is meant by the detailed
balance condition. Is detailed balance a necessary condition for a sampler? Try
to come up with a distribution P and a valid MCMC sampler for which the
detailed balance condition does not hold. [6 marks]
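One classical construction relevant to the last part is a systematic-scan Gibbs sampler: each single-variable conditional update is reversible with respect to the target, but their fixed-order composition generally is not, while the target remains stationary. The numerical check below uses a hypothetical two-variable binary target (the distribution `pi` is our illustration, not from the question):

```python
# Target distribution pi over (x1, x2), and the systematic-scan Gibbs kernel
# K(x, y): resample x1 from p(x1 | x2), then x2 from p(x2 | new x1).
pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
states = list(pi)

def cond_x1(x2):   # p(x1 | x2)
    z = pi[(0, x2)] + pi[(1, x2)]
    return {v: pi[(v, x2)] / z for v in (0, 1)}

def cond_x2(x1):   # p(x2 | x1)
    z = pi[(x1, 0)] + pi[(x1, 1)]
    return {v: pi[(x1, v)] / z for v in (0, 1)}

def K(x, y):
    return cond_x1(x[1])[y[0]] * cond_x2(y[0])[y[1]]

# pi is stationary for K (each conditional update leaves pi invariant) ...
stationary = all(abs(sum(pi[x] * K(x, y) for x in states) - pi[y]) < 1e-12
                 for y in states)
# ... yet detailed balance pi(x) K(x, y) = pi(y) K(y, x) fails for some pairs
reversible = all(abs(pi[x] * K(x, y) - pi[y] * K(y, x)) < 1e-12
                 for x in states for y in states)
print(stationary, reversible)
```

For instance, pi(0,0) K((0,0),(1,1)) = 0.4 · (1/3) · 0.6 = 0.08 while pi(1,1) K((1,1),(0,0)) = 0.3 · 0.25 · 0.8 = 0.06, so this valid sampler is not reversible.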