STAT 3006 Assignment 3
Due date: 5:00 pm on 12 April
(30%) Q1: There are 100 samples {X1, X2, . . . , X100}. For each sample i, Xi = (Xi1, Xi2). You
know that these samples come from three clusters. For sample i, we use Zi to denote the number
of the cluster to which sample i belongs. The proportions of the three clusters are denoted by
π1, π2, π3; specifically, for each sample i, P(Zi = k) = πk, 1 ≤ k ≤ 3. Given Zi = k,
Xi1 ∼ N(µ1k, 1) and Xi2 ∼ N(µ2k, 1). Use the Q1 dataset to estimate the parameters (µ1k, µ2k),
k = 1, 2, 3, (π1, π2, π3), and Zi (1 ≤ i ≤ 100). Note: assign a Dirichlet(2, 2, 2) prior to
(π1, π2, π3), and assign the uniform prior p(µjk) ∝ 1 to each µjk, j = 1, 2; k = 1, 2, 3. Run
the Gibbs sampler for 10,000 iterations and collect the samples after a 2,000-iteration burn-in;
use the posterior mean to estimate µ and π, and use the posterior mode to estimate Z.
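As a starting point for the requested derivation, the full conditionals of this mixture model take the following form (a sketch under the stated priors, with n_k = #{i : Zi = k}; the detailed derivation is still yours to write):

```latex
% Cluster labels (unit-variance bivariate likelihood):
P(Z_i = k \mid \boldsymbol{\pi}, \boldsymbol{\mu}, X_i)
  \propto \pi_k \exp\!\Big(-\tfrac{1}{2}\big[(X_{i1}-\mu_{1k})^2
        + (X_{i2}-\mu_{2k})^2\big]\Big), \quad k = 1, 2, 3.

% Mixing proportions (the Dirichlet(2,2,2) prior is conjugate):
(\pi_1, \pi_2, \pi_3) \mid Z \sim
  \mathrm{Dirichlet}\big(2 + n_1,\; 2 + n_2,\; 2 + n_3\big),
  \qquad n_k = \#\{i : Z_i = k\}.

% Component means (flat prior p(\mu_{jk}) \propto 1):
\mu_{jk} \mid Z, X \sim
  N\!\Big(\tfrac{1}{n_k}\textstyle\sum_{i : Z_i = k} X_{ij},\; \tfrac{1}{n_k}\Big),
  \qquad j = 1, 2;\; k = 1, 2, 3.
```

Each Gibbs iteration then cycles through these three updates in turn.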
(35%) Q2: There are T = 1,500 independent normally distributed random variables {X1, X2, . . . , XT}.
Xt (1 ≤ t ≤ T) denotes the value observed at time t. We know that there are two
mean-shift change points in this data stream. More specifically, there are two change
points at times k and l (1 < k < l < T) such that {X1, . . . , Xk} ∼ Normal(µ1, σ²),
{Xk+1, . . . , Xl} ∼ Normal(µ2, σ²), and {Xl+1, . . . , XT} ∼ Normal(µ3, σ²) with known variance
σ² = 1. We are interested in finding the change points and estimating the unknown parameters
µ1, µ2, and µ3. Please derive a (hybrid) Gibbs sampler and implement it in R using the Q2
dataset to find the values of k, l, µ1, µ2, and µ3. Note: assign a Normal(0, 1) prior to each
µk, k = 1, 2, 3, and assign the two-dimensional discrete prior
p(k, l) = I(1 < k < l < T) / C(T−2, 2), i.e. uniform over all admissible pairs, to (k, l);
run 10,000 iterations and collect the posterior samples after a 2,000-iteration burn-in; use the
posterior mean to estimate µk, k = 1, 2, 3, and use the posterior mode to estimate the change points (k, l).
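For orientation, the conditional distributions such a sampler rests on look as follows (a sketch under the stated priors, not the required derivation; the symmetric updates for l and µ3 follow the same pattern):

```latex
% Mean updates (conjugate: N(0,1) prior, unit-variance likelihood):
\mu_1 \mid k, X \sim
  N\!\Big(\tfrac{\sum_{t=1}^{k} X_t}{k+1},\; \tfrac{1}{k+1}\Big), \qquad
\mu_2 \mid k, l, X \sim
  N\!\Big(\tfrac{\sum_{t=k+1}^{l} X_t}{l-k+1},\; \tfrac{1}{l-k+1}\Big), \qquad
\mu_3 \mid l, X \sim
  N\!\Big(\tfrac{\sum_{t=l+1}^{T} X_t}{T-l+1},\; \tfrac{1}{T-l+1}\Big).

% Change points: with the uniform prior, e.g. k given l has the discrete conditional
P(k \mid l, \boldsymbol{\mu}, X) \propto
  \exp\!\Big(-\tfrac{1}{2}\textstyle\sum_{t=1}^{k}(X_t-\mu_1)^2
             -\tfrac{1}{2}\textstyle\sum_{t=k+1}^{l}(X_t-\mu_2)^2\Big)\,
  I(1 < k < l).
```

Because k ranges over finitely many values, this conditional can be sampled by direct enumeration; alternatively, a Metropolis step on (k, l) gives the "hybrid" variant.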
(35%) Q3: There are four candidates in an election. The election consists of two voting stages,
morning and afternoon, with 100 votes at each stage. Denote by yij the number of votes for
Candidate j at stage i, i = 1, 2 and j = 1, 2, 3, 4. However, an accident occurs during the
vote counting, so for some candidates we only know that their vote counts are at least 11,
as in the following table:

              Candidate 1   Candidate 2   Candidate 3   Candidate 4
  Stage I         41           ≥ 11          ≥ 11           13
  Stage II        38            32           ≥ 11          ≥ 11

Assume that Y i = (yi1, yi2, yi3, yi4), i = 1, 2, both follow the same multinomial distribution
with 100 trials and event probabilities p = (p1, p2, p3, p4). Based on the available information,
please carry out a hybrid Gibbs sampler to estimate p.
Note: assign the Dirichlet prior π(p) ∝ p1 p2 p3 p4 (i.e., Dirichlet(2, 2, 2, 2)) to p. Regard
y13 and y23 as latent variables and update y13 by the proposal:
1. if 11 < y13^(t−1) < 35, then P(y13* = y13^(t−1) − 1 | y13^(t−1)) = P(y13* = y13^(t−1) + 1 | y13^(t−1)) = 0.5;
2. if y13^(t−1) = 11, then P(y13* = 12 | y13^(t−1)) = 1;
3. if y13^(t−1) = 35, then P(y13* = 34 | y13^(t−1)) = 1.
Similarly, you can design a proposal distribution for y23 yourself. Run 10,000 iterations and
collect the posterior samples from the last 8,000 iterations; use the posterior mean to estimate p.
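One way the sampler's two updates could be organized (a sketch under the stated priors, not the required derivation): the known stage totals fix y12 = 46 − y13 and y24 = 30 − y23, so y13 ∈ {11, . . . , 35} and y23 ∈ {11, . . . , 19}, and only two latent counts need to be tracked.

```latex
% Conjugate update for p given the completed counts:
p \mid y \sim \mathrm{Dirichlet}\big(2 + y_{11} + y_{21},\; 2 + y_{12} + y_{22},\;
                                     2 + y_{13} + y_{23},\; 2 + y_{14} + y_{24}\big).

% Conditional target for the latent y_{13} (keeping only terms that depend on it,
% with y_{12} = 46 - y_{13}):
f(y_{13} \mid p) \propto
  \frac{p_2^{\,46 - y_{13}}\; p_3^{\,y_{13}}}{(46 - y_{13})!\; y_{13}!},

% so a value y_{13}^{*} drawn from the given proposal q is accepted with probability
\alpha = \min\!\left(1,\;
  \frac{f(y_{13}^{*} \mid p)\; q\big(y_{13}^{(t-1)} \mid y_{13}^{*}\big)}
       {f(y_{13}^{(t-1)} \mid p)\; q\big(y_{13}^{*} \mid y_{13}^{(t-1)}\big)}\right).
```

Note that q is asymmetric at the boundary values 11 and 35, so the proposal ratio in α must not be dropped; the update for y23 is analogous on its own range.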
Requirements:

  No.   In the paper report                                  In the R code file
  Q1    Detailed derivation of the Gibbs sampler;            R code
        all estimates for µ, π, Z;
        trace plots for µ11, Z1, and π1
  Q2    Detailed derivation of the (hybrid) Gibbs sampler;   R code
        estimates for k, l, µ1, µ2, µ3
  Q3    Detailed derivation of the hybrid Gibbs sampler;     R code
        the estimate for p