ST 311 Assignment 2 (theory part)
Candidate number:
Instructions: Attempt all questions. The total number of marks is 45.
1. Consider a binary classification task. Given an input-output pair $(x, d)$, where the input is $x = (x_1, x_2)^T$ and the desired output $d$ is the 2-dimensional one-hot vector $(0, 1)^T$, consider the following MLP (without bias):
• Input layer: $y_1^{(0)} = x_1$ and $y_2^{(0)} = x_2$.
• Hidden layer with 2 neurons: $z_j^{(1)} = \sum_{i=1}^{2} w_{ij}^{(1)} y_i^{(0)}$ and $y_j^{(1)} = f_1(z_j^{(1)})$ for $j = 1, 2$, where $f_1(\cdot)$ is the ReLU function (a scalar activation).
• Output layer with 2 neurons: $z_j^{(2)} = \sum_{i=1}^{2} w_{ij}^{(2)} y_i^{(1)}$ for $j = 1, 2$, and $y = (y_1, y_2)^T = (y_1^{(2)}, y_2^{(2)})^T = f_2(z_1^{(2)}, z_2^{(2)})$, where $f_2(\cdot, \cdot)$ is the softmax function (a vector activation).
Let $\mathrm{Div} = \mathrm{Div}(y, d)$ denote the K-L divergence between the actual output $y$ and the desired output $d$. Suppose that in the forward pass we have computed $\{z_i^{(1)}\}_{i=1,2}$, $\{z_i^{(2)}\}_{i=1,2}$, $\{y_i^{(1)}\}_{i=1,2}$, $\{y_i^{(2)}\}_{i=1,2}$, $\{w_{ij}^{(1)}\}_{i,j=1,2}$ and $\{w_{ij}^{(2)}\}_{i,j=1,2}$. Using the backpropagation procedure, compute the derivatives
$$\frac{\partial \mathrm{Div}}{\partial w_{11}^{(2)}}, \quad \frac{\partial \mathrm{Div}}{\partial w_{12}^{(2)}}, \quad \frac{\partial \mathrm{Div}}{\partial w_{11}^{(1)}}, \quad \frac{\partial \mathrm{Div}}{\partial w_{21}^{(1)}}.$$
(You can directly make use of the results in Lecture 3's slides.) [24 marks]
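The backpropagation steps for this network can be traced in a short NumPy sketch (the function and variable names below are my own, not from the assignment). It uses the standard result from the lectures that, for a softmax output trained against a one-hot target with the K-L divergence, $\partial \mathrm{Div}/\partial z_j^{(2)} = y_j - d_j$, and then propagates through the ReLU. Here `W1[i, j]` stores $w_{ij}^{(1)}$ and `W2[i, j]` stores $w_{ij}^{(2)}$.

```python
import numpy as np

def forward(x, W1, W2):
    # Hidden layer: z1_j = sum_i W1[i, j] * x_i, then ReLU
    z1 = W1.T @ x
    y1 = np.maximum(z1, 0.0)
    # Output layer: z2_j = sum_i W2[i, j] * y1_i, then softmax
    z2 = W2.T @ y1
    e = np.exp(z2 - z2.max())          # shift for numerical stability
    y = e / e.sum()
    return z1, y1, z2, y

def backward(x, d, W2, z1, y1, y):
    # Softmax + K-L divergence against one-hot d: dDiv/dz2_j = y_j - d_j
    dz2 = y - d
    # dDiv/dW2[i, j] = y1_i * dDiv/dz2_j
    dW2 = np.outer(y1, dz2)
    # Backprop through W2, then through the ReLU (derivative 1{z1 > 0})
    dy1 = W2 @ dz2
    dz1 = dy1 * (z1 > 0)
    # dDiv/dW1[i, j] = x_i * dDiv/dz1_j
    dW1 = np.outer(x, dz1)
    return dW1, dW2
```

The four requested derivatives are then `dW2[0, 0]`, `dW2[0, 1]`, `dW1[0, 0]` and `dW1[1, 0]` (Python indices are zero-based).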
2. Given $N$ inputs $x_1, \ldots, x_N$, the affine function computes $z_i = \sum_{j=1}^{N} w_{ji} x_j$ for $i = 1, \ldots, B$ within each batch. In stage 1 of batch normalization,
$$\mu_B = \frac{1}{B}\sum_{i=1}^{B} z_i, \qquad \sigma_B^2 = \frac{1}{B}\sum_{i=1}^{B} (z_i - \mu_B)^2, \qquad u_i = \frac{z_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad i = 1, \ldots, B.$$
In stage 2 of batch normalization, $\hat{z}_i = \gamma u_i + \beta$. Finally, the activation function computes the outputs $y_i = f(\hat{z}_i)$ for $i = 1, \ldots, B$.
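The affine map and the two batch-normalization stages can be sketched as follows (names are my own; since the assignment leaves $f$ generic, a ReLU is used as a placeholder activation):

```python
import numpy as np

def bn_forward(x, W, gamma, beta, eps=1e-5):
    # Affine map: z_i = sum_j W[j, i] * x_j, i = 1..B
    z = W.T @ x
    # Stage 1: normalize with the batch mean and (biased) variance
    mu = z.mean()
    var = z.var()                        # (1/B) * sum_i (z_i - mu)^2
    u = (z - mu) / np.sqrt(var + eps)
    # Stage 2: scale and shift
    zhat = gamma * u + beta
    # Activation; a ReLU stands in for the unspecified f
    y = np.maximum(zhat, 0.0)
    return z, mu, var, u, zhat, y
```

Note that `np.var` computes the population variance (divides by $B$), which matches the $\frac{1}{B}$ in the definition of $\sigma_B^2$.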
(a) Let $\mathrm{Loss} = \mathrm{Loss}(y_1, \ldots, y_B)$. Suppose that in the forward pass we have computed $\{z_i\}_{1 \le i \le B}$, $\{w_{ji}\}_{1 \le i \le B,\, 1 \le j \le N}$, $\gamma$, $\beta$, $\{u_i\}_{1 \le i \le B}$ and $\{\hat{z}_i\}_{1 \le i \le B}$, and that in the backward pass we have computed $\{\frac{d\mathrm{Loss}}{dy_i}\}_{1 \le i \le B}$. Using the backpropagation procedure, compute $\frac{d\mathrm{Loss}}{d\gamma}$, $\frac{d\mathrm{Loss}}{d\beta}$ and $\frac{d\mathrm{Loss}}{du_i}$. [6 marks]
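Part (a) follows the chain rule through stage 2: $\frac{d\mathrm{Loss}}{d\hat{z}_i} = \frac{d\mathrm{Loss}}{dy_i} f'(\hat{z}_i)$, then $\frac{d\mathrm{Loss}}{d\gamma} = \sum_i \frac{d\mathrm{Loss}}{d\hat{z}_i} u_i$, $\frac{d\mathrm{Loss}}{d\beta} = \sum_i \frac{d\mathrm{Loss}}{d\hat{z}_i}$ and $\frac{d\mathrm{Loss}}{du_i} = \gamma \frac{d\mathrm{Loss}}{d\hat{z}_i}$. A minimal sketch (my own names; a ReLU again stands in for the unspecified $f$):

```python
import numpy as np

def bn_backward_stage2(dLoss_dy, zhat, u, gamma):
    # Chain through the activation: dLoss/dzhat_i = dLoss/dy_i * f'(zhat_i);
    # the ReLU derivative 1{zhat > 0} is an assumption for the generic f
    dzhat = dLoss_dy * (zhat > 0)
    dgamma = np.sum(dzhat * u)           # dLoss/dgamma = sum_i dLoss/dzhat_i * u_i
    dbeta = np.sum(dzhat)                # dLoss/dbeta  = sum_i dLoss/dzhat_i
    du = gamma * dzhat                   # dLoss/du_i   = gamma * dLoss/dzhat_i
    return dgamma, dbeta, du
```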
(b) Show that
$$\frac{d\mathrm{Loss}}{dz_i} = \frac{1}{(\sigma_B^2 + \epsilon)^{1/2}} \frac{d\mathrm{Loss}}{du_i} - \frac{1}{B(\sigma_B^2 + \epsilon)^{1/2}} \sum_j \frac{d\mathrm{Loss}}{du_j} - \frac{1}{B(\sigma_B^2 + \epsilon)^{3/2}} \sum_j \frac{d\mathrm{Loss}}{du_j} (z_j - \mu_B)(z_i - \mu_B), \quad i = 1, \ldots, B.$$
[12 marks]
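Part (b)'s identity can be checked numerically. The sketch below (function name my own) implements the right-hand side, reading the final factor of the third term as $(z_j - \mu_B)(z_i - \mu_B)$ with the $j$-dependent part inside the sum, and follows from $\frac{\partial u_j}{\partial z_i} = \frac{\delta_{ij}}{(\sigma_B^2+\epsilon)^{1/2}} - \frac{1}{B(\sigma_B^2+\epsilon)^{1/2}} - \frac{(z_j-\mu_B)(z_i-\mu_B)}{B(\sigma_B^2+\epsilon)^{3/2}}$:

```python
import numpy as np

def bn_dz(dLoss_du, z, eps=1e-5):
    # Right-hand side of part (b): three terms, each over the whole batch
    B = z.size
    mu = z.mean()
    s = np.sqrt(z.var() + eps)                          # (sigma_B^2 + eps)^{1/2}
    term1 = dLoss_du / s
    term2 = np.sum(dLoss_du) / (B * s)
    term3 = (z - mu) * np.sum(dLoss_du * (z - mu)) / (B * s ** 3)
    return term1 - term2 - term3
```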
(c) Based on the results of parts (a) and (b), compute $\{\frac{d\mathrm{Loss}}{dw_{ji}}\}_{1 \le i \le B,\, 1 \le j \le N}$. [3 marks]
}1≤i≤B,1≤j≤N . [3 marks]