COMP3670/6670: Introduction to Machine Learning
Exercise 1 Orthogonal Projections (3 + 3 + 3 + 4 + 6 + 3 credits)
Consider the Euclidean vector space R^3 with the dot product. A subspace U ⊂ R^3 and a vector x ∈ R^3 are given by:

U = span{ [−1, 1, 1]^T, [2, −1, −2]^T },   x = [8, 4, 16]^T
1. Show that x ∉ U.
2. Determine the orthogonal projection of x onto U , denoted πU (x).
3. Determine the distance d(x, U) := min_{y∈U} ∥x − y∥, where ∥ · ∥ denotes the Euclidean norm.
4. Use Gram-Schmidt orthogonalization to transform the matrix

A = [ −1   2
       1  −1
       1  −2 ]

into a matrix B with orthonormal columns.
5. Let Q ∈ Rm×n be a matrix with orthonormal columns and x ∈ Rm be an m-dimensional vector.
Find the vector θ that minimizes ∥x − Qθ∥^2 + λ∥θ∥^2, where λ is a positive real number.
6. Compute the vector θ for the matrix B and λ = 10.
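The steps of Exercise 1 can be sanity-checked numerically. The sketch below (assuming NumPy is available; it is a check, not a substitute for the worked derivation) runs Gram-Schmidt on the columns of A, projects x onto U via B B^T x, and uses the closed form θ = B^T x / (1 + λ), which follows from B^T B = I when the gradient of the regularized objective is set to zero.

```python
import numpy as np

# Basis vectors of U and the target vector x from Exercise 1.
u1 = np.array([-1.0, 1.0, 1.0])
u2 = np.array([2.0, -1.0, -2.0])
x = np.array([8.0, 4.0, 16.0])

# Gram-Schmidt on the columns of A = [u1 u2] -> matrix B with orthonormal columns.
b1 = u1 / np.linalg.norm(u1)
v2 = u2 - (u2 @ b1) * b1
b2 = v2 / np.linalg.norm(v2)
B = np.column_stack([b1, b2])

# Orthogonal projection onto U: pi_U(x) = B B^T x, and the distance d(x, U).
proj = B @ (B.T @ x)
dist = np.linalg.norm(x - proj)

# Regularized least squares with orthonormal columns: since B^T B = I,
# argmin ||x - B theta||^2 + lam ||theta||^2  =  B^T x / (1 + lam).
lam = 10.0
theta = (B.T @ x) / (1.0 + lam)

print(proj)   # projection of x onto U
print(dist)   # distance d(x, U)
print(theta)  # regularized coefficients for lam = 10
```

If the projection differs from x, that already certifies x ∉ U for part 1.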
Exercise 2 Vector calculus practice (6 + 8 + 8 credits)
Compute the following gradients over x or X. Represent the result in numerator layout. Note that you
are only allowed to use the rules demonstrated in the lecture. Show each step clearly.
1. ∂(x^T A B C x) / ∂x

2. ∂((Bx + b)^T C (Dx + d)) / ∂x

3. ∂ tr(X^2) / ∂X
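Results like these can be checked against finite differences. The sketch below (assuming NumPy) tests the first gradient against the expected numerator-layout answer x^T(ABC + (ABC)^T); the same pattern works for the other two.

```python
import numpy as np

# Finite-difference check of gradient 1: d(x^T A B C x)/dx in numerator layout.
rng = np.random.default_rng(0)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
x = rng.standard_normal(n)

M = A @ B @ C
f = lambda x: x @ M @ x

# Expected numerator-layout gradient (a row vector): x^T (M + M^T).
analytic = x @ (M + M.T)

# Central differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([(f(x + eps * np.eye(n)[i]) - f(x - eps * np.eye(n)[i])) / (2 * eps)
                    for i in range(n)])
print(np.max(np.abs(analytic - numeric)))  # near zero (f is quadratic, so only rounding error)
```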
Exercise 3 Concavity of a function (8 + 10 + 10 credits)
A function f : Rn → R with a convex domain is called a concave function if and only if its Hessian
H = ∂^2 f / ∂x^2 is negative semidefinite. Consider the following function:

f(x) = ( Σ_{i=1}^n x_i^p )^{1/p}

with convex domain dom(f) = R^n_{++} (n-dimensional strictly elementwise positive vectors), and p < 1, p ≠ 0.
1. Evaluate the elementwise second-order derivatives ∂^2 f / (∂x_i ∂x_j) for arbitrary integers i, j ∈ [1, n].
2. Denote the elementwise power of a vector a ∈ R^n_{++} to a real number t as a^t = [a_1^t, a_2^t, ..., a_n^t]^T. Also, the diag(·) function returns the diagonal matrix whose diagonal entries are the input vector. Prove that

H = (1 − p) f(x)^{1−2p} · ( x^{p−1} (x^{p−1})^T − f(x)^p · diag(x^{p−2}) )
3. Prove H is negative semidefinite, hence f is concave since it has a convex domain.
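Before attempting the proof, the claimed Hessian can be checked numerically. The sketch below (assuming NumPy) evaluates the stated formula for H at a random strictly positive x with p = 0.5, compares it against a finite-difference Hessian, and inspects the eigenvalues for negative semidefiniteness.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 0.5
x = rng.uniform(0.5, 2.0, size=n)  # a point in R^n_{++}

def fval(x):
    return np.sum(x**p) ** (1.0 / p)

# The Hessian formula from part 2.
f = fval(x)
H = (1 - p) * f**(1 - 2 * p) * (np.outer(x**(p - 1), x**(p - 1))
                                - f**p * np.diag(x**(p - 2)))

# Finite-difference Hessian for comparison.
eps = 1e-5
H_num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        e_i, e_j = np.eye(n)[i] * eps, np.eye(n)[j] * eps
        H_num[i, j] = (fval(x + e_i + e_j) - fval(x + e_i - e_j)
                       - fval(x - e_i + e_j) + fval(x - e_i - e_j)) / (4 * eps**2)

eigvals = np.linalg.eigvalsh(H)
print(np.max(np.abs(H - H_num)))  # small: formula matches finite differences
print(eigvals.max())              # <= 0 up to rounding: H is negative semidefinite
```

Note that f is homogeneous of degree 1, so Hx = 0 and the largest eigenvalue is exactly zero; the numerical check only confirms none is positive.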
Exercise 4 Expectations with respect to a Gaussian distribution (10+10+8 credits)
A common objective function in modern machine learning is the variational free-energy,
F(q(θ)) = ∫ dθ q(θ) log [ q(θ) / (p(θ) p(y|θ, x)) ] = ∫ dθ q(θ) [log q(θ) − log p(θ) − log p(y|θ, x)]. (1)
Consider a simplified setting in which
p(θ) = N (θ; 0, 1), (2)
p(y|θ, x) = N (y; θx, σ_n^2), (3)
q(θ) = N (θ;µ, σ2), (4)
where N (x;m, v) means x is a univariate Gaussian random variable with mean m and variance v.
1. Compute F .
2. Find the gradients ∂F/∂µ and ∂F/∂σ.
3. Set these gradients to zero and solve for µ and σ in terms of y, x and σ_n.
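A candidate solution can be verified numerically. The sketch below (assuming NumPy, and assuming the standard decomposition F = KL(q ∥ p(θ)) − E_q[log p(y|θ, x)] with the Gaussian KL and expected-log-likelihood identities, which part 1 asks you to derive) evaluates F in closed form and checks by finite differences that the gradients vanish at the stationary point µ = xy/(x^2 + σ_n^2), σ^2 = σ_n^2/(x^2 + σ_n^2).

```python
import numpy as np

# Hypothetical data and noise level, chosen only for illustration.
y, x, sn = 2.0, 3.0, 0.5

def F(mu, sigma):
    # Assumed closed form of the free energy (to be derived by hand in part 1):
    # KL(N(mu, sigma^2) || N(0, 1)) - E_q[log N(y; theta*x, sn^2)].
    kl = 0.5 * (sigma**2 + mu**2 - 1.0 - np.log(sigma**2))
    nll = (0.5 * np.log(2 * np.pi * sn**2)
           + ((y - mu * x)**2 + x**2 * sigma**2) / (2 * sn**2))
    return kl + nll

# Candidate stationary point from setting the gradients to zero.
mu_star = x * y / (x**2 + sn**2)
sigma_star = np.sqrt(sn**2 / (x**2 + sn**2))

# Finite-difference gradients at the candidate point.
eps = 1e-6
dF_dmu = (F(mu_star + eps, sigma_star) - F(mu_star - eps, sigma_star)) / (2 * eps)
dF_dsigma = (F(mu_star, sigma_star + eps) - F(mu_star, sigma_star - eps)) / (2 * eps)
print(dF_dmu, dF_dsigma)  # both close to zero at the stationary point
```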