COMP61021: Representation Learning
To carry out this assessed coursework, you will need the Python notebook file, manifold.ipynb,
which contains code to get you started with each of the assignments, and the data files: bars.npz
for the bar images and face_tenenbaum.npz for the face images. Other datasets (e.g. ten_city)
are generated by built-in functions in the notebook. All of the above are available in a zipped file
on BlackBoard alongside this document.
Caveat: Some resultant figures may not display correctly with %matplotlib notebook, which
appears in the first line of the given Jupyter notebook file, especially on MS Windows. To deal with
this issue, use the Firefox browser instead of Chrome, Microsoft IE or Edge.
Manifold learning is one of the central themes in representation learning. In this coursework, you
are asked to use Python to implement several manifold learning algorithms taught in this course
and to apply your own implementations to synthetic and real datasets.
Supportive Software
To do this coursework, you are provided with our own Python implementation of LLE and the
visualisation tools required by this coursework.
Our implementation in Python includes: optimization functions in optimization.py, locally
linear embedding in lle.py, display functions in helpers.py and dataset functions in dataset.py.
optimization.py enables you to carry out the gradient descent method required for the
assignments regarding the stress-based MDS algorithms.
It provides a gradient_descent function that solves an optimization problem by gradient descent.
The signature of this function is as follows:
gradient_descent(D, x0, loss_f, grad_f, lr, tol, max_iter)
where D is a distance matrix between points in the original space and x0 is the initial value of the
solution to be optimised. loss_f and grad_f are the loss function and its first derivative (gradient),
respectively. lr and tol are the learning rate and the tolerance for early stopping. max_iter is the
maximum number of iterations.
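To make the roles of these arguments concrete, the loop below is a minimal sketch of gradient descent with the same argument list. It is not the code in optimization.py: the name gradient_descent_sketch, the calling convention loss_f(D, x) and grad_f(D, x), and the stopping rule (change in loss below tol) are all assumptions made for illustration. The toy loss is a simple quadratic, not an MDS stress function.

```python
import numpy as np


def gradient_descent_sketch(D, x0, loss_f, grad_f, lr, tol, max_iter):
    """Illustrative loop only; the provided gradient_descent may differ."""
    x = np.asarray(x0, dtype=float).copy()
    prev_loss = loss_f(D, x)          # assumed calling convention: loss_f(D, x)
    for _ in range(max_iter):
        x = x - lr * grad_f(D, x)     # step against the gradient
        loss = loss_f(D, x)
        if abs(prev_loss - loss) < tol:
            break                     # early stopping: loss has stopped changing
        prev_loss = loss
    return x


# Toy problem (hypothetical): here D is just a target vector, not a distance matrix.
def toy_loss(D, x):
    return 0.5 * np.sum((x - D) ** 2)


def toy_grad(D, x):
    return x - D


target = np.array([1.0, -2.0])
x_min = gradient_descent_sketch(target, np.zeros(2), toy_loss, toy_grad,
                                lr=0.1, tol=1e-12, max_iter=10_000)
```

In this sketch a smaller lr needs more iterations to reach the same tol, while max_iter caps the run if the tolerance is never met.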
In lle.py, the signature of the lle function is as follows:
lle(data, n_components=2, n_neighbors=None, epsilon=None, dist_func=None,
reg_func=None)
where data contains data points in the original space, n_components is the dimensionality of the
target embedded space, n_neighbors is the number of neighbours for KNN, epsilon is the value of
the fixed radius for ε-distance, dist_func is the function used to find neighbours and their distances,
and reg_func implements a regularization term to avoid the singular case.
*Assessed Coursework: the deadline and requirements can be found at the end of this document.
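The singular case arises when a point has more neighbours than the data has dimensions, so the local Gram matrix used to solve for the reconstruction weights cannot be inverted. The sketch below illustrates this with a commonly used ridge term scaled by the trace of the matrix. The functions trace_regulariser and local_weights are hypothetical names for illustration only; the reg_func(C, K) provided in manifold.ipynb may use a different formula.

```python
import numpy as np


def trace_regulariser(C, K, scale=1e-3):
    """Hypothetical stand-in with the reg_func(C, K) shape: adds a small ridge to C."""
    return C + scale * np.trace(C) * np.eye(K)


def local_weights(x, neighbours, reg_func=None):
    """Reconstruction weights of x from its neighbours (one per row), summing to one."""
    K = neighbours.shape[0]
    Z = neighbours - x                  # shift the neighbours so that x is the origin
    C = Z @ Z.T                         # K x K local Gram matrix
    if reg_func is not None:
        C = reg_func(C, K)              # make C invertible when it is singular
    w = np.linalg.solve(C, np.ones(K))
    return w / w.sum()


# Four neighbours in a 2-D space: the 4 x 4 Gram matrix has rank 2, so it is singular.
x = np.array([0.0, 0.0])
neighbours = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
C = (neighbours - x) @ (neighbours - x).T
rank = np.linalg.matrix_rank(C)
w = local_weights(x, neighbours, reg_func=trace_regulariser)
```

With the ridge term the linear system has a unique solution, and for this symmetric example the four neighbours receive equal weights.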
Below are two examples of how to use the lle function with different neighbourhoods.
from Code.lle import lle
# nearest_neighbor_distance and reg_func must already be defined in the session
data = ...  # data points in the original space
n_dim = 2
k = 7
Y = lle(data, n_components=n_dim, n_neighbors=k,
        dist_func=nearest_neighbor_distance, reg_func=reg_func)

from Code.lle import lle
# fixed_radius_distance and reg_func must already be defined in the session
data = ...  # data points in the original space
n_dim = 2
e = 5.3
Y = lle(data, n_components=n_dim, epsilon=e,
        dist_func=fixed_radius_distance, reg_func=reg_func)
where KNN is used in the first case and ε-NN is used in the second case. For the reg_func
argument, None can be used except in Assignment 7, where the function reg_func(C, K)
provided in manifold.ipynb must be used.
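The difference between the two neighbourhood types can be sketched with plain NumPy. The helper names below (pairwise_distances, knn_neighbours, radius_neighbours) are hypothetical and are not the dist_func implementations used by lle; the sketch also assumes one data point per row, which may not match the layout expected by the provided code.

```python
import numpy as np


def pairwise_distances(data):
    """Euclidean distances between all rows of data (one point per row, assumed)."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_neighbours(dist, k):
    """Indices of the k nearest other points for each point."""
    order = np.argsort(dist, axis=1, kind="stable")
    return order[:, 1:k + 1]            # column 0 is the point itself (distance 0)


def radius_neighbours(dist, epsilon):
    """For each point, the indices of the other points within distance epsilon."""
    n = dist.shape[0]
    others = np.arange(n)
    return [np.flatnonzero((dist[i] <= epsilon) & (others != i)) for i in range(n)]


# Three points close together and one far away.
data = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
dist = pairwise_distances(data)
knn = knn_neighbours(dist, k=2)
eps_nn = radius_neighbours(dist, epsilon=1.5)
```

In this toy data the distant point still receives two neighbours under KNN but none under the fixed radius, which is the practical trade-off between the two choices.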
helpers.py provides useful functions to enable you to visualise the results of each assignment.
This Python module contains the VIS_Shortest_path_2d, ImageViewer and VIS_Bars classes.
VIS_Shortest_path_2d is used to display not only the embedded points in latent space but also
the shortest path between two specified data points. Its constructor is as follows:
VIS_Shortest_path_2d(proj, dist, predecessors, fig_vis)
where proj refers to the embedded points in the target space, dist and predecessors are the
distance matrix and the indices of predecessors for the shortest paths, both of which are obtained
from the isomap function. fig_vis is a figure object created by the figure() function in the
matplotlib package. The usage of this class is as follows:
import matplotlib.pyplot as plt
from Code.isomap import isomap, dist_nearest_neighbor
# VIS_Shortest_path_2d is defined in helpers.py
points = ...  # data points as column vectors
n_components = 2
n_neighbors = 6
Y, dist, predecessors = isomap(...)
fig = plt.figure()
VIS_Shortest_path_2d(Y, dist, predecessors, fig)
Using VIS_Shortest_path_2d, the embedded points in the latent space are plotted, and
you can see the shortest path between two selected points.
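As an illustration of how a predecessor table encodes shortest paths, the sketch below builds one with Dijkstra's algorithm on a tiny graph and then walks it backwards to recover a path. Everything here is an assumption for illustration: the function names, the adjacency format, and the convention that pred[i][j] holds the node visited just before j on the path from i (with -9999 meaning no predecessor). The format returned by the isomap function may differ.

```python
import heapq

NO_PREDECESSOR = -9999  # sentinel value assumed for this sketch


def all_pairs_shortest_paths(adjacency):
    """Dijkstra from every node; adjacency[i] maps neighbour index -> edge length."""
    n = len(adjacency)
    dist = [[float("inf")] * n for _ in range(n)]
    pred = [[NO_PREDECESSOR] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[source][u]:
                continue                      # stale queue entry
            for v, w in adjacency[u].items():
                if d + w < dist[source][v]:
                    dist[source][v] = d + w
                    pred[source][v] = u       # u comes just before v on this path
                    heapq.heappush(heap, (d + w, v))
    return dist, pred


def path_indices(pred, start, end):
    """Walk the predecessor table backwards from end until start is reached."""
    path = [end]
    while path[-1] != start:
        previous = pred[start][path[-1]]
        if previous == NO_PREDECESSOR:
            raise ValueError("end is not reachable from start")
        path.append(previous)
    return path[::-1]


# Small undirected neighbourhood graph with four nodes.
adjacency = [
    {1: 1.0},
    {0: 1.0, 2: 2.0, 3: 5.0},
    {1: 2.0, 3: 1.0},
    {1: 5.0, 2: 1.0},
]
dist, pred = all_pairs_shortest_paths(adjacency)
path = path_indices(pred, 0, 3)
```

Here the path from node 0 to node 3 goes through nodes 1 and 2 because that route is shorter than the direct edge between nodes 1 and 3.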
ImageViewer is used to display images on the shortest path and its constructor is as follows:
ImageViewer(data, index, image_size, fig_vis, max_row=5)
where data refers to data points in the source space, index refers to the indices of points on the
shortest path, image_size is the size of the images stored in data, fig_vis is a figure object created
by the figure() function in the matplotlib package and max_row is the maximum
number of images in a row.
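The effect of max_row can be pictured as splitting the path indices into consecutive rows. The helper below is a hypothetical illustration of that idea only; it does not reproduce how ImageViewer arranges its figure.

```python
def rows_of_indices(index, max_row=5):
    """Split path indices into consecutive rows of at most max_row entries."""
    return [list(index[i:i + max_row]) for i in range(0, len(index), max_row)]


# Seven made-up indices of points on a shortest path.
rows = rows_of_indices([4, 17, 3, 42, 8, 15, 23], max_row=5)
```

With seven indices and max_row=5, the first row holds five images and the second row holds the remaining two.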