ECMM422 Machine Learning
Course Assessment 1
This course assessment (CA1) represents 40% of the overall module assessment.
This is an individual exercise and your attention is drawn to the College and University guidelines on collaboration and plagiarism, which are
available from the College website.
Note:
1. do not change the name of this notebook, i.e. the notebook file has to be named: ca1.ipynb
2. do not remove/delete any cell
3. do not add any cell (you can work on a draft notebook and only copy the function implementations here)
4. do not add your name or student code in the notebook or in the file name
Evaluation criteria:
Each question asks for one or more functions to be implemented.
Each question is awarded a number of marks.
A (hidden) unit test is going to evaluate if all desired properties of the required function(s) are met.
If the test passes, all the associated marks are awarded; if it fails, 0 marks are awarded. The large number of questions allows fine-grained grading.
Notes:
In the rest of the notebook, the term data matrix refers to a two dimensional numpy array where instances are encoded as rows, e.g. a
data matrix with 100 rows and 4 columns is to be interpreted as a collection of 100 instances each with four features.
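To make the row/column convention concrete, here is a minimal sketch that builds a data matrix of the shape described above. The random values and the variable names are illustrative assumptions only; they are not part of the assessment.

```python
import numpy as np

# Illustrative only: a data matrix with 100 instances (rows) and 4 features (columns).
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(100, 4))

n_instances, n_features = data_matrix.shape
first_instance = data_matrix[0]     # one row = one instance, shape (4,)
first_feature = data_matrix[:, 0]   # one column = one feature across all instances, shape (100,)
```

Indexing a row returns a single instance, while indexing a column returns the values of one feature for every instance.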
When a required function can be implemented directly by a library function it is intended that the candidate should write her own
implementation of the function, e.g. a function to compute the accuracy or the cross validation.
Some questions are just check-points, i.e. they let you see whether you are implementing all functions correctly. Since those check-points use
functions that you have already implemented and that have already been marked, those questions are not going to be marked (i.e. they
appear as having marks 0).
In [ ]: %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp

# unit test utilities: you can ignore these functions
def is_approximately_equal(test, target, eps=1e-2):
    # True when the mean absolute difference is below the tolerance eps
    return np.mean(np.fabs(np.array(test) - np.array(target))) < eps

def assert_test_equality(test, target):
    assert is_approximately_equal(test, target), 'Expected:\n %s \nbut got:\n %s' % (target, test)
Question 1 [marks 6]
a) Make a function data_matrix = make_data_classification(mean, std, n_centres, inner_std, n_samples,
random_seed=42) to create a data matrix according to the following rules:
1. mean is an n-dimensional vector (say [1,1], but the function should allow vectors of any dimension)
2. n_centres is the number of centres (say 3)
3. std is the standard deviation (say 1)
4. the centres are sampled from a Normal distribution with mean mean and standard deviation std
5. from each centre, sample n_samples instances from a Normal distribution with the centre as the mean and standard deviation inner_std
So if mean=[1,1], n_centres=3 and n_samples=10, then the data matrix will be a 30 rows x 2 columns numpy array.
b) Make a function data_matrix, targets = make_data_regression(mean, std, n_centres, inner_std,
n_samples, random_seed=42) to create a data matrix and a target vector according to the following rules:
1. the data matrix is constructed in the same way as in make_data_classification
2. the targets are the Euclidean distance between the sample and the centre of the generating Normal distribution
See Question 3 for a graphical example of the expected output.
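The two-stage sampling described in the rules above can be sketched as follows. This is a minimal illustration, not the reference solution: the helper name `sample_clustered_points` is hypothetical, the use of `np.random.RandomState` and of isotropic Normal distributions are assumptions, and the output is not guaranteed to match what the hidden unit tests expect.

```python
import numpy as np

def sample_clustered_points(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    """Hypothetical helper: two-stage Gaussian sampling (centres first, then points)."""
    rng = np.random.RandomState(random_seed)  # assumption: legacy seeded generator
    mean = np.asarray(mean, dtype=float)
    n_features = mean.shape[0]
    # Stage 1: draw the centres around `mean` with standard deviation `std`.
    centres = rng.normal(loc=mean, scale=std, size=(n_centres, n_features))
    # Stage 2: draw n_samples points around each centre with standard deviation `inner_std`.
    blocks = [rng.normal(loc=c, scale=inner_std, size=(n_samples, n_features)) for c in centres]
    points = np.vstack(blocks)
    # Record, row by row, which centre generated each point (same order as the stacked blocks).
    origin = np.repeat(centres, n_samples, axis=0)
    return points, origin

points, origin = sample_clustered_points([1, 1], std=1, n_centres=3, inner_std=0.1, n_samples=10)
# Euclidean distance of every point from its generating centre.
distances = np.linalg.norm(points - origin, axis=1)
```

With mean=[1,1], n_centres=3 and n_samples=10, `points` has 30 rows and 2 columns, and `distances` holds one non-negative value per row.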
In [ ]: def make_data_classification(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    # YOUR CODE HERE
    raise NotImplementedError()

def make_data_regression(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]: # This cell is reserved for the unit tests. Do not consider this cell.
In [ ]: # This cell is reserved for the unit tests. Do not consider this cell.
Question 2 [marks 2]
a) Make a function data_matrix, targets = get_dataset_classification(n_samples, std, inner_std) to create a data
matrix and a target vector for a binary classification problem according to the following rules:
the instances from the positive class are generated according to the same rules provided for make_data_classification ; so are
the instances from the negative class
instances from the positive class have as mean the vector [10,10] and those from the negative class, vector [-10,-10]
the number of centres is fixed to 3
the random seed is fixed to 42
n_samples indicates the total number of instances finally available in the output data_matrix
b) Make a function data_matrix, targets = get_dataset_regression(n_samples, std, inner_std) to create a data
matrix and a target vector for a regression problem according to the following rules:
the instances are generated according to the same rules provided for make_data_regression
the targets are generated according to the same rules provided for make_data_regression
instances have as mean the vector [10,10]
the number of centres is fixed to 3
the random seed is fixed to 42
n_samples indicates the total number of instances finally available in the output data_matrix
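Because n_samples here is the total number of rows in the output, the per-centre sample count has to be derived from it. The arithmetic below is a sketch under a stated assumption: the total is split evenly across classes and centres and divides without remainder. The text does not say how to handle other cases, and the variable names are illustrative.

```python
# Illustrative arithmetic only; assumes an even split with no remainder.
n_samples = 600
n_centres = 3

# Classification: two classes (positive and negative), three centres each.
n_classes = 2
per_centre_classification = n_samples // (n_classes * n_centres)

# Regression: a single group of three centres.
per_centre_regression = n_samples // n_centres

total_classification = per_centre_classification * n_classes * n_centres
total_regression = per_centre_regression * n_centres
```

Under this assumption, both totals add back up to the requested 600 rows.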
In [ ]: def get_dataset_classification(n_samples, std, inner_std):
    # YOUR CODE HERE
    raise NotImplementedError()

def get_dataset_regression(n_samples, std, inner_std):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]: # This cell is reserved for the unit tests. Do not consider this cell.
Question 3 [marks 1]
Make a function plot(X,y) to display the scatter plot of a data matrix of two dimensional instances using the array y to assign the
colour to the instances.
When running
X, y = get_dataset_regression(n_samples=600, std=30, inner_std=5)
plot(X,y)
you should get something like
and when running
X, y = get_dataset_classification(n_samples=600, std=30, inner_std=5)
plot(X,y)
you should get something like
In [ ]: def plot(X,y):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]: # This cell is reserved for the unit tests. Do not consider this cell.
Question 4 [marks 1]
Make a function classification_error(targets, preds) to compute the fraction of times that the entries in targets do not
agree with the corresponding entries in preds .
Note: do not use library functions to compute the result directly but implement your own version.
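The quantity being asked for can be written out as a formula with a small worked example. The symbols t_i, p_i and n are introduced here for illustration: they stand for the i-th entries of targets and preds and for the number of entries.

```latex
\text{error} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[t_i \neq p_i]

% Worked example: targets = (1, 0, 1, 1), preds = (1, 1, 1, 0).
% The entries disagree at positions 2 and 4, so
\text{error} = \frac{0 + 1 + 0 + 1}{4} = 0.5
```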
In [ ]: def classification_error(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]: # This cell is reserved for the unit tests. Do not consider this cell.
Question 5 [marks 2]
Make a function regression_error(targets, preds) to compute the mean squared error between targets and preds .
Note: do not use library functions to compute the result directly but implement your own version.
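$$MSE = \frac{1}{n} \sum_{i=1}^{n} (t_i - p_i)^2$$
where t_i and p_i are the i-th entries of targets and preds, and n is the number of entries.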
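As a small worked example of the mean squared error (the average of the squared differences between corresponding entries of targets and preds), the computation can be traced by hand on three values. The numbers are made up for illustration, and this is a sketch of the arithmetic rather than the function the question asks for.

```python
# Illustrative values only.
targets = [1.0, 2.0, 3.0]
preds = [1.0, 2.5, 2.0]

# Squared difference for each pair of entries: 0.0, 0.25 and 1.0.
squared_diffs = [(t - p) ** 2 for t, p in zip(targets, preds)]

# Average of the squared differences: 1.25 / 3.
mse = sum(squared_diffs) / len(squared_diffs)
```

A perfect prediction gives an error of 0, and larger deviations are penalised quadratically.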