STAT 8178/7178 Modern Computational Statistical Methods
Modern Computational Statistical Methods Assignment 1
Instructions:
This assignment covers weeks 1, 2 and 3. Each question is worth 20%.
1. Due on 21st March 2021
2. For all questions, please provide the relevant mathematical derivations, the computer programs (written in R only) and the plots.
3. Please submit on iLearn a single PDF file containing all your work (code, computations, plots, etc.). Other file formats (e.g. Word, HTML) will NOT be accepted.
4. Try to use R Markdown through RStudio. Using R Markdown is not compulsory, although it makes your results easier to reproduce. Upload only the PDF file.
Modern Computational Statistical Methods Assignment1: Due Week 4, 2021
1. Question 1: Consider scalar data points $x_1, \dots, x_n$ and labels $y_1, \dots, y_n$. Assume we are considering a very basic regression model,
$$y = \theta x + \varepsilon,$$
where $\theta$ is a scalar parameter (this is simple linear regression with no intercept term).
Consider the loss function,
$$L(\theta) = \sum_{i=1}^{n} (y_i - \theta x_i)^2,$$
and assume we are minimizing the loss with respect to θ via gradient descent of the form,
$$\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k), \qquad (1)$$
where η > 0 is the learning rate and ∇L(z) is the gradient at the point z (in this case the
derivative).
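Update (1) can be tried numerically before any calculus is done. The R sketch below is a minimal illustration under stated assumptions, not a model answer: it approximates ∇L by a central finite difference (our choice, with an assumed step h = 1e-6), iterates (1) with an arbitrary learning rate η = 0.005, and borrows the data from part (d). The helper names `loss`, `num_grad` and `gd_numeric` are hypothetical.

```r
# Sum-of-squares loss L(theta) = sum_i (y_i - theta * x_i)^2
loss <- function(theta, x, y) sum((y - theta * x)^2)

# Central finite-difference approximation of dL/dtheta (assumed step h)
num_grad <- function(theta, x, y, h = 1e-6) {
  (loss(theta + h, x, y) - loss(theta - h, x, y)) / (2 * h)
}

# Gradient descent, update (1): theta_{k+1} = theta_k - eta * grad L(theta_k)
gd_numeric <- function(theta0, x, y, eta, n_iter = 200) {
  theta <- theta0
  for (k in seq_len(n_iter)) {
    theta <- theta - eta * num_grad(theta, x, y)
  }
  theta
}

# Data borrowed from part (d); eta = 0.005 is an arbitrary small learning rate
x <- c(1, 2, 3, 4, 5)
y <- c(0.9, 2.1, 3.1, 3.9, 5.1)
theta_hat <- gd_numeric(theta0 = 0, x = x, y = y, eta = 0.005)
theta_hat
```

Because L is quadratic in θ, the central difference matches the exact derivative up to rounding error, so this is a convenient check on the analytic gradient asked for in (a).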
(a) 4 marks. Compute $\nabla L(z)$ and represent it in terms of $U_{xx}$ and $U_{xy}$, where
$$U_{xx} = \sum_{i=1}^{n} x_i^2 \quad \text{and} \quad U_{xy} = \sum_{i=1}^{n} x_i y_i.$$
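One way to carry out the derivation is to differentiate term by term with the chain rule, using the loss exactly as defined above; this is a sketch of the reasoning rather than an official solution:

```latex
\nabla L(z) = \frac{d}{dz} \sum_{i=1}^{n} (y_i - z x_i)^2
            = -2 \sum_{i=1}^{n} x_i (y_i - z x_i)
            = 2 z \sum_{i=1}^{n} x_i^2 - 2 \sum_{i=1}^{n} x_i y_i
            = 2 z U_{xx} - 2 U_{xy}.
```

Setting $\nabla L(z) = 0$ gives the least-squares minimizer $\theta^* = U_{xy} / U_{xx}$, assuming $U_{xx} > 0$ (not all $x_i$ are zero).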
(b) 4 marks. Now represent the gradient descent equation (1) as
$$\theta_{k+1} = a\theta_k + b.$$
What are $a$ and $b$?
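Substituting the gradient $\nabla L(\theta_k) = 2\theta_k U_{xx} - 2U_{xy}$ into (1) and collecting terms in $\theta_k$ gives one possible route to the affine form:

```latex
\theta_{k+1} = \theta_k - \eta \left( 2\theta_k U_{xx} - 2U_{xy} \right)
             = (1 - 2\eta U_{xx})\,\theta_k + 2\eta U_{xy},
\qquad\text{so}\qquad a = 1 - 2\eta U_{xx}, \quad b = 2\eta U_{xy}.
```

Note that the fixed point $b / (1 - a) = U_{xy} / U_{xx}$ coincides with the least-squares minimizer, as it should.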
(c) 4 marks. For what values of η does convergence occur?
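A sketch of the convergence argument, assuming $U_{xx} > 0$: write the error relative to the fixed point $\theta^* = U_{xy}/U_{xx}$, which satisfies $\theta^* = a\theta^* + b$, and iterate.

```latex
e_k = \theta_k - \theta^*, \qquad
e_{k+1} = (a\theta_k + b) - (a\theta^* + b) = a\, e_k
\;\Longrightarrow\; e_k = a^k e_0 .

e_k \to 0 \text{ for every } \theta_0
\iff |1 - 2\eta U_{xx}| < 1
\iff 0 < \eta < \frac{1}{U_{xx}} .
```

For $\eta$ between $1/(2U_{xx})$ and $1/U_{xx}$ the factor $a$ is negative, so $\theta_k$ converges while oscillating around $\theta^*$; at $\eta = 1/U_{xx}$ exactly, $a = -1$ and the iterates do not converge.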
(d) 8 marks. Consider the dataset with x values 1, 2, 3, 4, 5 and y values 0.9, 2.1, 3.1, 3.9, 5.1.
Write a short script that finds the best θ using gradient descent with a learning rate equal to 90% of the maximum possible learning rate. Plot your model (line of best fit) and also plot the trajectory of θk during the learning process.
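For this dataset $U_{xx} = 1 + 4 + 9 + 16 + 25 = 55$ and $U_{xy} = 0.9 + 4.2 + 9.3 + 15.6 + 25.5 = 55.5$, so the convergence bound from (c) is $\eta < 1/55$, and 90% of it gives $a = 1 - 2(0.9) = -0.8$. The R script below is a minimal sketch under those assumptions: it uses the closed-form gradient, starts at an arbitrary $\theta_0 = 0$, and stops when successive iterates differ by less than an assumed tolerance. The function name `gd_theta` is hypothetical.

```r
# Data from part (d)
x <- c(1, 2, 3, 4, 5)
y <- c(0.9, 2.1, 3.1, 3.9, 5.1)

Uxx <- sum(x^2)    # sum of x_i^2
Uxy <- sum(x * y)  # sum of x_i * y_i

eta_max <- 1 / Uxx        # boundary of the convergence region 0 < eta < 1/Uxx
eta     <- 0.9 * eta_max  # 90% of the maximum learning rate

# Gradient descent with the closed-form gradient 2*theta*Uxx - 2*Uxy;
# returns the whole trajectory theta_0, theta_1, ...
gd_theta <- function(theta0, eta, Uxx, Uxy, tol = 1e-10, max_iter = 1000) {
  theta <- numeric(max_iter + 1)
  theta[1] <- theta0
  for (k in seq_len(max_iter)) {
    grad <- 2 * theta[k] * Uxx - 2 * Uxy
    theta[k + 1] <- theta[k] - eta * grad
    if (abs(theta[k + 1] - theta[k]) < tol) {
      return(theta[1:(k + 1)])
    }
  }
  theta
}

traj <- gd_theta(theta0 = 0, eta = eta, Uxx = Uxx, Uxy = Uxy)
theta_hat <- traj[length(traj)]

par(mfrow = c(1, 2))

# Line of best fit through the origin
plot(x, y, pch = 19, main = "Fitted model")
abline(a = 0, b = theta_hat, col = "blue")

# Trajectory of theta_k: it oscillates because a = 1 - 2*eta*Uxx = -0.8 < 0
plot(seq_along(traj) - 1, traj, type = "b", xlab = "k",
     ylab = expression(theta[k]), main = "Trajectory of theta_k")
abline(h = Uxy / Uxx, lty = 2)  # least-squares solution Uxy / Uxx
```

The dashed line marks $U_{xy}/U_{xx} \approx 1.009$; with $|a| = 0.8$ the error shrinks by 20% per step, so the trajectory settles after roughly a hundred iterations at this tolerance.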
2. Question 2: Consider the “flower” from the following image.
The data from this image consist of two features, x1 and x2, and one binary outcome y (0 for red dots and 1 for blue dots). The first 6 points are shown in the following table:
[Table: first 6 observations of x1, x2 and y]
(a) 2 marks. Load the dataset (available on iLearn) and split it into training and test sets, with 80% of your data in the training set and the remaining 20% in the test set. Carry out a statistical comparison of your choice between the class distributions in the training set and the test set, to support the claim that the sets were chosen at random.
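The real file is only on iLearn, so the R sketch below runs on a simulated stand-in (a hypothetical data frame `flower` with columns `x1`, `x2`, `y`; the sample size, seed and generating model are all assumptions). It draws an 80/20 random split and compares class proportions with a chi-squared test of homogeneity, one reasonable choice among several.

```r
set.seed(8178)  # arbitrary seed for reproducibility

# Hypothetical stand-in for the iLearn dataset (columns x1, x2 and binary y).
# In practice: flower <- read.csv("<file from iLearn>")
n <- 400
flower <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
flower$y <- rbinom(n, 1, plogis(1.5 * flower$x1 - flower$x2))

# 80/20 random split by sampling row indices without replacement
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- flower[train_idx, ]
test  <- flower[-train_idx, ]

# 2 x 2 contingency table: set (test/train) by class (0/1)
counts <- table(set = rep(c("train", "test"), c(nrow(train), nrow(test))),
                y   = factor(c(train$y, test$y), levels = 0:1))
counts
prop.table(counts, margin = 1)  # class proportions within each set

# Chi-squared test of homogeneity: a large p-value is consistent with
# the two sets having the same class distribution
chisq.test(counts)
```

A non-significant test does not prove randomness; it only shows no detectable difference in class balance between the two sets.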
(b) 3 marks. Fit a logistic regression model to the training set as a generalized linear model (using the glm function) to create a binary classifier.
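A minimal sketch of the glm fit, again on simulated stand-in data (the data frame, seed and the 0.5 classification threshold are assumptions, not part of the assignment). `family = binomial` uses the logit link by default, which is what makes this a logistic model.

```r
set.seed(8178)

# Hypothetical stand-in data and 80/20 split, as in part (a)
n <- 400
flower <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
flower$y <- rbinom(n, 1, plogis(1.5 * flower$x1 - flower$x2))
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- flower[train_idx, ]
test  <- flower[-train_idx, ]

# Logistic regression: binomial family with the default logit link
fit <- glm(y ~ x1 + x2, family = binomial, data = train)
summary(fit)

# Predicted probabilities on the test set, thresholded at 0.5 (assumed cut-off)
p_test   <- predict(fit, newdata = test, type = "response")
yhat_glm <- as.integer(p_test > 0.5)
```

On the real “flower” data the two classes are unlikely to be linearly separable, so a model that is linear in x1 and x2 may classify poorly; that is worth keeping in mind when reading the results in (c).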
(c) 3 marks. Evaluate the performance of your classifier on the test set. You should
provide the confusion matrix as well as the F1 score.
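The confusion matrix and F1 score can be computed in base R. In the sketch below, the helper names `confusion` and `f1_score`, the orientation (rows = predicted, columns = actual) and the choice of y = 1 as the positive class are assumptions; the toy vectors are made up for illustration.

```r
# Confusion matrix: rows = predicted class, columns = true class.
# Both factors use levels 0/1 so the table is always 2 x 2.
confusion <- function(y_true, y_pred) {
  table(predicted = factor(y_pred, levels = 0:1),
        actual    = factor(y_true, levels = 0:1))
}

# F1 score for the positive class:
# F1 = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN)
f1_score <- function(y_true, y_pred, positive = 1) {
  tp <- sum(y_pred == positive & y_true == positive)
  fp <- sum(y_pred == positive & y_true != positive)
  fn <- sum(y_pred != positive & y_true == positive)
  2 * tp / (2 * tp + fp + fn)
}

# Toy example: TP = 3, FP = 1, FN = 1, TN = 3
y_true <- c(1, 0, 1, 1, 0, 0, 1, 0)
y_pred <- c(1, 0, 0, 1, 1, 0, 1, 0)
confusion(y_true, y_pred)
f1_score(y_true, y_pred)  # 0.75
```

With the glm predictions from (b), the call would be `confusion(test$y, yhat_glm)` and `f1_score(test$y, yhat_glm)`.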
(d) 6 marks. Now, using first principles (without any specific packages), build a binary classifier by applying the sigmoid function to a linear combination of the features (including a bias term). Estimate your parameters by minimizing the cross-entropy loss; remember that this is equivalent to fitting the logistic model. Use your own batch gradient descent algorithm to optimize your cost function. Provide at least two R functions:
i. A first function that returns the parameter estimates of your model. Its arguments might include: the initial values of the parameters, a data matrix containing the features and the response variable, the tolerance for your stopping rule, the maximum number of iterations, the learning rate, ...
ii. A second function for classifying new data points.
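The two functions can be sketched in base R as follows. This is one possible design, not the expected answer: the function names `fit_logistic_gd` and `predict_logistic`, the convention that the last column of the data matrix holds the 0/1 response, the use of the *mean* cross-entropy, the stopping rule on the change in loss, and all default values are assumptions. The gradient of the mean cross-entropy is $X^\top(\sigma(X\beta) - y)/n$.

```r
# Sigmoid (logistic) function
sigmoid <- function(z) 1 / (1 + exp(-z))

# Mean cross-entropy loss; probabilities are clipped to avoid log(0)
cross_entropy <- function(beta, X, y, eps = 1e-12) {
  p <- pmin(pmax(sigmoid(X %*% beta), eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Batch gradient descent for logistic regression.
# data: matrix/data frame whose LAST column is the 0/1 response (assumed layout);
# a column of ones is prepended for the bias term.
fit_logistic_gd <- function(data, beta0 = NULL, eta = 0.1,
                            tol = 1e-8, max_iter = 10000) {
  X <- cbind(1, as.matrix(data[, -ncol(data), drop = FALSE]))
  y <- as.numeric(data[, ncol(data)])
  beta <- if (is.null(beta0)) rep(0, ncol(X)) else beta0
  loss <- numeric(max_iter + 1)
  loss[1] <- cross_entropy(beta, X, y)
  for (k in seq_len(max_iter)) {
    p <- sigmoid(X %*% beta)
    grad <- crossprod(X, p - y) / nrow(X)     # gradient of the mean cross-entropy
    beta <- beta - eta * as.vector(grad)      # full-batch update
    loss[k + 1] <- cross_entropy(beta, X, y)
    if (abs(loss[k] - loss[k + 1]) < tol) break  # stop when the loss stalls
  }
  list(beta = beta, loss = loss[1:(k + 1)], iterations = k)
}

# Classify new points: predicted probability above the threshold -> class 1
predict_logistic <- function(beta, newX, threshold = 0.5) {
  X <- cbind(1, as.matrix(newX))
  as.integer(sigmoid(X %*% beta) > threshold)
}

# Usage on hypothetical simulated data
set.seed(1)
n <- 300
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, sigmoid(0.5 + 2 * X[, 1] - X[, 2]))
fit  <- fit_logistic_gd(cbind(X, y = y), eta = 0.5)
yhat <- predict_logistic(fit$beta, X)
```

Returning the loss history from the fitting function makes the convergence plot in (e) a one-liner, e.g. `plot(fit$loss, type = "l")`.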
(e) 3 marks. Train your model on the training data. Provide a plot of the loss function during training to illustrate the convergence of your model. You might try different learning rates.
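Comparing learning rates is easiest with the loss curves overlaid on one plot. The sketch below assumes a simulated training set, a fixed budget of 500 iterations and three arbitrary learning rates; `gd_loss_path` is a hypothetical helper that records the mean cross-entropy before each full-batch update, so every curve starts at $\log 2$ (all parameters initialised at zero).

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Batch gradient descent on the mean cross-entropy, returning the loss path
gd_loss_path <- function(X, y, eta, n_iter = 500) {
  X <- cbind(1, X)                  # bias column
  beta <- rep(0, ncol(X))
  loss <- numeric(n_iter)
  for (k in seq_len(n_iter)) {
    p  <- as.vector(sigmoid(X %*% beta))
    pc <- pmin(pmax(p, 1e-12), 1 - 1e-12)   # clipped copy for the log
    loss[k] <- -mean(y * log(pc) + (1 - y) * log(1 - pc))
    beta <- beta - eta * as.vector(crossprod(X, p - y)) / nrow(X)
  }
  loss
}

# Hypothetical simulated training data (stand-in for the real training set)
set.seed(1)
n <- 320
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, sigmoid(0.5 + 2 * X[, 1] - X[, 2]))

etas  <- c(0.01, 0.1, 1)            # learning rates to compare (arbitrary)
paths <- sapply(etas, function(e) gd_loss_path(X, y, e))

matplot(paths, type = "l", lty = 1, col = 1:3,
        xlab = "Iteration", ylab = "Cross-entropy loss",
        main = "Training loss for different learning rates")
legend("topright", legend = paste("eta =", etas), col = 1:3, lty = 1)
```

Small learning rates give slow, smooth decay; a learning rate that is too large for the data's scale can make the loss oscillate or diverge, so standardising the features first is often helpful.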
(f) 3 marks. Evaluate the performance of your classifier on the test set. You should provide the confusion matrix and the F1 score, and compare them with the results of item (c) above.
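Because minimizing the cross-entropy is the same as maximizing the Bernoulli likelihood that glm maximizes, the hand-rolled estimates should agree closely with `coef()` from glm once gradient descent has converged. The sketch below checks this on simulated stand-in data; the seed, learning rate, tolerance and helper names `gd_logistic` and `f1_score` are assumptions.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Minimal batch gradient descent for logistic regression (bias column added)
gd_logistic <- function(X, y, eta = 1, max_iter = 20000, tol = 1e-10) {
  X <- cbind(1, X)
  beta <- rep(0, ncol(X))
  for (k in seq_len(max_iter)) {
    step <- eta * as.vector(crossprod(X, sigmoid(X %*% beta) - y)) / nrow(X)
    beta <- beta - step
    if (max(abs(step)) < tol) break  # stop when parameter updates are tiny
  }
  beta
}

# F1 score with y = 1 as the positive class
f1_score <- function(y_true, y_pred) {
  tp <- sum(y_pred == 1 & y_true == 1)
  2 * tp / (2 * tp + sum(y_pred == 1 & y_true == 0) +
              sum(y_pred == 0 & y_true == 1))
}

# Hypothetical simulated data with an 80/20 split (stand-in for the real data)
set.seed(8178)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, sigmoid(0.5 + 2 * dat$x1 - dat$x2))
idx   <- sample(seq_len(n), size = round(0.8 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Both fits maximise the same Bernoulli likelihood, so estimates should agree
beta_gd <- gd_logistic(as.matrix(train[, c("x1", "x2")]), train$y)
fit_glm <- glm(y ~ x1 + x2, family = binomial, data = train)
cbind(gd = beta_gd, glm = coef(fit_glm))

# Test-set predictions at a 0.5 threshold from each model
Xte      <- cbind(1, as.matrix(test[, c("x1", "x2")]))
yhat_gd  <- as.integer(sigmoid(Xte %*% beta_gd) > 0.5)
yhat_glm <- as.integer(predict(fit_glm, newdata = test, type = "response") > 0.5)
table(gd = yhat_gd, glm = yhat_glm)   # agreement between the two classifiers
c(F1_gd = f1_score(test$y, yhat_gd), F1_glm = f1_score(test$y, yhat_glm))
```

If the gradient-descent run is stopped early (loose tolerance or small learning rate), the coefficients and hence a few test-set predictions may differ slightly from glm; the agreement table makes any such differences easy to spot.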