CSE474/574 Introduction to Machine Learning
Programming Assignment 1
Linear Models for Supervised Learning
Maximum Score: 100
Note: A zipped file containing skeleton Python script files and data is provided. For each problem, you need to write code in the specified function within the Python script file. Do not use any Python libraries/toolboxes, built-in functions, or external tools/libraries that directly perform classification, regression, function fitting, etc. Using any ML library calls, such as scikit-learn, will result in 0 points for the corresponding problem.
Evaluation: We will evaluate your code by executing the PA1Script.ipynb file, which will internally call the problem-specific functions. Also submit an assignment report (pdf file) summarizing your findings. In the problem statements below, the portions under the REPORT heading need to be discussed in the assignment report.
Data Sets: Two data sets are provided:
1. A medical data set is provided in the file “diabetes.pickle” along with the corresponding target values. The 64 input variables correspond to measurements (physical, physiological, and blood related) for a given patient, and the target variable corresponds to the level of diabetic condition in the patient. It contains:
• xtrain (242 × 64) and ytrain (242 × 1) for training.
• xtest (200 × 64) and ytest (200 × 1) for testing.
2. A 2D sample data set in the file “sample.pickle”.
Submission: You are required to submit a single file called pa1.zip using UBLearns. The file pa1.zip must contain 2 files: report.pdf and PA1Script.ipynb.
• Submit your report in pdf format. Please indicate the team members, group number, and your course number at the top of the report.
• The code file should contain all implemented functions. Please do not change the name of the file.
Please make sure that your group is enrolled in the UBLearns system: You should submit
one solution per group through the groups page. If you want to change the group, contact the instructors.
Project report: The report should be submitted as a single pdf file, along with the code, through UBLearns.
The problem descriptions specify what is needed in the report for each problem.
Part I - Linear Regression
In this part you will implement the direct and gradient-descent-based learning methods for Linear Regression and compare the results on the provided “diabetes” dataset.
Problem 1: Linear Regression with Direct Minimization (5 code + 5 report = 10 Points)
Implement the ordinary least squares method to estimate the regression parameters by minimizing the squared loss:
J(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 \quad (1)
In matrix-vector notation, the loss function can be written as:
J(\mathbf{w}) = \frac{1}{2} (\mathbf{y} - X\mathbf{w})^\top (\mathbf{y} - X\mathbf{w}) \quad (2)
where X is the input data matrix, y is the target vector, and w is the weight vector for regression.
You need to implement the function learnOLERegression. Also implement the function testOLERegression
to apply the learnt weights for prediction on both training and testing data and to calculate the root mean
squared error (RMSE):
\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2} \quad (3)
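A minimal sketch of these two functions is given below, assuming X is an N × d NumPy array (with the intercept column already appended when one is used) and y is an N × 1 vector; it solves the normal equations (X^⊤X)w = X^⊤y rather than forming the inverse explicitly. The function names follow the skeleton, but the exact signatures there may differ.

import numpy as np

def learnOLERegression(X, y):
    # Direct minimization of Eq. (2): solve the normal equations
    # (X^T X) w = X^T y. np.linalg.solve is preferred over explicitly
    # inverting X^T X for numerical stability.
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w

def testOLERegression(w, X, y):
    # Root mean squared error of the predictions X w (Eq. 3).
    residuals = y - X @ w
    return np.sqrt(np.mean(residuals ** 2))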
REPORT 1.
Calculate and report the RMSE for training and test data for two cases: first, without using an intercept (or bias) term, and second, with an intercept. Which one is better?
Problem 2: Using Gradient Descent for Linear Regression Learning (10 code + 5 report = 15 Points)
As discussed in class, regression parameters can be calculated directly using analytical expressions (as in Problem 1). However, to avoid computation of (X^\top X)^{-1}, another option is to use gradient descent to minimize the loss function. In this problem, you have to implement the gradient descent procedure for estimating the weights w, where the gradient is given by:

\nabla J(\mathbf{w}) = X^\top X \mathbf{w} - X^\top \mathbf{y} \quad (4)
You need to use the minimize function (from the scipy library). You need to implement a function regressionObjVal to compute the squared error (See (2)) and a function regressionGradient to compute its gradient with respect to w. In the main script, this objective function and the gradient function will be used within the minimizer.
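One possible shape for these two functions is sketched below. It assumes the parameter vector arrives as a flat NumPy array, as scipy.optimize.minimize passes it, and is reshaped internally; the actual skeleton may arrange the arguments differently.

import numpy as np
from scipy.optimize import minimize

def regressionObjVal(w, X, y):
    # Squared-error loss J(w) = 0.5 (y - Xw)^T (y - Xw)  (Eq. 2).
    w = w.reshape(-1, 1)
    err = y - X @ w
    return 0.5 * float(err.T @ err)

def regressionGradient(w, X, y):
    # Gradient X^T X w - X^T y  (Eq. 4), returned as a flat array
    # because scipy's minimizer works with 1-D parameter vectors.
    w = w.reshape(-1, 1)
    grad = X.T @ X @ w - X.T @ y
    return grad.flatten()

# Illustrative call from the main script (method choice is an assumption):
# res = minimize(regressionObjVal, np.zeros(X.shape[1]), args=(X, y),
#                jac=regressionGradient, method='CG')
# w_learned = res.x.reshape(-1, 1)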
REPORT 2.
Using testOLERegression, calculate and report the RMSE for training and test data after gradient-descent-based learning. Compare with the RMSE after direct minimization. Which one is better?
Part II - Linear Classifiers
In this part you will implement three different linear classifiers using different optimization algorithms and
compare the results on the provided data set. You will also have to draw the discrimination boundaries for
the three classifiers and compare.
The three classifiers are:
1. Perceptron
2. Logistic Regression
3. Linear Support Vector Machine (SVM)
For each classifier, the decision rule is the same, i.e., the target y_i for a given input x_i is given by:

y_i = \begin{cases} -1 & \text{if } \mathbf{w}^\top \mathbf{x}_i < 0 \\ +1 & \text{if } \mathbf{w}^\top \mathbf{x}_i \geq 0 \end{cases} \quad (5)
where w is the weight vector representing the linear discriminating boundary. We will assume that we have included a constant term in x_i and a corresponding weight in w. While all three classifiers have the same decision function¹, they differ in the loss function that is minimized during training.
For this part, you will implement the training algorithms for the three different linear classifiers, learn a
model for the sample training data and report the accuracy on the sample training and test data sets. The
sample training and test data sets are included in the “sample.pickle” file.
Problem 3: Using Gradient Descent for Perceptron Learning (10 code + 5 report = 15 points)
For this problem, you will train a perceptron, whose squared loss function is exactly the same as that of linear regression (See (1)), i.e.,
J(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 \quad (8)
which means that you can call the same functions, regressionObjVal and regressionGradient, implemented in Problem 2, to train the perceptron.
Implement two functions:
1. a testing function, predictLinearModel, that returns the predictions of a model on a test data set
2. an evaluation function, evaluateLinearModel, that computes the accuracy of the model on the test data by calculating the fraction of observations for which the predicted label is the same as the true label.
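A minimal sketch of both functions is shown below, assuming labels in {−1, +1} and the decision rule in (5); the exact interfaces in the skeleton are assumptions here.

import numpy as np

def predictLinearModel(w, X):
    # Apply the decision rule in Eq. (5): the sign of w^T x_i,
    # mapping the boundary case w^T x_i = 0 to +1.
    scores = X @ w
    return np.where(scores >= 0, 1.0, -1.0).reshape(-1, 1)

def evaluateLinearModel(w, X, y):
    # Accuracy: fraction of observations whose predicted label
    # matches the true label.
    ypred = predictLinearModel(w, X)
    return np.mean(ypred == y.reshape(-1, 1))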
REPORT 3.
Train the perceptron model by calling the scipy.optimize.minimize method, and use the evaluateLinearModel function to calculate and report the accuracy for the training and test data.
Problem 4: Using Newton’s Method for Logistic Regression Learning (15 code + 5 report = 20 points)
For this problem, you will train a logistic regression model, whose loss function (also known as the logistic-loss
or log-loss) is given by:
J(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp(-y_i \mathbf{w}^\top \mathbf{x}_i)\right) \quad (9)
¹For Logistic Regression, typically a different formulation is presented. The decision rule is written as:

y_i = \begin{cases} -1 & \text{if } \theta_i < 0.5 \\ +1 & \text{if } \theta_i \geq 0.5 \end{cases} \quad (6)

where

\theta_i = \sigma(\mathbf{w}^\top \mathbf{x}_i) = \frac{1}{1 + \exp(-\mathbf{w}^\top \mathbf{x}_i)} \quad (7)

However, one can see that it is equivalent to checking if \mathbf{w}^\top \mathbf{x}_i < 0 or not.
The gradient for this loss function, as derived in class, is given by:

\nabla J(\mathbf{w}) = -\frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{1 + \exp(y_i \mathbf{w}^\top \mathbf{x}_i)}\, \mathbf{x}_i \quad (10)
The Hessian for the loss function is given by:

H(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\exp(y_i \mathbf{w}^\top \mathbf{x}_i)}{\left(1 + \exp(y_i \mathbf{w}^\top \mathbf{x}_i)\right)^2}\, \mathbf{x}_i \mathbf{x}_i^\top \quad (11)
Newton’s Method: The update rule is given by:

\mathbf{w}^{(t)} = \mathbf{w}^{(t-1)} - \eta\, H^{-1}(\mathbf{w}^{(t-1)})\, \nabla J(\mathbf{w}^{(t-1)})

where the step is taken against the gradient direction, since we are minimizing J.
However, for this assignment we will again use the scipy.optimize.minimize function, with method = ’Newton-CG’, to train using Newton’s method. This requires you to implement the following three functions (a sketch of all three is given after the list):
1. logisticObjVal - compute the logistic loss for the given data set (See (9)).
2. logisticGradient - compute the gradient vector of logistic loss for the given data set (See (10)).
3. logisticHessian - compute the Hessian matrix of logistic loss for the given data set (See (11)).
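A sketch consistent with (9)–(11) is given below, again assuming the optimizer passes a flat parameter vector and that the labels are in {−1, +1}; treat it as a starting point rather than the reference implementation.

import numpy as np

def logisticObjVal(w, X, y):
    # Logistic loss (Eq. 9): mean of log(1 + exp(-y_i w^T x_i)).
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow.
    w = w.reshape(-1, 1)
    margins = y * (X @ w)
    return float(np.mean(np.logaddexp(0.0, -margins)))

def logisticGradient(w, X, y):
    # Gradient (Eq. 10): -(1/n) * sum of y_i x_i / (1 + exp(y_i w^T x_i)).
    w = w.reshape(-1, 1)
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))   # shape (n, 1)
    return (X * coeff).mean(axis=0)        # flat vector of length d

def logisticHessian(w, X, y):
    # Hessian (Eq. 11), using exp(m)/(1+exp(m))^2 = s(1-s)
    # with s = 1/(1 + exp(m)) for numerical convenience.
    w = w.reshape(-1, 1)
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(margins))
    weights = (s * (1.0 - s)).flatten()
    return (X.T * weights) @ X / X.shape[0]

# Illustrative call (settings are assumptions):
# from scipy.optimize import minimize
# res = minimize(logisticObjVal, np.zeros(X.shape[1]), args=(X, y),
#                jac=logisticGradient, hess=logisticHessian,
#                method='Newton-CG')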
REPORT 4.
Train the logistic regression model by calling the scipy.optimize.minimize method, and use the evaluateLinearModel function to calculate and report the accuracy for the training and test data.
Problem 5: Using the Stochastic Gradient Descent Method for Training a Linear Support Vector Machine (10 code + 5 report = 15 points)
While we will study the quadratic optimization formulation for SVMs in class, we can also train the SVM
directly using the hinge-loss given by:
J(\mathbf{w}) = \sum_{i=1}^{n} \max\left(0, 1 - y_i \mathbf{w}^\top \mathbf{x}_i\right) \quad (12)
Clearly, the above function is not as easily differentiable as the squared-loss and logistic-loss functions above. However, we can devise a simple Stochastic Gradient Descent (SGD) based method for learning w. Note that, for a single observation, the loss is given by:
J_i(\mathbf{w}) = \max\left(0, 1 - y_i \mathbf{w}^\top \mathbf{x}_i\right) \quad (13)

= \begin{cases} 1 - y_i \mathbf{w}^\top \mathbf{x}_i & \text{if } y_i \mathbf{w}^\top \mathbf{x}_i < 1 \\ 0 & \text{otherwise} \end{cases} \quad (14)
Thus, the gradient of Ji(w) can be written as:
\nabla J_i(\mathbf{w}) = \begin{cases} -y_i \mathbf{x}_i & \text{if } y_i \mathbf{w}^\top \mathbf{x}_i < 1 \\ 0 & \text{otherwise} \end{cases} \quad (15)
The training can be done using the following algorithm:

1: w ← [0, 0, . . . , 0]^⊤
2: for t = 1, 2, . . . , T do
3:     i ← RandomSample(1 . . . n)
4:     if y_i w^⊤ x_i < 1 then
5:         w ← w + η y_i x_i
6:     end if
7: end for
You have to implement a function trainSGDSVM that learns the weight vector w using the above algorithm; a minimal sketch is given below.
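The following is a direct transcription of the algorithm; the signature (data, labels, number of iterations, learning rate) is an assumption based on the report instructions below.

import numpy as np

def trainSGDSVM(X, y, T, eta=0.01):
    # Stochastic sub-gradient descent on the hinge loss (Eqs. 12-15).
    n, d = X.shape
    y = np.ravel(y)                      # treat labels as a flat vector
    w = np.zeros((d, 1))
    for t in range(T):
        i = np.random.randint(n)         # i <- RandomSample(1 ... n)
        xi = X[i, :].reshape(-1, 1)
        if y[i] * float(w.T @ xi) < 1:   # margin violated (Eq. 15)
            w = w + eta * y[i] * xi      # w <- w + eta * y_i * x_i
        # otherwise the sub-gradient is zero and w is unchanged
    return w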
REPORT 5.
Train the SVM model by calling the trainSGDSVM method for 200 iterations (set the learning rate parameter η to 0.01). Use the evaluateLinearModel function to calculate and report the accuracy for the training and test data.
Problem 6: Comparing Linear Classifiers (20 code + 5 report = 25 points)
Using the results for Problems 3–5, provide a comparison of the three different linear models (Perceptrons,
Logistic Regression, and Support Vector Machines) on the provided data set.
REPORT 6.
1. Use the results on the test data to determine which classifier is the most accurate.
2. Plot the decision boundaries learnt by each classifier using the provided plotDecisionBoundary function, which takes the learnt weight vector w as one of its parameters. Study the three boundaries and provide your insights (see the sketch below).
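In case it helps to sanity-check the provided plotting code, the sketch below draws a 2-D boundary w_0 + w_1 x_1 + w_2 x_2 = 0 directly with matplotlib. It assumes an intercept-first weight layout and that X here holds the two raw features without the constant column; both assumptions may differ from the skeleton's conventions, and this is not the provided plotDecisionBoundary function.

import numpy as np
import matplotlib.pyplot as plt

def sketchBoundary(w, X, y, label):
    # Draw the line w0 + w1*x1 + w2*x2 = 0 over a scatter of the data.
    # Assumes w = [w0, w1, w2] with w0 the constant-term weight and
    # X an (n, 2) array of raw features (no bias column).
    w = np.ravel(w)
    x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    x2 = -(w[0] + w[1] * x1) / w[2]      # solve w0 + w1*x1 + w2*x2 = 0
    plt.scatter(X[:, 0], X[:, 1], c=np.ravel(y))
    plt.plot(x1, x2, label=label)
    plt.legend()
    plt.show()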