MA 325 Mathematics and Applications
North Carolina State University Department of Mathematics
Module 3: Machine Learning: Mathematics and Applications
MA 325 k-NN and Perceptron
1. This is a simple data set to illustrate the Perceptron Learning Model. Consider the following
training data with two categories (labels):
C1(+1): (0, 1.5)^T, (1, 1)^T, (2, 2)^T, (2, 0)^T
C2(−1): (0, 0)^T, (1, 0)^T, (0, 1)^T
That is, there are seven training data points, each data point has two features and a corre-
sponding label, +1 or -1.
(a) Plot these seven training data points and observe that they are separable.
(b) Start with the weights w = (−2, 4, 1)^T, where w0 = −2 is the threshold with corresponding
artificial coordinate x0 = 1, and plot the linear model w^T x = 0. Note that this
linear model does not separate all data points.
(c) Using the Perceptron Learning Algorithm, update the weights so that the linear model
will eventually separate all seven data points. At each iteration, plot the linear model on
the same plot with the data points.
Please show all calculations performed at each iteration of the perceptron learning algorithm,
as well as the plots.
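For reference, the update rule w ← w + y·x applied to a misclassified point can be sketched as below (a minimal Python sketch using the problem's data and starting weights; plotting each intermediate line is omitted, and the final weights shown are one possible outcome, not necessarily the unique answer):

```python
import numpy as np

# Training data with artificial coordinate x0 = 1 prepended to each point.
X = np.array([
    [1, 0, 1.5], [1, 1, 1], [1, 2, 2], [1, 2, 0],   # class C1, label +1
    [1, 0, 0],   [1, 1, 0], [1, 0, 1],              # class C2, label -1
], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1])

w = np.array([-2.0, 4.0, 1.0])   # starting weights; w0 = -2 is the threshold

# Perceptron Learning Algorithm: cycle through the data, updating on each
# misclassified point, until a full pass makes no changes.
changed = True
while changed:
    changed = False
    for xi, yi in zip(X, y):
        if np.sign(w @ xi) != yi:    # point is misclassified (or on the line)
            w = w + yi * xi          # update rule: w <- w + y*x
            changed = True

print(w)  # final weights of a separating line w^T x = 0
```

Since the data are separable, the loop is guaranteed to terminate with weights that classify all seven points correctly.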
2. The MATLAB file, SampleCredit.mat, contains the data set for the credit card applications. The pdf
file, CreditApproval.pdf, describes this data set. Note that the values in the data
set were changed by the authors to protect the confidentiality of the data. Use the first 500
data points for the training set and the rest for the testing/validation set.
(a) Write your own code to implement the k-NN algorithm for this data set. You can choose
one of the 3 distance metrics from the lecture to compute the distance. Choose k to be an
odd number close to √n, where n = 500 is the number of points in the training
data set.
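The classification step can be sketched as below (a minimal Python sketch assuming numeric features and ±1 labels; the actual layout of SampleCredit.mat may differ, and Euclidean distance is just one of the three metrics you may pick). With n = 500, √n ≈ 22.4, so k = 23 is a natural odd choice:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=23):
    """Classify point x by majority vote among its k nearest training points.

    Assumes labels are +1/-1; uses Euclidean distance as the metric.
    """
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return np.sign(np.sum(y_train[nearest]))      # majority label (k odd => no ties)
```

Keeping k odd avoids tied votes, which is why the problem asks for an odd k near √n.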
(b) For the data in the testing/validation set, calculate the algorithm's accuracy. The accuracy
is defined as the number of points classified correctly by the k-NN
algorithm divided by the total number of points in the testing data set. This is possible
since we know the true labels of the testing/validation set.
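The accuracy computation itself is a one-liner once the predicted and true labels are collected (a small sketch; the label vectors shown in the example are made up for illustration):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of test points whose predicted label matches the true label.
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Example: 3 of 4 predictions correct.
print(accuracy([1, -1, 1, 1], [1, -1, -1, 1]))  # 0.75
```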