GR5241 Statistical Machine Learning
Homework submission: Please submit your homework electronically through Canvas by 11:59pm on the due date. You need to submit both the PDF file and your code (either in R or Python).

Problem 1 (Boosting, 30 points)

The objective of this problem is to implement the AdaBoost algorithm. We will test the algorithm on handwritten digits from the USPS data set.

AdaBoost: Assume we are given a training sample (x^(i), y_i), i = 1, ..., n, where the x^(i) are data values in R^d and the y_i ∈ {−1, +1} are class labels. Along with the training data, we provide the algorithm with a training routine for some classifier c (the "weak learner"). Here is the AdaBoost algorithm for the two-class problem:

1. Initialize the weights: w_i = 1/n.
2. For b = 1, ..., B:
   (a) Train a weak learner c_b on the weighted training data.
   (b) Compute the weighted error of c_b:

       \epsilon_b = \frac{\sum_{i=1}^{n} w_i \, \mathbb{1}\{y_i \neq c_b(x^{(i)})\}}{\sum_{i=1}^{n} w_i}

   (c) Compute the voting weight:

       \alpha_b = \log \frac{1 - \epsilon_b}{\epsilon_b}

   (d) Recompute the weights:

       w_i \leftarrow w_i \exp\!\left( \alpha_b \, \mathbb{1}\{y_i \neq c_b(x^{(i)})\} \right)
3. Return the aggregated classifier

       \hat{c}_B(x) = \mathrm{sign}\!\left( \sum_{b=1}^{B} \alpha_b \, c_b(x) \right)

Decision stumps: Recall that a stump classifier c is defined by

       c(x \mid j, \theta, m) := \begin{cases} +m, & x_j > \theta \\ -m, & \text{otherwise} \end{cases}        (1)

Since the stump ignores all entries of x except x_j, it is equivalent to a linear classifier defined by an affine hyperplane. The plane is orthogonal to the j-th axis, which it intersects at x_j = θ. The orientation of the hyperplane is determined by m ∈ {−1, +1}. We will employ stumps as weak learners in our boosting algorithm. To train stumps on weighted data, use the learning rule

       (j^*, \theta^*) := \arg\min_{j, \theta} \sum_{i=1}^{n} w_i \, \mathbb{1}\{y_i \neq c(x^{(i)} \mid j, \theta, m)\}        (2)

In the implementation of your training routine, first determine an optimal parameter θ*_j for each dimension j = 1, ..., d, and then select the j* for which the cost term in (2) is minimal.
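For concreteness, here is one possible minimal R sketch of such a stump training routine and the matching classification routine, not a reference implementation. It assumes (as in the function specifications below) that X is a d × n matrix whose columns are the training vectors, and it uses the observed values along each dimension as candidate thresholds, which is one simple choice among several.

    # Train a decision stump on weighted data by minimizing the cost in rule (2).
    # X: d x n matrix (columns are training vectors), w: weights, y: labels in {-1, +1}.
    train <- function(X, w, y) {
      d <- nrow(X)
      best <- list(j = 1, theta = X[1, 1], m = 1)
      best_cost <- Inf
      for (j in 1:d) {
        thetas <- sort(unique(X[j, ]))            # candidate thresholds along dimension j
        for (theta in thetas) {
          for (m in c(-1, 1)) {
            pred <- ifelse(X[j, ] > theta, m, -m) # stump prediction c(x | j, theta, m)
            cost <- sum(w * (pred != y))          # weighted misclassification cost
            if (cost < best_cost) {
              best_cost <- cost
              best <- list(j = j, theta = theta, m = m)
            }
          }
        }
      }
      best
    }

    # Evaluate a stump with parameters pars = list(j, theta, m) on the columns of X.
    classify <- function(X, pars) {
      ifelse(X[pars$j, ] > pars$theta, pars$m, -pars$m)
    }

A faster implementation would sort each dimension once and sweep the threshold while updating the cost incrementally; the nested loops above are kept only for readability.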
Homework problems:

1.i (8) Implement the AdaBoost algorithm in R. The algorithm requires two auxiliary functions, to train and evaluate the weak learner. We also need a third function which implements the resulting boosting classifier. We will use decision stumps as weak learners, but a good implementation of the boosting algorithm should permit you to easily plug in arbitrary weak learners. To make sure that is possible, please use function calls of the following form:

• pars <- train(X, w, y) for the weak learner training routine, where X is a matrix whose columns are the training vectors x^(1), ..., x^(n), and w and y are vectors containing the weights and class labels. The output pars is a list which contains the parameters specifying the resulting classifier. (In our case, pars will be the triplet (j, θ, m) which specifies the decision stump.)

• label <- classify(X, pars) for the classification routine, which evaluates the weak learner on X using the parametrization pars.

• c_hat <- agg_class(X, alpha, allPars) for a function which evaluates the boosting classifier (the "aggregated classifier") on X. The argument alpha denotes the vector of voting weights and allPars contains the parameters of all weak learners.
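As a rough illustration of how these pieces fit together, the following R sketch implements the aggregated classifier and the boosting loop of the algorithm above. The function name adaboost and the list structure of its return value are illustrative choices, not part of the required interface.

    # Aggregated classifier: sign of the alpha-weighted sum of weak-learner votes.
    agg_class <- function(X, alpha, allPars) {
      votes <- rep(0, ncol(X))
      for (b in seq_along(alpha)) {
        votes <- votes + alpha[b] * classify(X, allPars[[b]])
      }
      sign(votes)   # ties (exact zeros) would need an arbitrary tie-breaking rule
    }

    # AdaBoost with B boosting rounds; returns the voting weights and all stump parameters.
    adaboost <- function(X, y, B) {
      n <- ncol(X)
      w <- rep(1 / n, n)                        # step 1: uniform initial weights
      alpha <- numeric(B)
      allPars <- vector("list", B)
      for (b in 1:B) {
        allPars[[b]] <- train(X, w, y)          # (a) train weak learner on weighted data
        miss <- as.numeric(classify(X, allPars[[b]]) != y)
        err <- sum(w * miss) / sum(w)           # (b) weighted error
        alpha[b] <- log((1 - err) / err)        # (c) voting weight (guard err == 0 in practice)
        w <- w * exp(alpha[b] * miss)           # (d) upweight misclassified points
      }
      list(alpha = alpha, allPars = allPars)
    }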
1.ii (6) Implement the functions train and classify for decision stumps.

1.iii (12) Run your algorithm on the USPS data (the digit data we used in Homework 4; use the training and test data for the 5s and 6s) and evaluate your results using cross-validation. Your AdaBoost algorithm returns a classifier that is a combination of B weak learners. Since it is an incremental algorithm, we can evaluate the AdaBoost classifier at every iteration b by considering the sum up to the b-th weak learner. At each iteration, perform 5-fold cross-validation to estimate the training and test error of the current classifier (that is, the errors measured on the cross-validation training and test sets, respectively).

1.iv (4) Plot the training error and the test error as a function of b.

Submission. Please make sure your solution contains the following:

• Your implementation for train, classify and agg_class.
• Your implementation of AdaBoost.
• Plots of your results (training error and cross-validated test error).

Problem 2 (Hierarchical clustering, 5 points)

Perform single-linkage hierarchical clustering on the following 2-dimensional observations. Use the Manhattan distance between a pair of points: d(A, B) = |X_{1A} − X_{1B}| + |X_{2A} − X_{2B}|. Show your results after each step.
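Although this problem is meant to be worked by hand, a short R sketch can be used to check the merge sequence. The observation matrix below is a hypothetical placeholder and should be replaced by the points given in the problem.

    # Hypothetical placeholder observations (one row per point); substitute the problem's data.
    X <- matrix(c(1, 2,
                  2, 1,
                  5, 4,
                  6, 5), ncol = 2, byrow = TRUE)

    D  <- dist(X, method = "manhattan")   # pairwise Manhattan (L1) distances
    hc <- hclust(D, method = "single")    # single-linkage agglomerative clustering
    hc$merge                              # which clusters are merged at each step
    hc$height                             # single-linkage distance at each merge
    plot(hc)                              # dendrogram of the full merge sequence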