This HW includes both theory and implementation problems. Please note that the submission for this homework should be a single PDF file (via Gradescope) and a single ‘zip‘ file containing all of the relevant code, figures, and any text explaining your results (via Canvas).
Kernel Methods
Problem 1 (10 points) In this problem you are asked to find the explicit mapping function Φ(·) for the Gaussian kernel k(x, x′) = exp(−∥x − x′∥² / 2γ²) by showing that it can be expressed as an inner product in an infinite-dimensional feature space. Assume x, x′ ∈ R^d and ∥x∥₂ = ∥x′∥₂ = 1.
Problem 2 (20 points) In the lectures we used the KKT optimality conditions to derive the dual optimization problem for hard SVM. In this problem you are asked to do the same for soft SVM. Consider the constrained optimization problem for soft SVM:
min_{w, b, ξ1, …, ξn}  (1/2) ∥w∥² + C Σ_{i=1}^n ξi
subject to  yi (w⊤xi + b) ≥ 1 − ξi,  i = 1, 2, …, n,
            ξi ≥ 0,  i = 1, 2, …, n.
a) Consider two sets of dual variables: α1, …, αn for the first set of constraints, i.e., yi (w⊤xi + b) ≥ 1 − ξi, and another set of dual variables β1, …, βn for the second set of constraints, ξi ≥ 0, and write down the Lagrangian function.
b) For the Lagrangian function in part (a), write down the KKT optimality conditions.
c) Derive the dual problem.
Boosting
Problem 3 (15 points) Consider the AdaBoost algorithm we discussed in class.¹ AdaBoost is an example of an ensemble classifier where the weights for the next round are decided based on the training error of the weak classifier learned on the current weighted training set. We wish to run AdaBoost on the dataset provided in Table 1.
Instance | Color  | Size  | Shape     | Edible?
D1       | Yellow | Small | Round     | Yes
D2       | Yellow | Small | Round     | No
D3       | Green  | Small | Irregular | Yes
D4       | Green  | Large | Irregular | No
D5       | Yellow | Large | Round     | Yes
D6       | Yellow | Small | Round     | Yes
D7       | Yellow | Small | Round     | Yes
D8       | Yellow | Small | Round     | Yes
D9       | Green  | Small | Round     | No
D10      | Yellow | Large | Round     | No
D11      | Yellow | Large | Round     | Yes
D12      | Yellow | Large | Round     | No
D13      | Yellow | Large | Round     | No
D14      | Yellow | Large | Round     | No
D15      | Yellow | Small | Irregular | Yes
D16      | Yellow | Large | Irregular | Yes
Table 1: Mushroom data with 16 instances, three categorical features, and binary labels.
a) Assume we choose the following decision stump f1 (a shallow tree with a single decision node) as the first predictor (i.e., when training instances are weighted uniformly):
if (Color is Yellow): predict Edible = Yes
else: predict Edible = No
What would be the weight of f1 in the final ensemble classifier (i.e., α1 in f(x) = Σ_{i=1}^K αi fi(x))?
b) After computing f1, we proceed to the next round of AdaBoost. We begin by recomputing the data weights based on the error of f1 and on whether each point was (mis)classified by f1. What is the weight of each instance in the second boosting iteration, i.e., after the points have been re-weighted? Note that the weights across the training set are initialized uniformly.
c) In AdaBoost, would you stop the iteration if the error rate of the current weak classifier on the weighted training data is 0?
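For parts (a)–(c), it may help to recall the standard AdaBoost quantities in generic textbook notation (assume the convention from lecture if it differs): the weighted error, the classifier weight, and the re-weighting rule,
εt = Σ_{i=1}^n Dt(i) · 1{ft(xi) ≠ yi},   αt = (1/2) ln((1 − εt) / εt),   D_{t+1}(i) = Dt(i) · exp(−αt yi ft(xi)) / Zt,
where D1(i) = 1/n is the uniform initialization, yi ∈ {−1, +1}, and Zt is the normalizer that makes D_{t+1} sum to one.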
¹ Please read the reading assignment from the textbook and (optionally) the Introduction chapter from the Boosting book for a detailed explanation: https://bit.ly/2HvKObl
Experiment with non-linear classifiers
Problem 4 (40 points) For this problem, you will need to learn to use software libraries for
at least two of the following non-linear classifier types:
• Boosted Decision Trees (i.e., boosting with decision trees as weak learners)
• Random Forests
• Support Vector Machines with Gaussian Kernel
All of these are available in scikit-learn, although you may also use other external libraries (e.g., XGBoost² for boosted decision trees and LibSVM for SVMs). You are welcome to implement the learning algorithms for these classifiers yourself, but this is neither required nor recommended.
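As a starting point, here is a minimal sketch of how the three classifier types above can be instantiated in scikit-learn; the classes are standard scikit-learn APIs, but the hyperparameter values shown are placeholders that you will need to tune yourself.

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC

# Boosted decision trees (decision trees as weak learners); placeholder settings.
boosted_trees = GradientBoostingClassifier(n_estimators=100, max_depth=3)
# Random forest; min_impurity_decrease is one of the hyperparameters mentioned below.
random_forest = RandomForestClassifier(n_estimators=200, min_impurity_decrease=0.0)
# SVM with Gaussian (RBF) kernel; C is the regularization parameter.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale")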
Pick two different types of non-linear classifiers from above for classification of the Adult dataset. You can download the data from a9a in the LibSVM data repository. The a9a dataset comes with two files: the training data file a9a with 32,561 samples, each with 123 features, and a9a.t with 16,281 test samples. Note that the a9a data is in LibSVM format. In this format, each line takes the form <label> <feature-id>:<feature-value> <feature-id>:<feature-value> ... This format is especially suitable for sparse datasets. Note that scikit-learn includes utility functions (e.g., load_svmlight_file in the example code below) for loading datasets in the LibSVM format.
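For example, a minimal sketch of loading the two a9a files with scikit-learn, assuming they have been downloaded into the working directory:

from sklearn.datasets import load_svmlight_file

# n_features=123 forces the train and test matrices to have the same width,
# since the highest feature indices may be absent from one of the files.
X_train, y_train = load_svmlight_file("a9a", n_features=123)
X_test, y_test = load_svmlight_file("a9a.t", n_features=123)
print(X_train.shape, X_test.shape)  # expected: (32561, 123) (16281, 123)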
For each of the learning algorithms, you will need to set various hyperparameters (e.g., the type of kernel and regularization parameter for SVM; tree_method, max_depth, number of weak classifiers, etc. for XGBoost; number of estimators and min_impurity_decrease for Random Forests). Often the defaults make a good starting point, but you may need to adjust at least some of them to get good performance. Use hold-out validation or K-fold cross-validation to do this (scikit-learn has nice features to accomplish this; e.g., you may use train_test_split to hold out part of the training data for validation, and sklearn.model_selection for K-fold cross-validation). Do not make any hyperparameter choices (or any other similar choices) based on the test set! You should only compute the test error rates after you have settled on hyperparameter settings and trained your two final classifiers.
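One possible (illustrative, not prescriptive) way to tune hyperparameters without touching the test set is sketched below, combining a hold-out split of the training data with K-fold cross-validated grid search; the grid values are placeholders.

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Hold out part of the *training* data for validation; a9a.t is never used here.
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

# 5-fold cross-validation over a small, illustrative grid of RBF-SVM hyperparameters.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_tr, y_tr)
print(search.best_params_, search.score(X_val, y_val))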