FTEC2101/ESTR2520 Optimization Methods
Project Specification – Binary Classification
Thus far we have learnt a number of optimization methods, ranging from the simplex method for linear
programming, modeling techniques for integer programming, gradient/Newton methods for unconstrained
optimization, KKT conditions, SOCP formulation, etc. Besides the theory, please remember that optimization
methods are practical tools for solving real-world problems. The aim of this project is to practice our skills on
the latter aspect. We will make use of the Julia1 environment.
The project’s theme is practical solutions for binary classification. It can be considered as an extension of
what we have experimented with in Computing Lab 2. We will explore additional aspects of binary classification
and, more importantly, implement and compare these ideas on a real-world dataset.
Background. Let us recall that the binary classification task aims at learning a linear classifier that distinguishes
a feature vector as positive (+1) or negative (−1). Let d ∈ N be the dimension of the feature vector.
The (linear) classifier is described by the tuple (w, b), where w ∈ R^d is the direction parameter and b ∈ R is
the bias parameter2, both specifying the linear model. Given (w, b), a feature vector x ∈ R^d is classified into
a label y ∈ {±1} such that
\[
y = \begin{cases} +1, & \text{if } w^\top x + b = w_1 x_1 + \cdots + w_d x_d + b \ge 0, \\ -1, & \text{if } w^\top x + b = w_1 x_1 + \cdots + w_d x_d + b < 0. \end{cases} \tag{1.1}
\]
As an illustration, the following figure shows a case with d = 2 such that the classifier is described by
w = (w1, w2) and the scalar b:
[Figure: a two-dimensional scatter plot of the training data in the (x1, x2) plane; the dashed decision boundary w1x1 + w2x2 + b = 0 separates the points classified as +1 (above the line) from those classified as −1 (below the line).]
According to the rule in (1.1), all the feature vectors that lie above the dashed line shall be classified as +1
(blue points); those that lie below the dashed line shall be classified as −1 (red points).
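For concreteness, the decision rule (1.1) can be written in a few lines of Julia. This is a minimal sketch; the function name classify and the example numbers are ours and not part of the project template.

```julia
using LinearAlgebra

# Classification rule (1.1): predict +1 if w'x + b >= 0 and -1 otherwise.
classify(w, b, x) = (dot(w, x) + b >= 0) ? 1 : -1

# Example with d = 2 (hypothetical numbers):
w = [1.0, -2.0]; b = 0.5
x = [0.3, 0.1]
classify(w, b, x)        # returns +1 since w'x + b = 0.6 >= 0
```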
1As an alternative, you are welcome to use Python with optimization modeling packages supporting SOCP and MI-NLP such
as cvxpy. However, in that case you have to notify the instructor of your choice and of how you plan to accomplish the project
on or before May 1, 2022, i.e., two weeks before the deadline. In the project, you are not allowed to use any
pre-built solver for binary classification such as MLJ.jl, ScikitLearn.jl, Flux.jl or scikit-learn as your final submission (though
you are encouraged to try out these packages as extra work to support your solution).
2Note that this differs slightly from Lecture 16 as we include the bias parameter in the classifier design.
Dataset & Folder Setting. We have m ∈ N samples of training data described by {x^(i), y^(i)}_{i=1}^m, where
x^(i) ∈ R^d is the ith feature vector and y^(i) ∈ {±1} is the associated label. We will use a curated version
of the GiveMeSomeCredits dataset taken from https://www.kaggle.com/c/GiveMeSomeCredit. It includes
customer data such as their debt ratios, credit utilization, age, etc., and the history of defaulting. With this
information, our goal is to learn a classifier that predicts whether a customer has any risk of defaulting before the
bank approves loan(s) for him/her.
To prepare for the project, please retrieve the Jupyter notebook template ftec2101-project-template-2022.ipynb
and the zip archive ftec project files.zip from the Blackboard. Place the *.ipynb files in a working direc-
tory, extract the *.zip file and move the *.csv files into the same working directory. Your working directory
should then contain the *.ipynb notebook and the extracted *.csv files. Notice that:
• ftec-cs-groupi-train.csv – the training dataset for students in group i, i = 1, 2, 3, 4, 5. This is a csv
file that contains 20 samples of customer data to be used in the Compulsory Tasks.
• ftec-cs-groupi-test.csv – the testing dataset for students in group i, i = 1, 2, 3, 4, 5. This is a csv
file that contains 30 samples of customer data to be used in the Compulsory Tasks.
• ftec-cs-full-train.csv – the training dataset that contains 16657 samples of customer data to be
used in the Competitive Tasks.
• ftec-cs-full-test.csv – the testing dataset that contains 34870 samples of customer data to be used
in the Competitive Tasks.
Lastly, the Jupyter notebook ftec2101 project 2022.ipynb provides detailed descriptions and helper code
to guide you through the tasks required by this project. Please pay attention to the comments provided inside
the code cells of the Jupyter notebook as well.
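Before starting the tasks, it may help to verify that the files load correctly. The snippet below is a minimal sketch using CSV.jl and DataFrames.jl; the group number in the file name and the 0/1 label column name (SeriousDlqin2yrs, as in the original Kaggle data) are assumptions you should check against the actual files and the notebook's helper code.

```julia
using CSV, DataFrames

# Load one of the small training files (replace the group number with your own).
df = CSV.read("ftec-cs-group1-train.csv", DataFrame)
println(first(df, 5))            # inspect the first rows and the column names

# If the label is coded as 0/1, map it to the ±1 convention used in this project,
# e.g. (assuming the label column is named SeriousDlqin2yrs):
# y = 2 .* df.SeriousDlqin2yrs .- 1
```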
1.1 Compulsory Tasks (50%)
The compulsory tasks of this project are divided into two parts. You are required to
(A) answer questions on the optimization theory and modeling related to binary classification; and
(B) implement the modeled optimization problems on computer and solve them with the GiveMeSomeCredits
dataset.
Theory and Modeling. Denote the training dataset with m samples as {x^(i), y^(i)}_{i=1}^m, where x^(i) ∈ R^d
is the ith feature vector and y^(i) ∈ {±1} is the associated label. Let λ > 0 be a fixed scalar. The following
optimization problem designs what is known as the hard-margin classifier [Sa, 2022]:
\[
\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R}} \;\; \frac{1}{2} w^\top w \quad \text{s.t.} \quad y^{(i)} \big( (x^{(i)})^\top w + b \big) \ge 1, \quad i = 1, \dots, m. \tag{1.2}
\]
It can be easily shown that (1.2) is a convex optimization problem.
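For illustration only (this is not the required submission), the hard-margin problem (1.2) can be prototyped directly in JuMP. The function name hard_margin and the choice of Ipopt as the solver are our own assumptions; X is an m × d feature matrix and y a length-m vector of ±1 labels.

```julia
using JuMP, Ipopt

function hard_margin(X::Matrix{Float64}, y::Vector{Float64})
    m, d = size(X)
    model = Model(Ipopt.Optimizer)
    @variable(model, w[1:d])
    @variable(model, b)
    # minimize (1/2) w'w subject to y_i (x_i' w + b) >= 1 for every training sample
    @objective(model, Min, 0.5 * sum(w[j]^2 for j in 1:d))
    @constraint(model, [i in 1:m],
        y[i] * (sum(X[i, j] * w[j] for j in 1:d) + b) >= 1)
    optimize!(model)
    return value.(w), value(b)
end
```

Note that if the training data are not linearly separable (cf. Task 1(c)), the solver will not find a feasible point, which is exactly the issue the soft-margin formulation below addresses.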
Task 1: Hard-margin Formulation (7.5%)
Answer the following questions:
(a) Explain how solving (1.2) finds a classifier (w⋆, b⋆) that can correctly distinguish the m training
samples into the +1 or −1 labels. (Hint: focus on the constraints in (1.2) and compare them to (1.1)
as well as the figure that illustrates it.)
(b) Derive the KKT conditions for (1.2). Using the KKT conditions, show that the optimal w⋆ is a linear
combination of the x^(i) over the samples i such that 1 = y^(i)((x^(i))^T w⋆ + b⋆).
(c) Give an example of a training dataset with d = 2 for which (1.2) is infeasible. You may describe such
a dataset by drawing it on a 2-D plane.
Due to the issue of infeasibility investigated in part (c) of Task 1, the hard-margin formulation (1.2) is
seldom used in practice. Instead, we consider a soft-margin formulation, which introduces a non-negative
‘auxiliary decision variable’3 ξ_i to each of the constraints, yielding:
\[
y^{(i)} \big( (x^{(i)})^\top w + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m. \tag{1.3}
\]
Instead of forcing y^(i)((x^(i))^T w + b) ≥ 1 for all training samples, i.e., requiring all the training samples to be
correctly classified, the above constraint with the auxiliary variable ξ_i allows for erroneous classification.
The auxiliary variable ξ_i is interpreted as the amount of error for the ith sample. Notice that ξ_i > 0 is needed if and
only if the ith training sample violates the margin condition, i.e., y^(i)((x^(i))^T w + b) < 1. Ideally, we should
find (w, b) such that the number of samples i with ξ_i > 0 is minimized.
Below, you will formulate a mixed-integer program (in fact, a mixed-integer non-linear program, MI-NLP)
for the soft-margin classification problem.
Task 2: Mixed Integer Formulation (5%)
Answer the following question: Based on the soft-margin constraint (1.3) [as well as the max-margin
problem (1.2)], formulate an optimization problem with the following specification:
• The objective is to minimize the total number of auxiliary decision variables ξ_i with ξ_i > 0
(i.e., the total number of errors).
• The amount of error is bounded such that
\[
0 \le \xi_i \le M, \quad i = 1, \dots, m,
\]
where M is a known/given constant.
• The soft-margin constraints (1.3) are satisfied.
• The direction parameter w ∈ R^d satisfies the following shaping constraint:
\[
w^\top \Sigma w + c^\top w \le 1,
\]
where Σ ∈ R^{d×d} is a given symmetric, positive definite matrix, and c ∈ R^d is a given vector.
3Note that it bears some resemblance to the artificial variable technique we learnt in the Big-M simplex method.
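As a starting point for the implementation part (and deliberately not the full Task 2 formulation, which you need to derive yourself), the JuMP fragment below sketches how the individual ingredients above can be expressed: the bounded auxiliary variables, the soft-margin constraints (1.3) and the quadratic shaping constraint. The function name and the assumption that X, y, Sigma, c, M are given Julia arrays/scalars are ours; the error-counting objective and the choice of an MI-NLP-capable solver are left to you.

```julia
using JuMP

# Skeleton of the constraint set described above (no objective, no solver attached).
function soft_margin_skeleton(X, y, Sigma, c, M)
    m, d = size(X)
    model = Model()                          # attach an MI-NLP-capable solver later
    @variable(model, w[1:d])
    @variable(model, b)
    @variable(model, 0 <= xi[1:m] <= M)      # auxiliary error variables, 0 <= ξ_i <= M

    # soft-margin constraints (1.3)
    @constraint(model, [i in 1:m],
        y[i] * (sum(X[i, j] * w[j] for j in 1:d) + b) >= 1 - xi[i])

    # shaping constraint  w'Σw + c'w <= 1
    @constraint(model,
        sum(Sigma[j, k] * w[j] * w[k] for j in 1:d, k in 1:d) +
        sum(c[j] * w[j] for j in 1:d) <= 1)

    return model, w, b, xi
end
```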