COMS 4771
This homework is to be done alone. No late homeworks are allowed. To receive credit, a typeset copy of the homework PDF must be uploaded to Gradescope by the due date. You must show your work to receive full credit. Discussing possible approaches to homework questions is encouraged on the course discussion board and with your peers, but you must write your own individual solutions and not share your written work/code. You must cite all resources (including online material, books, articles, help taken from/given to specific individuals, etc.) that you used to complete your work.
1 Inconsistency of the fairness definitions
Recall the notation and definitions of group-based fairness conditions:
Notation:
Let $X \in \mathbb{R}^d$, $A \in \{0,1\}$, and $Y \in \{0,1\}$ be three random variables denoting, respectively, the non-sensitive features of an instance, the instance's sensitive feature, and the instance's target label, such that $(X, A, Y) \sim \mathcal{D}$. Let $f : \mathbb{R}^d \to \{0,1\}$ denote a classifier and define $\hat{Y} := f(X)$.
For simplicity, we also use the following abbreviations:
\[ \mathbb{P} := \mathbb{P}_{(X,A,Y)\sim\mathcal{D}} \qquad\text{and}\qquad \mathbb{P}_a := \mathbb{P}_{(X,a,Y)\sim\mathcal{D}} \]
Group based fairness definitions:
- Demographic Parity (DP)
\[ \mathbb{P}_0[\hat{Y} = \hat{y}] = \mathbb{P}_1[\hat{Y} = \hat{y}] \qquad \forall\, \hat{y} \in \{0,1\} \]
(equal positive rate across the sensitive attribute)
- Equalized Odds (EO)
\[ \mathbb{P}_0[\hat{Y} = \hat{y} \mid Y = y] = \mathbb{P}_1[\hat{Y} = \hat{y} \mid Y = y] \qquad \forall\, \hat{y}, y \in \{0,1\} \]
(equal true-positive and true-negative rates across the sensitive attribute)
- Predictive Parity (PP)
\[ \mathbb{P}_0[Y = y \mid \hat{Y} = \hat{y}] = \mathbb{P}_1[Y = y \mid \hat{Y} = \hat{y}] \qquad \forall\, \hat{y}, y \in \{0,1\} \]
(equal positive and negative predictive values across the sensitive attribute)
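To make these conditions concrete, here is a minimal Python sketch (not part of the assignment) that empirically estimates the quantities appearing in DP, EO, and PP from arrays of labels, sensitive attributes, and predictions. The function name group_rates and the synthetic data are illustrative choices, not anything specified above.

import numpy as np

def group_rates(y, a, y_hat, group):
    """Empirical DP/EO/PP quantities for one sensitive group.

    y, a, y_hat : 1-D arrays of labels, sensitive attributes, predictions (all in {0, 1})
    group       : which value of the sensitive attribute to condition on
    """
    mask = (a == group)
    y, y_hat = y[mask], y_hat[mask]
    return {
        # DP: positive prediction rate P_a[Yhat = 1]
        "pos_rate": np.mean(y_hat == 1),
        # EO: true-positive and false-positive rates P_a[Yhat = 1 | Y = y]
        "tpr": np.mean(y_hat[y == 1] == 1),
        "fpr": np.mean(y_hat[y == 0] == 1),
        # PP: positive and negative predictive values P_a[Y = 1 | Yhat = 1], P_a[Y = 0 | Yhat = 0]
        "ppv": np.mean(y[y_hat == 1] == 1),
        "npv": np.mean(y[y_hat == 0] == 0),
    }

# Example usage with synthetic data: compare the two groups' statistics.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
y = rng.integers(0, 2, size=1000)
y_hat = rng.integers(0, 2, size=1000)
print(group_rates(y, a, y_hat, 0))
print(group_rates(y, a, y_hat, 1))

Each fairness condition above corresponds to the respective entries of the two returned dictionaries being (approximately) equal.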
Unfortunately, achieving all three fairness conditions simultaneously is not possible. An impossibility theorem for group-based fairness is stated as follows.
• If $A$ is dependent on $Y$, then Demographic Parity and Predictive Parity cannot hold at the same time.
• If $A$ is dependent on $Y$ and $\hat{Y}$ is dependent on $Y$, then Demographic Parity and Equalized Odds cannot hold at the same time.
• If $A$ is dependent on $Y$, then Equalized Odds and Predictive Parity cannot hold at the same time.
These three results collectively show that it is impossible to simultaneously satisfy the fairness definitions except in some trivial cases.
(i) State a scenario where all three fairness definitions are satisfied simultaneously.
(ii) Prove the first statement.
(iii) Prove the second statement.
(iv) Prove the third statement.
Hint: First observe that
\[ \mathbb{P}_0[Y = y \mid \hat{Y} = \hat{y}] = \mathbb{P}_1[Y = y \mid \hat{Y} = \hat{y}] \qquad \forall\, \hat{y}, y \in \{0,1\} \]
is equivalent to
\[ \mathbb{P}_0[Y = 1 \mid \hat{Y} = \hat{y}] = \mathbb{P}_1[Y = 1 \mid \hat{Y} = \hat{y}] \qquad \forall\, \hat{y} \in \{0,1\}. \]
A necessary condition for PP is the equality of positive predictive values (PPV):
\[ \mathbb{P}_0[Y = 1 \mid \hat{Y} = 1] = \mathbb{P}_1[Y = 1 \mid \hat{Y} = 1]. \]
To prove the third statement, it is enough to prove a stronger statement: if $A$ is dependent on $Y$, then Equalized Odds and equality of Positive Predictive Value cannot hold at the same time.
Next, try to express the relationship between $\mathrm{FPR}_a := \mathbb{P}_a[\hat{Y} = 1 \mid Y = 0]$ and $\mathrm{FNR}_a := \mathbb{P}_a[\hat{Y} = 0 \mid Y = 1]$ in terms of $p_a := \mathbb{P}[Y = 1 \mid A = a]$ and $\mathrm{PPV}_a := \mathbb{P}_a[Y = 1 \mid \hat{Y} = 1]$, for all $a \in \{0,1\}$, and finish the proof.
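For reference, one standard way to write this relationship (it is not stated in the assignment, but follows from Bayes' rule under the notation above) is
\[ \mathrm{PPV}_a \;=\; \frac{p_a\,(1-\mathrm{FNR}_a)}{p_a\,(1-\mathrm{FNR}_a) + (1-p_a)\,\mathrm{FPR}_a},
\qquad\text{equivalently}\qquad
\mathrm{FPR}_a \;=\; \frac{p_a}{1-p_a}\cdot\frac{1-\mathrm{PPV}_a}{\mathrm{PPV}_a}\,\bigl(1-\mathrm{FNR}_a\bigr). \]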
2 Combining multiple classifiers
The concept of “wisdom-of-the-crowd” posits that the collective knowledge of a group, as expressed through its aggregated actions or opinions, is superior to the decision of any one individual in the group. Here we will study a version of the “wisdom-of-the-crowd” for binary classifiers: how can one combine prediction outputs from multiple, possibly low-quality, binary classifiers to achieve an aggregate high-quality final output? Consider the following iterative procedure for combining classifier results.
Input:
- $S$ – a set of training samples: $S = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, where each $y_i \in \{-1, +1\}$
- $T$ – number of iterations (also, the number of classifiers to combine)
- $\mathcal{F}$ – a set of (possibly low-quality) classifiers; each $f \in \mathcal{F}$ is of the form $f : \mathcal{X} \to \{-1, +1\}$
Output:
- $F$ – a set of selected classifiers $\{f_1, \ldots, f_T\}$, where each $f_t \in \mathcal{F}$
- $A$ – a set of combination weights $\{\alpha_1, \ldots, \alpha_T\}$
Iterative Combination Procedure:
- Initialize distribution weights $D_1(i) = \frac{1}{m}$ [for $i = 1, \ldots, m$]
- for $t = 1, \ldots, T$ do
  - // $\epsilon_j$ is the weighted error of the $j$-th classifier w.r.t. $D_t$
  - Define $\epsilon_j := \sum_{i=1}^{m} D_t(i) \cdot \mathbb{1}[y_i \neq f_j(x_i)]$ [for each $f_j \in \mathcal{F}$]
  - // select the classifier with the smallest (weighted) error
  - $f_t = \arg\min_{f_j \in \mathcal{F}} \epsilon_j$
  - $\epsilon_t = \min_{f_j \in \mathcal{F}} \epsilon_j$
  - // recompute weights w.r.t. the performance of $f_t$
  - Compute classifier weight $\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
  - Compute distribution weight $D_{t+1}(i) = D_t(i) \exp(-\alpha_t y_i f_t(x_i))$
  - Normalize distribution weights $D_{t+1}(i) = \frac{D_{t+1}(i)}{\sum_i D_{t+1}(i)}$
- endfor
- return weights $\alpha_t$ and classifiers $f_t$ for $t = 1, \ldots, T$.
Final Combined Prediction:
- For any test input $x$, define the aggregation function as $g(x) := \sum_t \alpha_t f_t(x)$, and return the prediction as $\mathrm{sign}(g(x))$.
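The following is a minimal Python sketch of the procedure above, assuming the classifier pool is given as a list of callables that map a feature matrix to predictions in $\{-1,+1\}$. The function name combine_classifiers and the use of NumPy arrays are choices made here, not part of the assignment.

import numpy as np

def combine_classifiers(X, y, pool, T):
    """Iterative combination procedure over a fixed pool of classifiers.

    X    : (m, d) array of training inputs
    y    : (m,) array of labels in {-1, +1}
    pool : list of callables f(X) -> (m,) array of predictions in {-1, +1}
    T    : number of iterations / classifiers to combine
    """
    m = len(y)
    D = np.full(m, 1.0 / m)          # D_1(i) = 1/m
    preds = [f(X) for f in pool]     # precompute each pool classifier's predictions
    selected, alphas = [], []

    for t in range(T):
        # weighted error of each classifier in the pool w.r.t. D_t
        errors = np.array([np.sum(D * (p != y)) for p in preds])
        best = int(np.argmin(errors))
        eps_t = errors[best]
        # classifier weight alpha_t = (1/2) ln((1 - eps_t) / eps_t);
        # assumes 0 < eps_t < 1/2, as in the problem statement
        alpha_t = 0.5 * np.log((1.0 - eps_t) / eps_t)
        # reweight the distribution and normalize
        D = D * np.exp(-alpha_t * y * preds[best])
        D = D / np.sum(D)
        selected.append(pool[best])
        alphas.append(alpha_t)

    def g(X_test):
        # aggregate prediction: sign of the weighted vote sum_t alpha_t f_t(x)
        votes = sum(a * f(X_test) for a, f in zip(alphas, selected))
        return np.sign(votes)

    return selected, alphas, g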
We'll prove the following statement: if for each iteration $t$ there is some $\gamma_t > 0$ such that $\epsilon_t = \frac{1}{2} - \gamma_t$ (that is, assuming that at each iteration the error of the classifier $f_t$ is just $\gamma_t$ better than random guessing), then the error of the aggregate classifier satisfies
\[ \mathrm{err}(g) := \frac{1}{m} \sum_i \mathbb{1}[y_i \neq \mathrm{sign}(g(x_i))] \;\leq\; \exp\Bigl(-2 \sum_{t=1}^{T} \gamma_t^2\Bigr). \]
That is, the error of the aggregate classifier $g$ decreases exponentially fast with the number of combinations $T$!
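As a quick sanity check of the rate (the numbers here are illustrative, not from the assignment): if every round has the same edge $\gamma_t = 0.1$, the bound becomes
\[ \mathrm{err}(g) \;\leq\; \exp(-2 \cdot T \cdot 0.1^2) = \exp(-0.02\,T), \]
so roughly $T = 230$ rounds already force the bound below $0.01$, since $\exp(-4.6) \approx 0.01$.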
(i) Let $Z_t := \sum_i D_{t+1}(i)$ (i.e., $Z_t$ denotes the normalization constant for the weighted distribution $D_{t+1}$). Show that
\[ D_{T+1}(i) = \frac{1}{m} \cdot \frac{1}{\prod_t Z_t} \exp(-y_i\, g(x_i)). \]
(ii) Show that the error of the aggregate classifier $g$ is upper bounded by the product of the $Z_t$: $\mathrm{err}(g) \leq \prod_t Z_t$.
(hint: use the fact that the 0-1 loss is upper bounded by the exponential loss)
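For concreteness, the inequality the hint refers to (standard, and not restated in the assignment) is: for any label $y \in \{-1,+1\}$ and any real-valued score $g(x)$,
\[ \mathbb{1}[y \neq \mathrm{sign}(g(x))] \;\leq\; \exp(-y\, g(x)), \]
since $y \neq \mathrm{sign}(g(x))$ implies $y\,g(x) \leq 0$ and hence $\exp(-y\,g(x)) \geq 1$.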
(iii) Show that $Z_t = 2\sqrt{\epsilon_t(1 - \epsilon_t)}$.
(hint: noting $Z_t = \sum_i D_t(i)\exp(-\alpha_t y_i f_t(x_i))$, ...