Naïve Bayes and Neural Networks
1 Written Exercises: Analyzing Naïve Bayes [5pts]
1.1 Bernoulli Naïve Bayes As A Linear Classifier
Consider a Naïve Bayes model for binary classification where the features $X_1, \ldots, X_d$ are also binary variables. As part of training this Naïve Bayes model, we would estimate the conditional distributions $P(X_i \mid y = c)$ for $c \in \{0, 1\}$ and $i = 1, \ldots, d$, as well as the prior distribution $P(y = c)$ for $c \in \{0, 1\}$. For notational simplicity, let's denote $P(X_i = 1 \mid y = c)$ as $\theta_{ic}$ and $P(X_i = 0 \mid y = c)$ as $1 - \theta_{ic}$. Likewise, let $\theta_1$ be $P(y = 1)$ and $\theta_0 = 1 - \theta_1$ be $P(y = 0)$.
If we write out the posterior of this classifier (leaving out the normalizing constant), we would have:
$$
P(y = 1 \mid x_1, \ldots, x_d) = \frac{P(y = 1)\,\prod_{i=1}^{d} P(x_i \mid y = 1)}{P(x_1, \ldots, x_d)} \;\propto\; \theta_1 \prod_{i=1}^{d} \theta_{i1}^{x_i}\,(1 - \theta_{i1})^{1 - x_i} \tag{1}
$$

$$
P(y = 0 \mid x_1, \ldots, x_d) = \frac{P(y = 0)\,\prod_{i=1}^{d} P(x_i \mid y = 0)}{P(x_1, \ldots, x_d)} \;\propto\; \theta_0 \prod_{i=1}^{d} \theta_{i0}^{x_i}\,(1 - \theta_{i0})^{1 - x_i} \tag{2}
$$
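To make Eqs. (1) and (2) concrete, here is a minimal sketch of how the unnormalized posteriors could be evaluated in NumPy. The parameter values and the names `theta_1`, `theta_c1`, `theta_c0`, and `unnormalized_posterior` are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# Toy Bernoulli Naive Bayes parameters (assumed values, for illustration only).
theta_1 = 0.6                         # prior P(y=1); P(y=0) = 1 - theta_1
theta_c1 = np.array([0.8, 0.3, 0.5])  # theta_{i1} = P(X_i = 1 | y = 1)
theta_c0 = np.array([0.2, 0.7, 0.5])  # theta_{i0} = P(X_i = 1 | y = 0)

def unnormalized_posterior(x, prior, theta):
    """Compute prior * prod_i theta_i^{x_i} * (1 - theta_i)^{1 - x_i}."""
    x = np.asarray(x)
    return prior * np.prod(theta**x * (1.0 - theta)**(1 - x))

x = np.array([1, 0, 1])
p1 = unnormalized_posterior(x, theta_1, theta_c1)      # proportional to P(y=1 | x)
p0 = unnormalized_posterior(x, 1 - theta_1, theta_c0)  # proportional to P(y=0 | x)
print("predict class 1" if p1 > p0 else "predict class 0")
```

The shared evidence term $P(x_1, \ldots, x_d)$ is never computed, since it cancels when the two unnormalized posteriors are compared.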
The classification rule in Naïve Bayes is to choose the class with the highest posterior probability, that is, to predict $y = \arg\max_c P(y = c \mid x_1, \ldots, x_d)$ for a new example $\mathbf{x} = [x_1, \ldots, x_d]$. In order for the prediction to be class 1, $P(y = 1 \mid x_1, \ldots, x_d)$ must be greater than $P(y = 0 \mid x_1, \ldots, x_d)$, or equivalently we could check if:

$$
\frac{P(y = 1 \mid x_1, \ldots, x_d)}{P(y = 0 \mid x_1, \ldots, x_d)} > 1 \tag{3}
$$
In this setting with binary inputs, we will show that this is a linear decision boundary. This is also true for continuous inputs if they are modelled with a member of the exponential family (including Gaussian); see the proof in §3.1.
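As an optional numerical sanity check (not a substitute for the proof requested in Q1 below), one can recover a candidate bias and weights from the log of the ratio in Eq. (3) at a few inputs and confirm they reproduce that log-ratio on every binary input. The toy parameters and the helper `log_posterior_ratio` below are assumptions used only for illustration.

```python
import itertools
import numpy as np

# Toy Bernoulli Naive Bayes parameters (assumed values, for illustration only).
theta_1 = 0.6                         # prior P(y=1)
theta_c1 = np.array([0.8, 0.3, 0.5])  # P(X_i = 1 | y = 1)
theta_c0 = np.array([0.2, 0.7, 0.5])  # P(X_i = 1 | y = 0)

def log_posterior_ratio(x):
    """log[ P(y=1 | x) / P(y=0 | x) ]; the evidence P(x_1, ..., x_d) cancels."""
    x = np.asarray(x)
    log_p1 = np.log(theta_1) + np.sum(x * np.log(theta_c1) + (1 - x) * np.log(1 - theta_c1))
    log_p0 = np.log(1 - theta_1) + np.sum(x * np.log(theta_c0) + (1 - x) * np.log(1 - theta_c0))
    return log_p1 - log_p0

d = 3
# Candidate bias from the all-zeros input; candidate weight i from turning
# feature i on by itself.
b = log_posterior_ratio(np.zeros(d, dtype=int))
w = np.array([log_posterior_ratio(np.eye(d, dtype=int)[i]) - b for i in range(d)])

# If the boundary is linear, b + w @ x reproduces the log-ratio for every binary input.
for x in itertools.product([0, 1], repeat=d):
    assert np.isclose(log_posterior_ratio(x), b + w @ np.array(x))
```

The check passing for one choice of toy parameters is only evidence, not a proof; Q1 asks for the closed-form expressions for $b$ and $w_i$.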
I Q1 Prove Bernoulli Naïve Bayes has a linear decision boundary [4pts]. Prove the Bernoulli
Naïve Bayes model described above has a linear decision boundary. Specifically, you'll need to show that
class 1 will be predicted only if $b + \mathbf{w}^T \mathbf{x} > 0$ for some parameters $b$ and $\mathbf{w}$. To do so, show that:
$$
\frac{P(y = 1 \mid x_1, \ldots, x_d)}{P(y = 0 \mid x_1, \ldots, x_d)} > 1 \;\implies\; b + \sum_{i=1}^{d} w_i x_i > 0 \tag{4}
$$
As part of your report, explicitly write expressions for the bias $b$ and each weight $w_i$.
Hints: Expand Eq. (3) by substituting in the posterior expressions from Eqs. (1) and (2). Take the
log of that expression and combine like terms.
1.2 Duplicated Features in Naïve Bayes
Naïve Bayes classifiers assume features are conditionally independent given the class label. When this assumption is
violated, Naïve Bayes classifiers may be over-confident or under-confident in the outputs. To examine this more closely,
we'll consider the situation where an input feature is duplicated. These duplicated features are maximally dependent,
making this a strong violation of conditional independence. Here, we'll examine a case where the confidence
increases when the duplicated features are included, despite no new information actually being added as input.
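Before attempting Q2, the following minimal sketch illustrates the effect numerically. The two likelihood values are assumed purely for illustration; with a uniform prior, the prior cancels in the posterior, and duplicating the feature multiplies in the same likelihood a second time.

```python
# Toy likelihoods for a single binary feature (assumed values, for illustration only).
p_x_given_y1 = 0.9  # P(X1 = x1 | y = 1)
p_x_given_y0 = 0.4  # P(X1 = x1 | y = 0)

# Uniform prior P(y=0) = P(y=1) = 0.5, so the prior cancels in the posterior.
posterior_single = p_x_given_y1 / (p_x_given_y1 + p_x_given_y0)

# Duplicating X1 as X2 multiplies in the same likelihood a second time.
posterior_duplicated = p_x_given_y1**2 / (p_x_given_y1**2 + p_x_given_y0**2)

print(posterior_single, posterior_duplicated)  # roughly 0.69 vs. 0.84
assert posterior_duplicated > posterior_single
```

Q2 asks you to prove this holds in general, not just for one choice of likelihood values.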
I Q2 Duplicate Features in Naïve Bayes [1pt]. Consider a Naïve Bayes model with a single feature
$X_1$ for a binary classification problem. Assume a uniform prior such that $P(y = 0) = P(y = 1)$. Suppose the
model predicts class 1 for an example $x$; then we know:

$$
P(y = 1 \mid X_1 = x_1) > P(y = 0 \mid X_1 = x_1) \tag{5}
$$
Now suppose we make a mistake and duplicate the $X_1$ feature in our data, so that we now have two identical
inputs $X_1$ and $X_2$. Show that the predicted probability for class 1 is higher than it was before; that is, prove:

$$
P(y = 1 \mid X_1 = x_1, X_2 = x_2) > P(y = 1 \mid X_1 = x_1) \tag{6}
$$

Hints: Use the assumption that $P(y = 0) = P(y = 1)$ to simplify the posterior expressions in Eqs. (5) and
(6). As $X_2$ is an exact copy of $X_1$, $P(X_2 = x_2 \mid y)$ is the same as $P(X_1 = x_1 \mid y)$ for any example.