EECE5644 Assignment
Please submit your solutions on the assignments page in Canvas in the form of a single PDF file that includes all math, numerical, and visual results. Also, to verify the existence of your own computer implementation, include a link to your online code repository or include the code as an appendix / attachment in a ZIP file along with the PDF. The code is not graded, but it helps verify that your results are feasible as claimed. Only results and discussion presented in the PDF will be graded, so do not link to an external location where further results may be presented.
This is a graded assignment, and the entirety of your submission must contain only your own work. You may benefit from publicly available literature, including software (but not from classmates), as long as these sources are properly acknowledged in your submission. All discussions and materials shared during office periods are also acceptable resources, and these tend to be very useful, so participate in office periods or watch their recordings. Cite your sources as appropriate. Verbal discussion with classmates is acceptable, but no written material may be exchanged.
By submitting a PDF file in response to this take-home assignment, you are declaring that the contents of your submission and the associated code are your own work, except as noted in your citations to resources and as otherwise allowed by the description above.
Question 1 (40%)
The probability density function (pdf) for a 2-dimensional real-valued random vector $X$ is as follows: $p(\mathbf{x}) = P(L=0)\,p(\mathbf{x}|L=0) + P(L=1)\,p(\mathbf{x}|L=1)$. Here $L$ is the true class label that indicates which class-label-conditioned pdf generates the data.
The class priors are $P(L=0) = 0.65$ and $P(L=1) = 0.35$. The class-conditional pdfs are $p(\mathbf{x}|L=0) = w_1 g(\mathbf{x}|m_{01},C_{01}) + w_2 g(\mathbf{x}|m_{02},C_{02})$ and $p(\mathbf{x}|L=1) = g(\mathbf{x}|m_1,C_1)$, where $g(\mathbf{x}|m,C)$ is a multivariate Gaussian probability density function with mean vector $m$ and covariance matrix $C$. The parameters of the class-conditional Gaussian pdfs are $w_1 = w_2 = 1/2$ and
$$m_{01} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \quad C_{01} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \quad m_{02} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \quad C_{02} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \quad m_{1} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad C_{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
For the numerical results requested below, generate the following independent datasets, each consisting of iid samples from the specified data distribution, and in each dataset include the true class label for each sample (a minimal sampling sketch follows the list):
• $D^{20}_{\text{train}}$ consists of 20 samples and their labels, for training;
• $D^{200}_{\text{train}}$ consists of 200 samples and their labels, for training;
• $D^{2000}_{\text{train}}$ consists of 2000 samples and their labels, for training;
• $D^{10\text{K}}_{\text{validate}}$ consists of 10000 samples and their labels, for validation.
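The assignment does not prescribe a programming language; as one illustration, a minimal NumPy sketch of the sampling procedure could look as follows (the seed, variable names, and generate_dataset helper are arbitrary choices, not part of the assignment):

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

# Parameters of the true data distribution, as specified above.
m01, C01 = np.array([3.0, 0.0]), np.array([[2.0, 0.0], [0.0, 1.0]])
m02, C02 = np.array([0.0, 3.0]), np.array([[1.0, 0.0], [0.0, 2.0]])
m1, C1 = np.array([2.0, 2.0]), np.array([[1.0, 0.0], [0.0, 1.0]])

def generate_dataset(n):
    # Draw n iid (sample, label) pairs; P(L=1) = 0.35.
    labels = (rng.random(n) < 0.35).astype(int)
    x = np.zeros((n, 2))
    for i, ell in enumerate(labels):
        if ell == 1:
            x[i] = rng.multivariate_normal(m1, C1)
        elif rng.random() < 0.5:  # class 0 is an equal-weight two-component mixture
            x[i] = rng.multivariate_normal(m01, C01)
        else:
            x[i] = rng.multivariate_normal(m02, C02)
    return x, labels

datasets = {name: generate_dataset(n)
            for name, n in [("D20train", 20), ("D200train", 200),
                            ("D2000train", 2000), ("D10Kvalidate", 10000)]}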
Part 1: (10%) Determine the theoretically optimal classifier that achieves minimum probability of error using knowledge of the true pdf. Specify the classifier mathematically and implement it; then apply it to all samples in $D^{10\text{K}}_{\text{validate}}$. From the decision results and true labels for this validation set, estimate and plot the ROC curve of this min-P(error) classifier, and on the ROC curve indicate, with a special marker, the location of the min-P(error) classifier. Also report an estimate of the minimum P(error) achievable, based on counts of decision-truth label pairs on $D^{10\text{K}}_{\text{validate}}$. Optional: As a supplementary visualization, generate a plot of the decision boundary of this classification rule overlaid on the validation dataset. This establishes an aspirational performance level on this data for the following approximations.
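Since the true pdf is known, the min-P(error) classifier is a likelihood-ratio test, and the empirical ROC can be traced by sweeping the decision threshold over the log-likelihood-ratio scores. A sketch of this estimation, reusing the parameters and datasets from the previous sketch (the threshold grid size and variable names are arbitrary):

import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_ratio(x):
    # log p(x|L=1) - log p(x|L=0) under the true pdf defined above.
    p0 = (0.5 * multivariate_normal.pdf(x, m01, C01)
          + 0.5 * multivariate_normal.pdf(x, m02, C02))
    p1 = multivariate_normal.pdf(x, m1, C1)
    return np.log(p1) - np.log(p0)

x_val, labels_val = datasets["D10Kvalidate"]  # from the earlier sketch
scores = log_likelihood_ratio(x_val)

# Sweep thresholds across the observed score range to trace the empirical ROC.
taus = np.quantile(scores, np.linspace(0.0, 1.0, 201))
tpr = [(scores[labels_val == 1] > t).mean() for t in taus]
fpr = [(scores[labels_val == 0] > t).mean() for t in taus]

# Min-P(error) operating point: threshold the ratio at log(P(L=0)/P(L=1)).
tau_star = np.log(0.65 / 0.35)
decisions = (scores > tau_star).astype(int)
p_error = (decisions != labels_val).mean()  # estimate of min-P(error)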
Part 2: (30%) (a) Using the maximum-likelihood parameter estimation technique, train three separate logistic-linear-function-based approximations of the class-label posterior functions given a sample. For each approximation, use one of the three training datasets $D^{20}_{\text{train}}$, $D^{200}_{\text{train}}$, $D^{2000}_{\text{train}}$. When optimizing the parameters, specify the optimization problem as minimization of the negative log-likelihood of the training dataset, and use your favorite numerical optimization approach, such as gradient descent or Matlab's fminsearch. Determine how to use these class-label-posterior approximations to classify a sample in order to approximate the minimum-P(error) classification rule; apply these three approximations of the class-label posterior function to samples in $D^{10\text{K}}_{\text{validate}}$, and estimate the probability of error that these three classification rules will attain (using counts of decisions on the validation set). Optional: As a supplementary visualization, generate plots of the decision boundaries of these trained classifiers superimposed on their respective training datasets and the validation dataset. (b) Repeat the process described in Part (2a) using a logistic-quadratic-function-based approximation of the class-label posterior functions given a sample. How does the performance of the classifiers trained in this part compare to each other, considering differences in the number of training samples and function form? How do they compare to the theoretically optimal classifier from Part 1? Briefly discuss results and insights.
Note 1: With $\mathbf{x}$ representing the input sample vector and $\mathbf{w}$ denoting the model parameter vector, logistic-linear-function refers to $h(\mathbf{x},\mathbf{w}) = 1/(1 + e^{-\mathbf{w}^T z(\mathbf{x})})$, where $z(\mathbf{x}) = [1, \mathbf{x}^T]^T$; and logistic-quadratic-function refers to $h(\mathbf{x},\mathbf{w}) = 1/(1 + e^{-\mathbf{w}^T z(\mathbf{x})})$, where $z(\mathbf{x}) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]^T$.
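To make Note 1 concrete, here is one possible sketch in Python, where scipy.optimize.minimize with the Nelder-Mead method plays the role of Matlab's fminsearch; the feature maps implement $z(\mathbf{x})$ from Note 1, the helper names and optimizer choice are assumptions, and the datasets dictionary and validation arrays come from the earlier sketches:

import numpy as np
from scipy.optimize import minimize

def z_linear(x):
    # z(x) = [1, x^T]^T from Note 1.
    return np.column_stack([np.ones(len(x)), x])

def z_quadratic(x):
    # z(x) = [1, x1, x2, x1^2, x1*x2, x2^2]^T from Note 1.
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1**2, x1 * x2, x2**2])

def train_logistic(x, labels, feature_map):
    # Minimize the negative log-likelihood of the training set over w.
    Z = feature_map(x)
    def nll(w):
        s = Z @ w
        # Per-sample NLL is log(1 + e^s) - label*s; logaddexp keeps it stable.
        return np.sum(np.logaddexp(0.0, s) - labels * s)
    return minimize(nll, np.zeros(Z.shape[1]), method="Nelder-Mead").x

# With h(x,w) approximating P(L=1|x), deciding L=1 when h > 1/2
# (equivalently w^T z(x) > 0) approximates the min-P(error) rule.
x_tr, l_tr = datasets["D200train"]
w = train_logistic(x_tr, l_tr, z_linear)
decisions = (z_linear(x_val) @ w > 0).astype(int)
p_error_hat = (decisions != labels_val).mean()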
Question 2 (40%)
Assume that a real-valued scalar $y$ and a two-dimensional real vector $\mathbf{x}$ are related to each other according to $y = c(\mathbf{x},\mathbf{w}) + v$, where $c(\cdot,\mathbf{w})$ is a cubic polynomial in $\mathbf{x}$ with coefficients $\mathbf{w}$, and $v$ is a Gaussian random scalar with zero mean and variance $\sigma^2$.
Given a dataset $D = \{(\mathbf{x}_1,y_1), \ldots, (\mathbf{x}_N,y_N)\}$ with $N$ samples of $(\mathbf{x},y)$ pairs, and assuming these samples are independent and identically distributed according to the model, derive two estimators for $\mathbf{w}$ using the maximum-likelihood (ML) and maximum-a-posteriori (MAP) parameter estimation approaches, as functions of these data samples. For the MAP estimator, assume that $\mathbf{w}$ has a zero-mean Gaussian prior with covariance matrix $\gamma I$.
Having derived the estimator expressions, implement them in code and apply them to the dataset generated by the attached Matlab script. Using the training dataset, obtain the ML estimator and the MAP estimator for a variety of $\gamma$ values ranging from $10^{-4}$ to $10^{4}$. Evaluate each trained model by calculating the average squared error between the $y$ values of the validation samples and the model estimates of these values using $c(\cdot,\mathbf{w}_{\text{trained}})$. How does your MAP-trained model perform on the validation set as $\gamma$ is varied? How is the MAP estimate related to the ML estimate? Describe your experiments, and visualize and quantify your analyses (e.g., average squared error on the validation dataset as a function of the hyperparameter $\gamma$) with data from these experiments.
Note: Point split will be 20% for ML and 20% for MAP estimator results.
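One standard derivation yields a least-squares form for ML on cubic polynomial features of $\mathbf{x}$, and a ridge-regularized variant with regularizer $\sigma^2/\gamma$ for MAP. A NumPy sketch under that assumption (the monomial ordering is an arbitrary choice, and X_train, y_train, X_val, y_val, sigma2 are placeholders for the values produced by the attached script):

import numpy as np

def cubic_features(X):
    # All monomials of (x1, x2) up to degree 3: 10 terms.
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([
        np.ones(len(X)),                        # degree 0
        x1, x2,                                 # degree 1
        x1**2, x1 * x2, x2**2,                  # degree 2
        x1**3, x1**2 * x2, x1 * x2**2, x2**3,   # degree 3
    ])

def w_ml(X, y):
    # ML estimate: ordinary least squares on the cubic features.
    Phi = cubic_features(X)
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

def w_map(X, y, gamma, sigma2):
    # MAP estimate with prior w ~ N(0, gamma*I): ridge with lambda = sigma2/gamma.
    Phi = cubic_features(X)
    A = Phi.T @ Phi + (sigma2 / gamma) * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

# Average squared validation error as a function of gamma; uncomment once
# X_train, y_train, X_val, y_val, sigma2 are loaded from the attached script.
# gammas = np.logspace(-4, 4, 50)
# errors = [np.mean((y_val - cubic_features(X_val)
#                    @ w_map(X_train, y_train, g, sigma2))**2) for g in gammas]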
Question 3 (20%)
Let $Z$ be drawn from a categorical distribution (taking discrete values) with $K$ possible outcomes/states and parameter vector $\Theta$, denoted $\text{Cat}(\Theta)$. Describe the value/state using a 1-of-$K$ scheme, $\mathbf{z} = [z_1, \ldots, z_K]^T$, where $z_k = 1$ if the variable is in state $k$ and $z_k = 0$ otherwise. Let the parameter vector for the pdf be $\Theta = [\theta_1, \ldots, \theta_K]^T$, where $P(z_k = 1) = \theta_k$ for $k \in \{1, \ldots, K\}$.
Given $D = \{\mathbf{z}_1, \ldots, \mathbf{z}_N\}$ with iid samples $\mathbf{z}_n \sim \text{Cat}(\Theta)$ for $n \in \{1, \ldots, N\}$:
• What is the ML estimator for $\Theta$?
• Assuming that the prior $p(\Theta)$ for the parameters is a Dirichlet distribution with hyperparameter $\alpha$, what is the MAP estimator for $\Theta$?
Hint: The Dirichlet distribution with parameter $\alpha$ is
$$p(\Theta|\alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \quad \text{where the normalization constant is} \quad B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}.$$
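As a starting point for both parts (a sketch of the setup, not the final answer): writing $N_k = \sum_{n=1}^{N} z_{nk}$ for the count of samples observed in state $k$, the iid log-likelihood is
$$\ln p(D|\Theta) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \ln \theta_k = \sum_{k=1}^{K} N_k \ln \theta_k,$$
to be maximized subject to the constraint $\sum_{k=1}^{K} \theta_k = 1$ (e.g., via a Lagrange multiplier). For the MAP estimator, the Dirichlet log-prior adds $\sum_{k=1}^{K} (\alpha_k - 1) \ln \theta_k$ to the same constrained objective.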