INFR11205 INTRODUCTORY APPLIED MACHINE LEARNING
INFR11205 INTRODUCTORY APPLIED MACHINE LEARNING
(PG-2) AND INFD11005 INTRODUCTORY APPLIED MACHINE
LEARNING (DISTANCE LEARNING)
Tuesday 4th May 2021
13:00 to 15:00
INSTRUCTIONS TO CANDIDATES
1. Note that ALL QUESTIONS ARE COMPULSORY.
2. DIFFERENT QUESTIONS MAY HAVE DIFFERENT NUMBERS
OF TOTAL MARKS. Take note of this in allocating time to questions.
3. This is an OPEN BOOK examination.
MSc Courses
Convener: A.Pieris
External Examiners: W.Knottenbelt, M.Dunlop, E.Vasilaki.
THIS EXAMINATION WILL BE MARKED ANONYMOUSLY
1. Consider the data sets, A and B, of two dimensional features shown below, where
each sample point is represented as a small symbol ’×’ or ’◦’.
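[Figure: two scatter plots of x2 (vertical axis) against x1 (horizontal axis).
Left panel: Data set A, with x1 from −5 to 15 and x2 from −5 to 10.
Right panel: Data set B, with x1 from −10 to 10 and x2 from −6 to 6.]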
(a) For data set A, make a quick copy of the axes and rough distributions of
the data, and sketch the principal components that you would obtain with
principal component analysis (PCA). Label the first principal component as
a and the second as b. Briefly justify the principal components you drew. [2 marks]
(b) In data set B, there are two classes: samples of Class 1 are shown with
a symbol ’◦’ and those of Class 2 with a symbol ’×’. Explain what
pre-processing you would apply to the data set so that a logistic regression
model classifies the samples into the two classes reasonably well. [2 marks]
2. Consider a square region on an X1X2-plane, whose bottom-left corner is (0, −1)
and whose top-right corner is (7, 6). We have the following training set of six samples:
x1 = (1, 3)^T; x2 = (2, 4)^T; x3 = (3, 3)^T; x4 = (2, 1)^T; x5 = (4, 2)^T; x6 = (6, 2)^T,
where T denotes vector transpose. We assume that x1, x2, and x3 belong to Class
1 (denoted as C1) and x4, x5, and x6 to C2.
Note that:
• If you plot/sketch graphs, a separate graph should be used for each question
part, where the horizontal axis should correspond to X1 and the vertical
axis to X2, and not vice versa.
• Unless stated, explicit calculation is not required - you should instead answer
questions using visual inspection.
• You should present not only the final result, but also the process by which
you obtained it.
Answer the following questions.
(a) Make a plot of the data and sketch decision boundaries and decision regions
when you employ k-nearest neighbour classification with k = 1 and the
Euclidean distance measure. [2 marks]
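
A hand sketch for part (a) can be cross-checked by evaluating the 1-nearest-neighbour rule at a few query points. The following is a minimal sketch in plain Python, not a model answer; the names `train` and `predict_1nn` and the two query points are illustrative assumptions that do not appear in the question.

```python
from math import dist

# Training set from Question 2 as (point, class label) pairs.
train = [
    ((1, 3), "C1"), ((2, 4), "C1"), ((3, 3), "C1"),
    ((2, 1), "C2"), ((4, 2), "C2"), ((6, 2), "C2"),
]

def predict_1nn(query):
    """Return the label of the training point closest to `query` (Euclidean)."""
    _, label = min(train, key=lambda item: dist(item[0], query))
    return label

# Hypothetical query points inside the square region.
for query in [(1, 5), (6, 0)]:
    print(query, predict_1nn(query))
```

Evaluating `predict_1nn` on a fine grid over the square region would trace out the decision regions, whose boundaries lie on perpendicular bisectors between neighbouring points of different classes.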
(b) Make a new plot of the data and sketch decision boundaries that would be
produced by a linear Support Vector Machine (SVM) without slack
variables. You should also highlight possible support vectors, show the margin,
and explain your answer. [2 marks]
(c) We now consider fitting a Gaussian distribution to the data of Class 2
using maximum likelihood parameter estimation while also making the naive
Bayes assumption.
i. Write out an expression for p(x|C2), identifying the values of the parameters
of the distribution. [3 marks]
ii. Sketch the distribution, i.e., p(x|C2), that you obtained in part (i)
above. On the same graph, sketch the distribution you would have if
you did not use the naive Bayes assumption. [2 marks]
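
The parameter values asked for in part (c) can be checked numerically. The sketch below, in plain Python, computes maximum likelihood estimates for the Class 2 samples; the variable names (`c2`, `mean`, `var`, `cov12`, `cov_naive`, `cov_full`) are assumptions made for illustration. It treats the naive Bayes assumption as forcing a diagonal covariance matrix, which is the usual reading for a Gaussian model.

```python
# Class 2 samples from Question 2.
c2 = [(2, 1), (4, 2), (6, 2)]
n = len(c2)

# Maximum likelihood mean of each dimension.
mean = [sum(p[d] for p in c2) / n for d in range(2)]

# Maximum likelihood estimates divide by n (not n - 1).
var = [sum((p[d] - mean[d]) ** 2 for p in c2) / n for d in range(2)]
cov12 = sum((p[0] - mean[0]) * (p[1] - mean[1]) for p in c2) / n

# Naive Bayes keeps only the diagonal; the full model also keeps cov12.
cov_naive = [[var[0], 0.0], [0.0, var[1]]]
cov_full = [[var[0], cov12], [cov12, var[1]]]

print("mean:", mean)
print("naive Bayes covariance:", cov_naive)
print("full covariance:", cov_full)
```

A diagonal covariance gives axis-aligned elliptical contours, while a non-zero off-diagonal term tilts the ellipse, which is the contrast part (ii) asks you to sketch.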
(d) We now have an additional sample x7 = (1, 5)^T in C1 added to the training
set. For each condition shown below, discuss what changes you would expect
in decision boundaries compared with the ones you obtain for the original
training set of six samples.
i. Linear SVM with slack variables, whose regularisation term is given as
C (∑_{i=1}^{n} ξ_i), where C = 1000. [2 marks]
ii. Logistic regression classifier. [2 marks]
3. A course lecturer, who teaches a whole-year course, CS101, at a university in
the UK, needs to find a way to determine a pass/fail grade for each student in the
class without having marks for the final examination, which was cancelled due to
a worldwide pandemic. Apart from the cancelled final examination,
the course has five assignments, for which marking has already been done. As
a first attempt, the lecturer employed a logistic regression model that takes an
input vector x = (x1, x2, . . . , x5), where xi, 1 ≤ i ≤ 5, represents the mark for
the i-th assignment. Using the course data of around 500 students from the last two
years, the lecturer trained the model and obtained a classification accuracy (ACC)
of 0.95.
(a) Write out the form of a logistic regression model for this task, clarifying what
variables you use, and explain how you decide pass/fail with the model given
x. [2 marks]
(b) Within the last two years' data, the lecturer wanted to avoid cases in which
students who passed the course were predicted as ’fail’ by the classifier.
Explain how this could be achieved without retraining the model and discuss
its possible side effects. [3 marks]
(c) After some parameter tuning, the lecturer confirmed that the classifier predicted
pass/fail grades satisfactorily. Discuss possible problems with using the
classifier to predict pass/fail grades for this year. [3 marks]
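
The model form in part (a) and the idea of a decision threshold in part (b) can be illustrated with a short sketch in plain Python. Everything numeric here is hypothetical: the bias, the weights, the student's marks, and the alternative threshold of 0.2 are made-up values chosen only to show the mechanism, and the function names `p_pass` and `predict` are assumptions.

```python
from math import exp

# Hypothetical parameters: a bias and one weight per assignment mark.
# These values are made up for illustration; a real model would learn them.
bias = -6.0
weights = [0.02, 0.02, 0.02, 0.03, 0.03]

def p_pass(marks):
    """Logistic regression: P(pass | x) = sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, marks))
    return 1.0 / (1.0 + exp(-z))

def predict(marks, threshold=0.5):
    """Predict 'pass' when the estimated probability reaches the threshold."""
    return "pass" if p_pass(marks) >= threshold else "fail"

borderline = [45, 50, 40, 48, 52]  # hypothetical student
print(round(p_pass(borderline), 3))
print(predict(borderline))         # default threshold of 0.5
print(predict(borderline, 0.2))    # lower threshold, same trained model
```

Lowering the threshold changes only the decision rule, not the learned parameters, so fewer passing students are labelled 'fail' at the cost of more failing students being labelled 'pass'.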
4. Consider the following dataset, consisting of fifteen instances A...O, each
represented with two numeric attributes (X1, X2), shown on the left, and plotted on
the right:
A:(1,4) B:(2,4) C:(3,4)
D:(1,6) E:(2,6) F:(3,6)
G:(1,7) H:(2,7) I:(3,7)
J:(1,8) K:(2,8) L:(3,8)
M:(1,9) N:(2,9) O:(3,9)
(a) Explain how the K-means clustering method works. Run the K-means
algorithm to completion on the dataset above. Assume K=2 and use the
Euclidean distance metric. Set the initial means to be µ1 = (4, 4) and
µ2 = (4, 9). Show your work. Report the means and clusters at each
iteration, including the final positions of µ1 and µ2, and list the instances in each
cluster when the algorithm terminates. Note: this question can be answered
without computing exact distance values. [3 marks]
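
The two alternating steps of K-means (assign each instance to its nearest mean, then move each mean to the centroid of its cluster) can be sketched as follows. This is a minimal plain-Python sketch for checking a hand calculation, not a reference implementation; the function name `kmeans`, the `history` list, and the choice to leave the mean of an empty cluster unchanged are assumptions.

```python
from math import dist

points = {
    "A": (1, 4), "B": (2, 4), "C": (3, 4),
    "D": (1, 6), "E": (2, 6), "F": (3, 6),
    "G": (1, 7), "H": (2, 7), "I": (3, 7),
    "J": (1, 8), "K": (2, 8), "L": (3, 8),
    "M": (1, 9), "N": (2, 9), "O": (3, 9),
}

def kmeans(points, means):
    """Alternate assignment and update steps until assignments stop changing."""
    assignment = None
    history = []
    while True:
        # Assignment step: each instance joins its nearest mean.
        new_assignment = {
            name: min(range(len(means)), key=lambda k: dist(p, means[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            return means, assignment, history
        assignment = new_assignment
        # Update step: each mean moves to the centroid of its cluster.
        new_means = []
        for k, old in enumerate(means):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:
                centroid = tuple(sum(v) / len(members) for v in zip(*members))
                new_means.append(centroid)
            else:
                new_means.append(old)  # assumption: keep an empty cluster's mean
        means = new_means
        history.append((list(means), dict(assignment)))

means, assignment, history = kmeans(points, [(4, 4), (4, 9)])
for k, m in enumerate(means):
    print(m, sorted(n for n, c in assignment.items() if c == k))
```

The `history` list records the means and assignments after each update, which corresponds to the per-iteration report the question asks for.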
(b) Run the bottom-up agglomerative clustering algorithm to completion on the
above dataset. Use the Euclidean distance metric and the complete link
merging rule. Resolve ties by considering instances/clusters in alphabetical
order, where clusters are labelled by the alphabetically-first instance in the
cluster. For example, a cluster consisting of A,K would be labelled A; and
if clustering candidate Z,X has the same value as candidate Z,Y, then Z,X
will be chosen. Show your workings. Note: this question can be answered
without computing exact distance values. [3 marks]
(c) Explain what a dendrogram is, and draw one for your clustering. What are
the clusters if we threshold at a distance of 2.01? Note: this question can
be answered without computing exact distance values. [3 marks]
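
The complete link procedure in part (b) and the threshold cut in part (c) can be sketched in plain Python as below. Library routines generally do not implement this question's alphabetical tie-breaking, so the sketch spells it out. It rests on one interpretation of the rule: candidate pairs are ordered by the label of the first cluster and then by the label of the second. The function names `complete_link`, `agglomerate`, and `cut` are assumptions.

```python
from itertools import combinations
from math import dist

points = {
    "A": (1, 4), "B": (2, 4), "C": (3, 4),
    "D": (1, 6), "E": (2, 6), "F": (3, 6),
    "G": (1, 7), "H": (2, 7), "I": (3, 7),
    "J": (1, 8), "K": (2, 8), "L": (3, 8),
    "M": (1, 9), "N": (2, 9), "O": (3, 9),
}

def complete_link(c1, c2):
    """Complete link: distance between the two farthest members."""
    return max(dist(points[a], points[b]) for a in c1 for b in c2)

def agglomerate():
    """Return the merge history as (distance, merged_cluster) tuples."""
    clusters = [[name] for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Clusters stay sorted by label (their alphabetically-first member),
        # so combinations() yields candidates in alphabetical order and
        # min() returns the first of any tied candidates.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: round(complete_link(*pair), 9))
        clusters.remove(c1)
        clusters.remove(c2)
        merged = sorted(c1 + c2)
        clusters.append(merged)
        clusters.sort(key=lambda c: c[0])
        merges.append((complete_link(c1, c2), merged))
    return merges

def cut(merges, threshold):
    """Clusters obtained by applying only merges at or below the threshold."""
    clusters = [[name] for name in sorted(points)]
    for distance, merged in merges:
        if distance <= threshold:
            clusters = [c for c in clusters if not set(c) <= set(merged)]
            clusters.append(merged)
    return sorted(clusters)

merges = agglomerate()
for distance, merged in merges:
    print(round(distance, 3), "".join(merged))
print(cut(merges, 2.01))
```

The printed merge distances are the heights at which branches join in the dendrogram, and `cut` corresponds to drawing a horizontal line across it at the chosen threshold.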
5. A psychologist friend has a new client and would like you to predict
whether the person is likely to regard their treatment as helpful. The psychologist
has recently done a small survey of whether recent clients found their treatment
helpful, which they think could help with prediction.
Each respondent provides a vector with entries 1 or 0 corresponding to whether
they answer ‘yes’ to a question or ‘no’, respectively. The question vector has
attributes
x = (rich, married, healthy)
In addition, each respondent gives a value y = 1 if they found treatment helpful,
and y = −1 if they did not.
Thus, a response (1, 1, 0) would indicate that the respondent was ‘rich’, ‘married’,
‘unhealthy’.
The following responses were obtained from people who indicated they found
treatment helpful:
(1, 1, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0).
For the ‘not helpful’ respondents, the data is
(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 1).
(a) Your friend has been taking a statistics course, and suggests that you might
try a Linear Regression model as that is good for prediction. Would you
agree? Explain. They also suggest a Decision Tree. Name any other methods
that you have studied in IAML that would be suitable to try. [2 marks]
(b) You decide to start by using a Decision Tree to make the prediction. Show
how you would use the ID3 algorithm on this data to construct, by hand, the
corresponding full Decision Tree. Use Information Gain to select attributes
to split on. Show your workings. [4 marks]
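
The Information Gain calculation that ID3 uses to choose a split can be checked with the plain-Python sketch below. It computes only the first (root) split for the survey data; the full tree would repeat the same calculation on each resulting subset. The names `entropy`, `information_gain`, `gains`, and `root` are assumptions made for illustration.

```python
from math import log2

attributes = ["rich", "married", "healthy"]

# (x, y) pairs from the survey: y = 1 helpful, y = -1 not helpful.
data = [
    ((1, 1, 0), 1), ((0, 0, 1), 1), ((1, 1, 1), 1), ((1, 1, 0), 1),
    ((0, 0, 0), -1), ((1, 0, 0), -1), ((0, 1, 0), -1), ((0, 1, 1), -1),
]

def entropy(rows):
    """Entropy in bits of the class labels in `rows`."""
    total = len(rows)
    counts = [sum(1 for _, y in rows if y == label) for label in (1, -1)]
    return -sum(c / total * log2(c / total) for c in counts if c)

def information_gain(rows, index):
    """Entropy reduction from splitting `rows` on attribute `index`."""
    remainder = 0.0
    for value in (0, 1):
        subset = [(x, y) for x, y in rows if x[index] == value]
        if subset:
            remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

gains = {name: information_gain(data, i) for i, name in enumerate(attributes)}
root = max(gains, key=gains.get)
for name, gain in gains.items():
    print(name, round(gain, 4))
print("first split:", root)
```

Calling `information_gain` on the subset of rows that reach a given branch, restricted to the attributes not yet used on that path, gives the next split in the same way.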