COMP5318 Machine Learning and Data Mining
Information about the exam
• The exam will be online, via Canvas, un-proctored. It is set as an Assignment.
• The Canvas site for the exam is different from the Canvas site we use during the semester; it is called “Final Exam for COMP5318”. The Exams Office will give you access to your exam site.
• The duration of the exam is standard: 2 hours + 10 minutes reading time. In addition, there are 15 minutes for upload, hence the total duration of the exam is 145 minutes. The exam site will open at the scheduled time (as per your exam timetable) and close after 145 minutes.
• The exam is worth 100 marks (= 60% of your final mark). To pass the course you need at least 40% on the exam (i.e. 40 marks), regardless of your mark during the semester.
• All material is examinable, except the guest lecture in week 13. Python is not examinable.
• The next page is the actual cover page of the exam; please read it carefully.
• The exam is a restricted open-book exam.
  o You are allowed to use: 1) the teaching materials from this course (lecture slides and tutorial notes), 2) 1 page of your own notes (double-sided, A4 size, handwritten or typed), which must be uploaded to the Canvas COMP5318 website before the exam, and 3) a non-programmable calculator.
  o No other materials or devices are allowed. No internet browsing is allowed. You can’t consult other people during the exam.
  o All exam papers will go through TurnItIn for plagiarism checking.
• The exam paper is confidential. You must not circulate it in any way during or after the exam. You must not show it to other people, discuss it with other people, post it or distribute it in any way, during or after the exam.
• What to submit - instructions:
  o Type your answers in your text editor (Word, LaTeX, etc.), convert the file into a PDF file and submit it to Canvas. No other file format will be accepted.
  o Handwritten responses will not be accepted; you need to type your answers.
  o Start by typing your student ID number on the first page. Do not type your name, as the marking is anonymous.
  o Submit only your answers to the questions; do not copy the questions.
• There are 2 types of questions: 1) questions requiring short answers, and 2) problem-solving / calculation questions.

CONFIDENTIAL EXAM PAPER
This paper must not be circulated in any way and must not be removed from the exam venue.

School of Computer Science
EXAMINATION
Semester 1 - Main, 2021
COMP5318 Machine Learning and Data Mining

EXAM WRITING TIME: 2 hours
READING TIME: 10 minutes

EXAM CONDITIONS: This is a RESTRICTED OPEN book examination - specified materials permitted. All submitted work must be completed individually without consulting anybody else, without browsing the internet or using other materials and devices apart from those permitted, in accordance with the University Policy on “Academic Honesty in Coursework”. All submissions will go through TurnItIn for plagiarism detection and the penalties are severe.

MATERIALS PERMITTED IN THE EXAM:
1. Teaching materials from this course – lecture slides and tutorial notes
2. One page of student’s own notes - double-sided A4 size, handwritten or typed. This page must be uploaded before the exam to the Canvas COMP5318 website.
3. Calculator – non-programmable

MATERIALS TO BE SUPPLIED TO STUDENTS: None

INSTRUCTIONS TO STUDENTS:
1. Type your answers in your text editor (Word, LaTeX, etc.), convert the file into a PDF file and submit it to Canvas. No other file format will be accepted.
2. Handwritten responses will not be accepted; you need to type your answers.
3. Start by typing your student ID number on the first page.
   Do not type your name, as the marking is anonymous.
4. Submit only your answers to the questions; do not copy the questions.

For examiner use only:
Q1 (13)   Q2 (10)   Q3 (10)   Q4 (13)   Q5 (14)   Q6 (11)   Q7 (12)   Q8 (17)   Total (100)

Sample exam questions

Question 1. Short answers
Select the correct answer and provide a brief explanation.
1. Leave-one-out cross validation is suitable for large data sets.
   a) True   b) False
   Explanation:
2. The regression line minimizes the sum of the residuals.
   a) True   b) False
   Explanation:
Note: These questions are not multiple-choice questions. You need to provide an explanation, otherwise you will receive 0 marks.

Question 2. Short answers
1. Why do we need to apply normalization when using distance-based algorithms such as k-Nearest Neighbor?
2. In linear support vector machines, we use dot products both during training and during classification of a new example. What vectors are these products of?
   During training:
   During classification of a new example:

Calculation (problem-solving) questions

Question 3. Decision tree
Consider the following training data, where location, weather and expensive are the features and holiday is the class.

location  weather  expensive  holiday
nice      sunny    Y          good
nice      sunny    N          bad
boring    rainy    Y          good
boring    sunny    N          bad
nice      rainy    Y          good
boring    rainy    N          good
boring    rainy    N          good

a) What is the entropy of this set of training examples with respect to the class?
b) We would like to build a decision tree using information gain. Which attribute will be selected as the root of the tree? Show your calculations.
You may use this table:

x  y  -(x/y)*log2(x/y)      x  y  -(x/y)*log2(x/y)
1  2  0.50                  1  6  0.43
1  3  0.53                  5  6  0.22
2  3  0.39                  1  7  0.40
1  4  0.50                  2  7  0.52
3  4  0.31                  3  7  0.52
1  5  0.46                  4  7  0.46
2  5  0.53                  5  7  0.35
3  5  0.44                  6  7  0.19
4  5  0.26

Question 4. Naïve Bayes
Consider the following training data, where location, weather, companion and expensive are the features and holiday is the class.

location  weather  companion  expensive  holiday
nice      sunny    annoying   Y          good
nice      sunny    annoying   N          bad
boring    rainy    great      Y          good
boring    sunny    great      Y          bad
nice      rainy    great      Y          good
boring    rainy    annoying   N          good
boring    rainy    great      N          good

Use Naïve Bayes to predict the value of holiday for the following new example, showing your calculations: location=boring, weather=sunny, companion=annoying, expensive=Y.

Question 5. 1R
Given the training data in the table below, where credit history, debt, deposit and income are attributes and risk is the class, predict the class of the following new example using the 1R algorithm: credit history=unknown, debt=low, deposit=none, income=average. If needed, settle ties by random selection. Show your calculations.

credit history  debt  deposit   income   risk
bad             high  none      low      high
unknown         high  none      average  high
unknown         low   none      average  moderate
unknown         low   none      low      high
unknown         low   none      high     low
unknown         low   adequate  high     low
bad             low   none      low      high
bad             low   adequate  high     moderate
good            low   none      high     low
good            high  adequate  high     low
good            high  none      low      high
good            high  none      average  moderate
good            high  none      high     low
bad             high  none      average  high

Question 6. K-means clustering
Suppose that we are given 7 examples to cluster: A, B, C, D, E, F and G. The distances between them are given by the following matrix:

     A   B   C   D   E   F   G
A    0  10   2   1  12   5   4
B   10   0   4   3   6  23   7
C    2   4   0   5   9  14  19
D    1   3   5   0   1   7   4
E   12   6   9   1   0   2  18
F    5  23  14   7   2   0   6
G    4   7  19   4  18   6   0

Run the k-means algorithm to group these examples into 2 clusters for 1 epoch. The initial centroids are A and B. Show the resulting clusters.

Question 7. Hidden Markov models
Consider the following Markov model for the weather in Sydney:
a) Given that today the weather is Sunny, what is the probability that it will be Sunny tomorrow and Rainy the day after tomorrow, i.e. what is the probability $P(w_3 = \text{Rainy}, w_2 = \text{Sunny} \mid w_1 = \text{Sunny})$?
   Hint: $P(A,B \mid C) = P(A \mid B,C)\,P(B \mid C)$
b) If the weather yesterday was Rainy, and today is Foggy, what is the probability that tomorrow it will be Sunny?
For both questions, briefly show your calculations.

Question 8. Hidden Markov models
Suppose you are locked in a room for several days, and you are asked about the weather outside. The only piece of evidence you have is whether the person who comes into the room to bring your daily meal is carrying an umbrella or not. The table below shows the probabilities $P(u_i \mid w_i)$ of carrying an umbrella ($u_i = \text{true}$) based on the weather $w_i$ of day $i$. Thus, it shows that the probability that your caretaker carries an umbrella is 0.1 if the weather is sunny, 0.8 if it is rainy and 0.3 if it is foggy:

$w_i$     $P(u_i = \text{true} \mid w_i)$
sunny     0.1
rainy     0.8
foggy     0.3

Suppose that on the first day (the day when you were locked in) the weather was sunny. The next day, the caretaker carried an umbrella into the room. What was the weather most likely on this second day? Briefly show your calculations.
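For item 2 of Question 1, a quick numerical check can help. It uses hypothetical data and assumes NumPy's `np.polyfit` for the least-squares fit. The residuals of the fitted line sum to about zero. However, so do the residuals of a flat line through the mean of y, which fits much worse. What least squares actually minimizes is the sum of squared residuals:

```python
import numpy as np

# Hypothetical toy data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Least-squares fit of y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# A worse line that still passes through the point of means: the flat line y = mean(y).
flat_residuals = y - y.mean()

print("sum of residuals, fitted line:", residuals.sum())        # approximately 0
print("sum of residuals, flat line:  ", flat_residuals.sum())   # approximately 0
print("SSE, fitted line:", (residuals ** 2).sum())
print("SSE, flat line:  ", (flat_residuals ** 2).sum())         # larger
```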
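For item 1 of Question 2, a small illustration shows the problem with unscaled features. The data is hypothetical (age in years and income in dollars), and NumPy is assumed. The income feature has a much larger range, so it dominates the Euclidean distance. Once both features are min-max normalized using the training data's range, the nearest neighbor changes:

```python
import numpy as np

# Hypothetical training examples: [age in years, income in dollars].
X_train = np.array([[25.0, 50_000.0],
                    [60.0, 50_500.0],
                    [40.0, 90_000.0]])
query = np.array([27.0, 50_600.0])

def nearest_index(X, q):
    """Index of the training example closest to q (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# Without normalization, income (large range) dominates the distance.
raw_nn = nearest_index(X_train, query)

# Min-max normalization using the training data's min and max per feature.
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
X_scaled = (X_train - lo) / (hi - lo)
q_scaled = (query - lo) / (hi - lo)
scaled_nn = nearest_index(X_scaled, q_scaled)

print(raw_nn, scaled_nn)  # 1 0
```

Unscaled, the 60-year-old is "closest" only because the incomes differ by 100 dollars. After scaling, the 25-year-old is closest, because both features now contribute on comparable scales.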
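For Question 3, the entropy and information-gain calculations can be checked with a short standard-library Python sketch. The tuple layout and function names are illustrative:

```python
from collections import Counter
from math import log2

# Question 3 training data: (location, weather, expensive, holiday).
DATA = [
    ("nice", "sunny", "Y", "good"),
    ("nice", "sunny", "N", "bad"),
    ("boring", "rainy", "Y", "good"),
    ("boring", "sunny", "N", "bad"),
    ("nice", "rainy", "Y", "good"),
    ("boring", "rainy", "N", "good"),
    ("boring", "rainy", "N", "good"),
]
FEATURES = ["location", "weather", "expensive"]

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    """Class entropy minus the weighted entropy of the subsets after splitting on column col."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

h = entropy([r[-1] for r in DATA])
gains = {f: information_gain(DATA, i) for i, f in enumerate(FEATURES)}
root = max(gains, key=gains.get)

print(f"H(holiday) = {h:.3f}")
for f, g in gains.items():
    print(f"gain({f}) = {g:.3f}")
print("root:", root)
```

The script reports H(holiday) ≈ 0.863. The rounded lookup table gives 0.35 + 0.52 = 0.87. The gains are about 0.006 for location, 0.470 for weather and 0.292 for expensive, so weather is selected as the root.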
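For Question 4, the Naïve Bayes score of each class is P(class) multiplied by the product of P(feature value | class), with all probabilities estimated by counting. The sketch below uses exact fractions. It assumes plain frequency estimates without Laplace smoothing. That works for this example because none of the counts it needs is zero:

```python
from collections import Counter
from fractions import Fraction

# Question 4 training data: (location, weather, companion, expensive, holiday).
DATA = [
    ("nice", "sunny", "annoying", "Y", "good"),
    ("nice", "sunny", "annoying", "N", "bad"),
    ("boring", "rainy", "great", "Y", "good"),
    ("boring", "sunny", "great", "Y", "bad"),
    ("nice", "rainy", "great", "Y", "good"),
    ("boring", "rainy", "annoying", "N", "good"),
    ("boring", "rainy", "great", "N", "good"),
]
NEW = ("boring", "sunny", "annoying", "Y")

def naive_bayes_scores(rows, x):
    """Unnormalized P(class) * prod_j P(x_j | class), using raw frequency estimates."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = Fraction(n_cls, len(rows))  # prior P(class)
        in_class = [r for r in rows if r[-1] == cls]
        for j, value in enumerate(x):
            # Likelihood P(feature j = value | class), estimated by counting.
            score *= Fraction(sum(1 for r in in_class if r[j] == value), n_cls)
        scores[cls] = score
    return scores

scores = naive_bayes_scores(DATA, NEW)
prediction = max(scores, key=scores.get)
for cls, s in scores.items():
    print(cls, s, float(s))
print("prediction:", prediction)
```

The score for good is 18/875 ≈ 0.021 and the score for bad is 1/28 ≈ 0.036, so the predicted value of holiday is bad.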
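For Question 5, the 1R procedure can be sketched as follows. For each attribute it builds one rule that maps each attribute value to its majority class, then it keeps the attribute whose rule makes the fewest training errors. Ties between classes are kept as a list and broken with `random.choice`, following the question's instruction. Ties between attributes do not arise for the best attribute here; if they did, this sketch would pick the first attribute listed (an assumption):

```python
import random
from collections import Counter, defaultdict

# Question 5 training data: (credit history, debt, deposit, income, risk).
ATTRS = ["credit history", "debt", "deposit", "income"]
DATA = [
    ("bad", "high", "none", "low", "high"),
    ("unknown", "high", "none", "average", "high"),
    ("unknown", "low", "none", "average", "moderate"),
    ("unknown", "low", "none", "low", "high"),
    ("unknown", "low", "none", "high", "low"),
    ("unknown", "low", "adequate", "high", "low"),
    ("bad", "low", "none", "low", "high"),
    ("bad", "low", "adequate", "high", "moderate"),
    ("good", "low", "none", "high", "low"),
    ("good", "high", "adequate", "high", "low"),
    ("good", "high", "none", "low", "high"),
    ("good", "high", "none", "average", "moderate"),
    ("good", "high", "none", "high", "low"),
    ("bad", "high", "none", "average", "high"),
]
NEW = {"credit history": "unknown", "debt": "low", "deposit": "none", "income": "average"}

def one_r(rows, n_attrs):
    """For each attribute: value -> list of majority classes, plus total training errors."""
    results = {}
    for j in range(n_attrs):
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[j]][r[-1]] += 1
        rule, errors = {}, 0
        for value, counts in by_value.items():
            top = max(counts.values())
            # Keep every class that ties for the majority; the tie is broken at prediction time.
            rule[value] = sorted(c for c, k in counts.items() if k == top)
            errors += sum(counts.values()) - top
        results[j] = (rule, errors)
    return results

results = one_r(DATA, len(ATTRS))
for j, (rule, errors) in results.items():
    print(f"{ATTRS[j]}: {errors}/{len(DATA)} errors, rule = {rule}")

best = min(results, key=lambda j: results[j][1])  # first attribute wins an error-count tie
candidates = results[best][0][NEW[ATTRS[best]]]
print("chosen attribute:", ATTRS[best])
print("tied candidates:", candidates, "-> prediction:", random.choice(candidates))
```

On this data, income has the fewest errors: 3 out of 14, versus 6 for credit history and deposit and 7 for debt. Its rule for income=average is a tie between high and moderate (2 examples each). The prediction for the new example is therefore chosen at random between those two classes.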
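For Question 6, only pairwise distances are given, so there are no coordinates from which to average a new centroid. This sketch therefore assumes that one epoch amounts to assigning every example to the nearer of the initial centroids A and B:

```python
# Question 6 distance matrix (symmetric), rows/columns in the order A..G.
NAMES = "ABCDEFG"
D = [
    [0, 10, 2, 1, 12, 5, 4],
    [10, 0, 4, 3, 6, 23, 7],
    [2, 4, 0, 5, 9, 14, 19],
    [1, 3, 5, 0, 1, 7, 4],
    [12, 6, 9, 1, 0, 2, 18],
    [5, 23, 14, 7, 2, 0, 6],
    [4, 7, 19, 4, 18, 6, 0],
]

def assign(centroids):
    """Assign every example to its closest centroid (ties go to the first centroid listed)."""
    idx = {name: i for i, name in enumerate(NAMES)}
    clusters = {c: [] for c in centroids}
    for name in NAMES:
        closest = min(centroids, key=lambda c: D[idx[name]][idx[c]])
        clusters[closest].append(name)
    return clusters

clusters = assign(["A", "B"])
print(clusters)  # {'A': ['A', 'C', 'D', 'F', 'G'], 'B': ['B', 'E']}
```

C, D, F and G join A, with distances 2, 1, 5 and 4 to A versus 4, 3, 23 and 7 to B. E joins B, with distance 6 to B versus 12 to A.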
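For Question 7, apply the hint and then the first-order Markov assumption, which says the next day's weather depends only on the current day's. This reduces each part to transition probabilities from the diagram. Here $w_t$ is the weather on day $t$, and day 1 is the first day mentioned in each part: today in (a), yesterday in (b).

```latex
\begin{aligned}
&\text{(a)}\quad P(w_3 = \text{Rainy}, w_2 = \text{Sunny} \mid w_1 = \text{Sunny}) \\
&\qquad = P(w_3 = \text{Rainy} \mid w_2 = \text{Sunny}, w_1 = \text{Sunny})\, P(w_2 = \text{Sunny} \mid w_1 = \text{Sunny}) \\
&\qquad = P(w_3 = \text{Rainy} \mid w_2 = \text{Sunny})\, P(w_2 = \text{Sunny} \mid w_1 = \text{Sunny}) \\[4pt]
&\text{(b)}\quad P(w_3 = \text{Sunny} \mid w_2 = \text{Foggy}, w_1 = \text{Rainy}) = P(w_3 = \text{Sunny} \mid w_2 = \text{Foggy})
\end{aligned}
```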
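For Question 8, Bayes' rule gives $P(w_2 \mid u_2 = \text{true}, w_1 = \text{sunny}) \propto P(u_2 = \text{true} \mid w_2)\,P(w_2 \mid w_1 = \text{sunny})$. The most likely weather is the $w_2$ with the largest product. The emission probabilities come from the question. The transition row $P(w_2 \mid w_1 = \text{sunny})$ is not stated in Question 8's text; presumably it comes from the weather Markov model diagram. The sketch therefore takes it as an argument and runs with a clearly hypothetical uniform placeholder:

```python
# Emission probabilities P(umbrella = true | weather) from Question 8.
P_UMBRELLA = {"sunny": 0.1, "rainy": 0.8, "foggy": 0.3}

def posterior_day2(transition_from_day1, emission=P_UMBRELLA):
    """P(w2 | w1, umbrella on day 2) via Bayes' rule, given the row P(w2 | w1) of the transition model."""
    joint = {w: transition_from_day1[w] * emission[w] for w in emission}
    total = sum(joint.values())  # normalizing constant P(umbrella on day 2 | w1)
    return {w: p / total for w, p in joint.items()}

# HYPOTHETICAL placeholder: a uniform row P(w2 | w1 = sunny). Replace it with the
# sunny-row transition probabilities from the Markov model diagram.
placeholder_row = {"sunny": 1 / 3, "rainy": 1 / 3, "foggy": 1 / 3}

post = posterior_day2(placeholder_row)
print(max(post, key=post.get), post)
```

With the uniform placeholder, the posterior is simply proportional to the emission probabilities, so rainy wins. With the real transition row the answer can change, because a high sunny-to-sunny probability can outweigh the umbrella evidence.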
Question 9. Hidden Markov models
The diagram below shows the Hidden Markov Model (HMM) for this scenario:
• Given a sequence of observations (type of clothing), find the hidden sequence of weather states (Sunny or Cloudy) which caused Anna to choose the clothes she wore.
Suppose that you know Anna wore a T-shirt on the first day, a Hoodie on the second and a Jacket on the third day. You know that the weather state of the first day (when Anna wore the T-shirt) was Sunny, but you do not know the weather states of the next two days. What is the probability of the weather sequence Sunny-Cloudy-Sunny for the three days? Briefly show your calculations.

More examples of short-answer questions:
Question 10. Clustering
Briefly explain the main idea of density-based clustering. Give one example of a clustering algorithm belonging to this approach.

Question 11. Neural networks
What is the disadvantage of using linear functions as activation functions for multilayer neural networks?

Question 12. Deep neural networks
List one disadvantage of applying a fully connected multilayer perceptron neural network to handwritten digit image classification.
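For Question 11, a short NumPy check shows why stacking layers with linear activations adds no expressive power. The layer sizes and random weights are hypothetical. Two linear layers compose into a single linear layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-layer network with identity (linear) activations: 3 -> 4 -> 2.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def two_layer_linear(x):
    """Forward pass where each 'activation' is just the identity function."""
    h = W1 @ x + b1
    return W2 @ h + b2

# The same mapping collapses to a single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_layer_linear(x), W @ x + b))  # True
```

However many linear layers are stacked, the network can still represent only a linear function of its input. Non-linear activations are what let a multilayer network model non-linear decision boundaries.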
Question 13. Recurrent neural networks
Give application examples of sequence-to-sequence recurrent neural networks.
Question 14. Reinforcement Learning
Explain the objective of using the ε-greedy strategy in Deep Q-Learning.
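For Question 14, the ε-greedy rule itself is short. With probability ε the agent takes a random action (exploration); otherwise it takes the action with the highest current Q-value estimate (exploitation). Below is a minimal NumPy sketch with hypothetical Q-values. In Deep Q-Learning the Q-values would come from the network, and ε is commonly decayed during training; both points are typical practice rather than part of the question:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random action); otherwise exploit (argmax Q)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(42)
q = np.array([0.2, 1.5, -0.3])  # hypothetical Q(s, a) estimates for 3 actions

greedy = [epsilon_greedy(q, 0.0, rng) for _ in range(5)]     # pure exploitation
mixed = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]   # mostly greedy, sometimes random
print(greedy, np.mean(np.array(mixed) == 1))
```

With ε = 0.1 the best-looking action is chosen about 93% of the time (0.9 + 0.1/3). The remaining random choices let the agent keep collecting experience about actions whose Q-values may currently be underestimated.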