CS5010
AI Principles

This assessment consists of exam-style questions and you should answer as you would in an exam. As such, citations of sources are not expected, but your answers should be from your own memory and understanding, and significant stretches of text should not be taken verbatim from sources. Any illustrations or diagrams you include should be original (hand or computer drawn). You may word-process your answers, or hand-write and scan them. In either case, please return your answers as a single PDF. If you handwrite, please make sure the pages are legible, the right way up and in the right order.

Your submission should be your own unaided work. While you are encouraged to work with your peers to understand the content of the course while revising, once you have seen the questions you should avoid any further discussion until you have submitted your results. You must submit your completed assessment on MMS within 72 hours of it being sent to you. Assuming you have revised the module contents beforehand, answering the questions should take no more than three hours.

1. Let E be the restaurant domain data given in the table below, in which input attributes influence decisions on whether or not to wait for a table.

   EXAMPLE  PRICE      ESTIMATED WAIT (min)  HUNGRY?  BAR?  WAIT
   D1       Expensive  30-60                 No       Yes   No
   D2       Expensive  30-60                 No       No    No
   D3       Medium     30-60                 No       Yes   Yes
   D4       Cheap      10-30                 No       Yes   Yes
   D5       Cheap      0-10                  Yes      Yes   Yes
   D6       Cheap      0-10                  Yes      No    No
   D7       Medium     0-10                  Yes      No    Yes
   D8       Expensive  10-30                 No       Yes   No
   D9       Expensive  0-10                  Yes      Yes   Yes
   D10      Cheap      10-30                 Yes      Yes   Yes
   D11      Expensive  10-30                 Yes      No    Yes
   D12      Medium     10-30                 No       No    Yes
   D13      Medium     30-60                 Yes      Yes   Yes
   D14      Cheap      10-30                 No       No    No

(a) In a decision tree representation:
    (i) each internal node tests an attribute;
    (ii) each branch corresponds to an attribute value;
    (iii) each leaf node assigns a classification, i.e. a decision about waiting.
A new example is passed down the tree, following the branches that match its attribute values, until it reaches the leaf node holding the decision. [4 marks]

(b) The learning curve plots the percentage of correct decisions on test data as the number of training instances increases. A curve rising to near 100% shows realisable performance (the hypothesis space can express the ground truth); a curve with slow monotone growth shows redundant performance (redundant or irrelevant attributes slow the learning); a curve with a fast rise followed by a plateau well below 100% shows nonrealisable performance (either missing attributes or a restricted hypothesis class). [4 marks]

(c) This training set was not used in the course notes, but similar calculations were done for similar data. We have 9 positive and 5 negative examples in the prior, so

\[
H(E) = -\frac{9}{14}\log_2\!\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\!\left(\frac{5}{14}\right)
     = -\frac{9}{14}\,\frac{\log_{10}(9/14)}{\log_{10} 2} - \frac{5}{14}\,\frac{\log_{10}(5/14)}{\log_{10} 2}
     \approx -\frac{9}{14}\cdot\frac{-0.192}{0.301} - \frac{5}{14}\cdot\frac{-0.447}{0.301}
     = 0.410 + 0.530 = 0.940 .
\]

Hence H(E) ≈ 0.940 bits. [2 marks]

(d) For PRICE we have 2 positives and 3 negatives on the Expensive branch, 4 positives and 0 negatives on Medium, and 3 positives and 2 negatives on Cheap. These have entropies 0.971, 0 and 0.971 respectively, so the information gain is

\[ 0.940 - \tfrac{5}{14}(0.971) - \tfrac{4}{14}(0) - \tfrac{5}{14}(0.971) = 0.246 . \]

For ESTIMATED WAIT we have 2 positives and 2 negatives on the 30-60 branch, 4 positives and 2 negatives on 10-30, and 3 positives and 1 negative on 0-10. These have entropies 1, 0.918 and 0.811 respectively, so the information gain is

\[ 0.940 - \tfrac{4}{14}(1) - \tfrac{6}{14}(0.918) - \tfrac{4}{14}(0.811) = 0.029 . \]

For HUNGRY? we have 3 positives and 4 negatives on the No branch, and 6 positives and 1 negative on the Yes branch. These have entropies 0.985 and 0.592 respectively, so the information gain is

\[ 0.940 - \tfrac{7}{14}(0.985) - \tfrac{7}{14}(0.592) = 0.151 . \]

For BAR? we have 6 positives and 2 negatives on the Yes branch, and 3 positives and 3 negatives on the No branch. These have entropies 0.811 and 1 respectively, so the information gain is

\[ 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1) = 0.048 . \]

[4 marks]

(e) PRICE has the greatest information gain, hence gives the greatest reduction in entropy (i.e. it leads to sub-nodes with the most purity), and hence should be preferred as the node to split on. [2 marks]
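The arithmetic in (c)–(e) can be checked mechanically. The following Python sketch is my own illustration, not part of the model answer: the tuple encoding of the table and the names EXAMPLES, ATTRS, entropy and information_gain are all assumptions for this sketch. It reproduces the figures above up to rounding (exact arithmetic gives 0.247 for PRICE and 0.152 for HUNGRY?, because the hand calculation rounds the branch entropies to three decimal places first).

    from collections import Counter
    from math import log2

    # One tuple per row of the table: (PRICE, ESTIMATED WAIT, HUNGRY?, BAR?, WAIT)
    EXAMPLES = [
        ("Expensive", "30-60", "No",  "Yes", "No"),   # D1
        ("Expensive", "30-60", "No",  "No",  "No"),   # D2
        ("Medium",    "30-60", "No",  "Yes", "Yes"),  # D3
        ("Cheap",     "10-30", "No",  "Yes", "Yes"),  # D4
        ("Cheap",     "0-10",  "Yes", "Yes", "Yes"),  # D5
        ("Cheap",     "0-10",  "Yes", "No",  "No"),   # D6
        ("Medium",    "0-10",  "Yes", "No",  "Yes"),  # D7
        ("Expensive", "10-30", "No",  "Yes", "No"),   # D8
        ("Expensive", "0-10",  "Yes", "Yes", "Yes"),  # D9
        ("Cheap",     "10-30", "Yes", "Yes", "Yes"),  # D10
        ("Expensive", "10-30", "Yes", "No",  "Yes"),  # D11
        ("Medium",    "10-30", "No",  "No",  "Yes"),  # D12
        ("Medium",    "30-60", "Yes", "Yes", "Yes"),  # D13
        ("Cheap",     "10-30", "No",  "No",  "No"),   # D14
    ]
    ATTRS = {"PRICE": 0, "ESTIMATED WAIT": 1, "HUNGRY?": 2, "BAR?": 3}

    def entropy(rows):
        """Shannon entropy (in bits) of the WAIT decision over a set of rows."""
        counts = Counter(row[-1] for row in rows)
        total = len(rows)
        return -sum(c / total * log2(c / total) for c in counts.values())

    def information_gain(rows, attr_index):
        """Entropy reduction from splitting the rows on the given attribute."""
        total = len(rows)
        remainder = 0.0
        for value in {row[attr_index] for row in rows}:
            subset = [row for row in rows if row[attr_index] == value]
            remainder += len(subset) / total * entropy(subset)
        return entropy(rows) - remainder

    print(f"H(E) = {entropy(EXAMPLES):.3f}")                    # 0.940
    for name, index in ATTRS.items():
        print(f"Gain({name}) = {information_gain(EXAMPLES, index):.3f}")
    # PRICE 0.247, ESTIMATED WAIT 0.029, HUNGRY? 0.152, BAR? 0.048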
(f) We would have a tree with 14 branches, each leading to a single decision. This would describe the current data perfectly, but would not generalise to unseen examples (i.e. the model would overfit the data). [2 marks]

(g) The DT learning agent seeks a simple hypothesis that is approximately consistent with the training examples: there is no guarantee of perfect accuracy. In other terms from the lectures, h ≈ f in a perfectly rational agent, where h is the hypothesis used and f is the ground truth; equality is never realised in practice. [2 marks]
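To make the representation in (a) concrete, here is a minimal sketch (again my own illustration, with hypothetical Node/classify names) of one tree with PRICE at the root, as (e) recommends. The sub-splits shown are what the greedy highest-gain rule happens to choose on each remaining branch of this data, and the resulting tree is consistent with all 14 training examples.

    # Internal nodes test an attribute, branches carry attribute values,
    # and leaves (plain strings here) hold the WAIT decision.
    class Node:
        def __init__(self, attribute, branches):
            self.attribute = attribute   # attribute tested at this node
            self.branches = branches     # dict: attribute value -> Node or leaf str

    # Medium is already pure (4+/0-); the other branches need one more test each.
    tree = Node("PRICE", {
        "Medium":    "Yes",
        "Expensive": Node("HUNGRY?", {"Yes": "Yes", "No": "No"}),
        "Cheap":     Node("BAR?",    {"Yes": "Yes", "No": "No"}),
    })

    def classify(node, example):
        """Pass an example (dict: attribute -> value) down the tree to a leaf."""
        while isinstance(node, Node):
            node = node.branches[example[node.attribute]]
        return node

    # A new example is routed root-to-leaf by its attribute values:
    print(classify(tree, {"PRICE": "Cheap", "ESTIMATED WAIT": "0-10",
                          "HUNGRY?": "No", "BAR?": "Yes"}))    # -> "Yes"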