CS5010
AI Principles
This assessment consists of exam-style questions and you should answer as you would
in an exam. As such, citations of sources are not expected, but your answers should be
from your own memory and understanding and significant stretches of text should not be
taken verbatim from sources. Any illustrations or diagrams you include should be original
(hand or computer drawn). You may word-process your answers, or hand-write and scan
them. In either case, please return your answers as a single PDF. If you handwrite, please
make sure the pages are legible, the right way up and in the right order. Your submission
should be your own unaided work. While you are encouraged to work with your peers to
understand the content of the course while revising, once you have seen the questions you
should avoid any further discussion until you have submitted your results. You must submit
your completed assessment on MMS within 72 hours of it being sent to you. Assuming
you have revised the module contents beforehand, answering the questions should take no
more than three hours.
ANSWERS
1. Let E be the restaurant domain data given in the table below, in which input attributes
influence decisions on whether or not to wait for a table.
EXAMPLE | PRICE     | ESTIMATED WAIT (MINS) | HUNGRY? | BAR? | WAIT?
D1      | Expensive | 30-60                 | No      | Yes  | No
D2      | Expensive | 30-60                 | No      | No   | No
D3      | Medium    | 30-60                 | No      | Yes  | Yes
D4      | Cheap     | 10-30                 | No      | Yes  | Yes
D5      | Cheap     | 0-10                  | Yes     | Yes  | Yes
D6      | Cheap     | 0-10                  | Yes     | No   | No
D7      | Medium    | 0-10                  | Yes     | No   | Yes
D8      | Expensive | 10-30                 | No      | Yes  | No
D9      | Expensive | 0-10                  | Yes     | Yes  | Yes
D10     | Cheap     | 10-30                 | Yes     | Yes  | Yes
D11     | Expensive | 10-30                 | Yes     | No   | Yes
D12     | Medium    | 10-30                 | No      | No   | Yes
D13     | Medium    | 30-60                 | Yes     | Yes  | Yes
D14     | Cheap     | 10-30                 | No      | No   | No
(a) In a decision tree representation:
(i) Each internal node tests an attribute
(ii) Each branch corresponds to an attribute value
(iii) Each leaf node assigns a classification, i.e. a decision about waiting
A new example is passed down the tree, following the branch matching its value at each test, until it reaches a leaf carrying the decision (see the sketch below). [4 marks]
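This structure can be made concrete in code. The following is a minimal Python sketch, not taken from the course notes; the names Node, Leaf and classify are illustrative.

```python
# Minimal sketch of the decision-tree representation in (a).
# Node, Leaf and classify are illustrative names, not from the course notes.

class Leaf:
    """Leaf node: assigns a classification (the wait decision)."""
    def __init__(self, decision):
        self.decision = decision

class Node:
    """Internal node: tests one attribute, with one branch per value."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # e.g. "PRICE"
        self.branches = branches     # maps attribute value -> subtree

def classify(tree, example):
    """Pass a new example down the tree until a leaf gives the decision."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.decision

# A one-level tree that splits on PRICE (values as in the table above).
tree = Node("PRICE", {"Expensive": Leaf("No"),
                      "Medium": Leaf("Yes"),
                      "Cheap": Leaf("Yes")})
print(classify(tree, {"PRICE": "Medium"}))  # -> Yes
```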
(b) The curve plots the percentage of correct decisions on test data as the number of training instances increases. A curve rising to near 100% shows realisable performance (i.e. the hypothesis can get close to the ground truth); a curve with slow monotone growth that remains well below 100% shows redundant performance (i.e. many redundant or irrelevant attributes); a curve with a fast rise then a plateau well below 100% shows nonrealisable performance (either missing attributes or a restricted hypothesis class). [4 marks]
(c) This training set was not used in the course notes, but similar calculations were
done for similar data. We have 9 positive and 5 negative examples in the prior,
so
\[
\begin{aligned}
H(E) &= -\frac{9}{14}\log_2\!\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\!\left(\frac{5}{14}\right) \\
&= -\frac{9}{14}\,\frac{\log_{10}(9/14)}{\log_{10}(2)} - \frac{5}{14}\,\frac{\log_{10}(5/14)}{\log_{10}(2)} \\
&\approx -\frac{9}{14}\cdot\frac{-0.192}{0.301} - \frac{5}{14}\cdot\frac{-0.447}{0.301} \\
&= 0.410 + 0.530 \\
&= 0.940
\end{aligned}
\]
Hence H(E) ≈ 0.940 bits. [2 marks]
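As a quick check, the same calculation in a short Python sketch (the helper name entropy is illustrative):

```python
# Sketch of the entropy calculation in (c); `entropy` is an illustrative name.
import math

def entropy(pos, neg):
    """Entropy H in bits of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(round(entropy(9, 5), 3))  # -> 0.94
```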
(d) For PRICE we have 2 positives and 3 negatives on the Expensive branch; 4 positives and 0 negatives on the Medium; 3 positives and 2 negatives on the Cheap. These have entropies 0.971, 0 and 0.971 respectively. So the information gain is
\[
0.940 - \frac{5}{14}(0.971) - \frac{4}{14}(0) - \frac{5}{14}(0.971) = 0.246 .
\]
For ESTIMATED WAIT we have 2 positives and 2 negatives on the 30-60 branch; 4 positives and 2 negatives on the 10-30 branch; 3 positives and 1 negative on the 0-10 branch. These have entropies 1, 0.918 and 0.811 respectively. So the information gain is
\[
0.940 - \frac{4}{14}(1.0) - \frac{6}{14}(0.918) - \frac{4}{14}(0.811) = 0.029 .
\]
For HUNGRY? we have 3 positives and 4 negatives on the No branch; 6 positives and 1 negative on the Yes branch. These have entropies 0.985 and 0.592 respectively. So the information gain is
\[
0.940 - \frac{7}{14}(0.985) - \frac{7}{14}(0.592) = 0.151 .
\]
For BAR? we have 6 positives and 2 negatives on the Yes branch; 3 positives and 3 negatives on the No branch. These have entropies 0.811 and 1 respectively. So the information gain is
\[
0.940 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) = 0.048 .
\]
[4 marks]
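These four gains can be reproduced mechanically. The following Python sketch recomputes them over the table; E, ATTRS, h and gain are illustrative names, and with unrounded entropies the PRICE and HUNGRY? gains come out as 0.247 and 0.152 rather than the hand-rounded 0.246 and 0.151 above.

```python
# Sketch reproducing the information-gain numbers above from the table E.
# E, ATTRS, h and gain are illustrative names; entropy matches part (c).
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

# Rows D1-D14: (PRICE, ESTIMATED WAIT, HUNGRY?, BAR?, WAIT?).
E = [
    ("Expensive", "30-60", "No",  "Yes", "No"),   # D1
    ("Expensive", "30-60", "No",  "No",  "No"),   # D2
    ("Medium",    "30-60", "No",  "Yes", "Yes"),  # D3
    ("Cheap",     "10-30", "No",  "Yes", "Yes"),  # D4
    ("Cheap",     "0-10",  "Yes", "Yes", "Yes"),  # D5
    ("Cheap",     "0-10",  "Yes", "No",  "No"),   # D6
    ("Medium",    "0-10",  "Yes", "No",  "Yes"),  # D7
    ("Expensive", "10-30", "No",  "Yes", "No"),   # D8
    ("Expensive", "0-10",  "Yes", "Yes", "Yes"),  # D9
    ("Cheap",     "10-30", "Yes", "Yes", "Yes"),  # D10
    ("Expensive", "10-30", "Yes", "No",  "Yes"),  # D11
    ("Medium",    "10-30", "No",  "No",  "Yes"),  # D12
    ("Medium",    "30-60", "Yes", "Yes", "Yes"),  # D13
    ("Cheap",     "10-30", "No",  "No",  "No"),   # D14
]

ATTRS = {"PRICE": 0, "ESTIMATED WAIT": 1, "HUNGRY?": 2, "BAR?": 3}

def h(rows):
    """Entropy of the WAIT? labels (last element) of a subset of rows."""
    pos = sum(1 for r in rows if r[-1] == "Yes")
    return entropy(pos, len(rows) - pos)

def gain(rows, col):
    """Information gain from splitting rows on attribute column col."""
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * h(subset)
    return h(rows) - remainder

for name, col in ATTRS.items():
    print(f"{name}: {gain(E, col):.3f}")
# PRICE: 0.247, ESTIMATED WAIT: 0.029, HUNGRY?: 0.152, BAR?: 0.048
# (unrounded arithmetic; 0.246/0.151 above come from rounding entropies first)
print(max(ATTRS, key=lambda a: gain(E, ATTRS[a])))  # -> PRICE, as in (e)
```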
(e) PRICE has the greatest information gain of the four attributes, hence gives the greatest reduction in entropy (i.e. leads to child nodes of higher purity), and hence should be preferred as the attribute to split on. [2 marks]
(f) We would have a tree with 14 branches, each leading to a single decision. This would describe the current data perfectly but would not generalise to unseen examples (i.e. the model would overfit the data), as the sketch below illustrates. [2 marks]
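A toy illustration of the point: a tree with one branch per training example is just a lookup table. The entries below are illustrative (only two of the fourteen are written out).

```python
# Sketch of (f): a "tree" with one branch per example memorises E perfectly
# but has nothing to say about any input it has not seen before.
memorised = {
    ("Expensive", "30-60", "No", "Yes"): "No",   # D1
    ("Medium", "30-60", "No", "Yes"): "Yes",     # D3
    # ... one entry per example D1-D14
}

print(memorised[("Medium", "30-60", "No", "Yes")])   # training example -> Yes
print(memorised.get(("Medium", "30-60", "No", "No")))  # unseen example -> None
```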
(g) The DT learning agent learns a simple hypothesis that is approximately consistent with the training examples: there is no guarantee of perfect accuracy. In the terms used in the lectures, h ≈ f for a perfectly rational agent, where h is the learned hypothesis and f is the ground-truth function; exact equality is never realised in practice. [2 marks]