Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
COMP532 Computer Science SECOND SEMESTER EXAMINATIONS
Machine Learning and BioInspired Optimization INSTRUCTIONS TO CANDIDATES 1. Answer FOUR questions. If you attempt to answer more questions than the required number of questions, the marks awarded for the excess question answered will be discarded (starting with your lowest mark); 2. You should enter your answers in a word processing document, e.g., MS Word or LaTex (Overleaf); 3. Some questions require a diagram - you can either just sketch this on plain paper, and then scan/photograph, or using any drawing software (e.g., MS word, MS PowerPoint and Google Drawings), and insert into the word processing document; 4. Having completed your answers, convert to PDF and submit via VITAL before 10am, Friday 22nd May. It is strongly suggested to submit your answers at least half an hour before the deadline, i.e., 9:30am Friday 22nd May, in case you miss the deadline of submission; 5. You MUST complete the exam by the above stated ending time; 6. You can only submit ONCE; 7. Expected Writing Time: 3 hours. 8. Please do NOT include your name in your answer sheet. 9. This is an open-book assessment. Copying any material from another source, or colluding with any other person in the preparation and production of this work will be considered suspected academic misconduct and will be dealt with according to the University’s Academic Integrity Policy. 10. The use of a calculator IS allowed. PAPER CODE COMP532 page 1 of 5 Continued 1. Reinforcement Learning (25 marks) (a) (8 marks) The following table is a list of action values at time t . Calculate the action probabilities for a1, a2 and a3 according to Softmax Action Selection using the Boltzmann distribution with τ = 2. an a1 a2 a3 Qt (an) -1 4 2 (b) (10 marks) Consider the 2 × 2 deterministic grid world in Figure 1(a). Allowable moves are shown by arrows, and the numbers indicate the reward for performing each action. At time step t , a Q-learning agent has computed Q values as given in Figure 1(b); e.g., Q(LowerLeft ,Up) = 15 and Q(LowerLeft ,Right) = 20. Show the changes in the Q value estimates when the agent subsequently takes the path shown by the dotted line in Figure 1(c) (the agent starts in the lower left cell) when learning rate α = 0.5 and discount factor γ = 0.75. Clearly show how you computed your answer. Figure 1: The deterministic grid world. (c) (7 marks) Draw the backup diagram of the Temporal Difference learning method. In ad- dition, explain bootstrapping and sampling using the diagram. 2. Multi-Agent Reinforcement Learning (25 marks) (a) (15 marks) Below is the payoff matrix for the Prisoners Dilemma game. (Row player: x; Column player: y; C: cooperate; D: Defect) C D C (3, 3) (0, 5) D (5, 0) (1, 1) (1) State the Nash Equilibrium of the game and explain your answer. (2) Consider that at some time t that x = (0.3, 0.7) and y = (0.6, 0.4), where the first argument in the vectors is the probability of cooperating and the second argument is the probability of defecting for players x and y respectively. According to the replicator dynamics, calculate x˙1 and y˙1, i.e., the change in probability of playing the first action (C), for both players. Show how you computed your answer. PAPER CODE COMP532 page 2 of 5 Continued (b) (5 marks) Fig. 2 is the direction field plot of the Rock-Paper-Scissors game. According to the plot, is there a Nash equilibrium for the game? If yes, is the Nash equilibrium evolutionarily stable? Explain your answer. Figure 2: The direction field for the Rock-Paper-Scissors game. (c) (5 marks) What are Markov Games, and how do they extend repeated games and MDPs? Provide an (abstract) example. 3. Swarm Intelligence (25 marks) (a) (15 marks) Consider a Travelling Salesman problem as shown in Fig. 3: there are 4 different cities, i.e., Liverpool (A), Manchester (B), Leeds (C) and Sheffield (D), and the edges represent the connection between the cities; the numbers on the edges represent the distances between the cities; a salesman travels over the graph who aims to find a closed tour of minimal length connecting each city - using Ant System. Take ηij = 1/dij , where dij is the distance between cities i and j . Figure 3: The map of connections between Liverpool, Manchester, Leeds and Sheffield. PAPER CODE COMP532 page 3 of 5 Continued (1) For one ant starting at Liverpool, calculate the values of p1AB(1), p 1 AC(1), p 1 AD(1). Take parameter values α = 1, β = 3, ρ = 0.5,Q = 100, τ = 10−6. (2) Consider that the ant took the tour A→ B→ C→ D→ A. Calculate: τAB(2) and τAC(2). (b) (5 marks) Draw a graph to show the velocity update of the Particle Swarm Optimisation algorithm and give a brief explanation of your graph. (c) (5 marks) What type of multi-agent learning is Swarm Intelligence, and what are its main benefits? 4. Deep Learning (25 marks) (a) (6 marks) Figure 4 is a network for the function y = wx + b. At the current time, x = −3,w = 2,b = 3. Calculate the updated w by applying back-propagation. Figure 4: The network. (b) (6 marks) (1) In one convolution layer, we have 8 filters of 3x3x3 applied to input volume 64x64x3 with stride 1 and pad 1. What is the size of the output volume? Give details of how you calculate the size of the output volume. (6 marks) (2) Using this example, explain how to apply the convolution operation to the input volume to get the output volume. (c) (7 marks) Draw a graph to explain the structure of a Deep-Q network that would be suitable for an Atari Games agent. 5. Artificial Immune Systems and DNA Computing (25 marks) (a) (5 marks) Explain Clonal Selection in the Artificial Immune System. (b) (5 marks)
What are the common characteristics of Artificial Immune Systems and Rein- forcement Learning?
Give your explanations. (c) (5 marks) What are the five characteristics of AIS in common with Swarm Systems?
Give your explanations. PAPER CODE COMP532 page 4 of 5 Continued (d) (5 marks)
What are the common characteristics in the Parallel Problem Solving methods we have covered,
including single-/multi-agent learning, deep learning, swarm intelli- gence,
artificial immune system and DNA computing? Give your explanations. (e) (5 marks)
What are the five steps of Adleman’s experiment in DNA computing to solve the Traveling Salesman Problem (TSP)