Consider the following road map with distances indicated on lines drawn between towns (the map is not to scale). The straight-line distances from each town to H are listed in the table.
In what order are nodes expanded by iterative deepening depth-first search when searching for a path between A and H? Where there is a choice of nodes, take the first one in alphabetical order.
Assume the search algorithm is Tree-Search-IDDFS with cycle checking along the current path. Stop the search once the goal node is expanded.
a. A ABCD ABFCDEDCGH
b. A ABCD ABFCDH
c. A ABCD ABCDE ABFCDH
d. A ABCD ABFH
e. A ABCD ABFCDEGH
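The procedure described above can be sketched in code. The road map is not reproduced in this text, so the graph below (towns P to T) is purely hypothetical and does not give the answer to the question. The sketch assumes that a node counts as expanded when it is visited, including nodes reached at the depth limit, and the function name `iddfs_expansion_order` is illustrative.

```python
def iddfs_expansion_order(graph, start, goal, max_depth=20):
    """Return one string of expanded nodes per depth limit (0, 1, 2, ...)."""
    rounds = []
    for limit in range(max_depth + 1):
        order = []

        def dls(node, path, depth):
            order.append(node)            # node is expanded (visited)
            if node == goal:
                return True               # stop once the goal is expanded
            if depth == limit:
                return False              # depth cutoff reached
            for nxt in sorted(graph[node]):   # alphabetical tie-breaking
                if nxt in path:
                    continue              # cycle check along the current path
                if dls(nxt, path + [nxt], depth + 1):
                    return True
            return False

        found = dls(start, [start], 0)
        rounds.append("".join(order))
        if found:
            break
    return rounds


# Hypothetical undirected graph, NOT the exam's road map.
toy_graph = {
    "P": ["Q", "R"],
    "Q": ["P", "S"],
    "R": ["P", "S"],
    "S": ["Q", "R", "T"],
    "T": ["S"],
}
rounds = iddfs_expansion_order(toy_graph, "P", "T")
print(" ".join(rounds))  # P PQR PQSRS PQSRT
```

Each space-separated group is one depth-limited pass, which matches the layout of the answer options above.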
TOPIC: Search – Question: A* [4 marks]
Consider the following road map with distances indicated on lines drawn between towns (the map is not to scale). The straight-line distances from each town to H are listed in the table.
In what order are nodes expanded by A* search, using the straight-line distances to H in the table as the heuristic function, when searching for a path between A and H? Where there is a choice of nodes, take the first one in alphabetical order. Stop the search once the goal node is expanded.
a. ADH
b. ABFH
c. ABCDEFGH
d. ACDH
e. ACEGH
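A minimal sketch of how the A* expansion order can be traced is given here. The road map and heuristic table are not reproduced in this text, so the edge costs and heuristic values below are hypothetical and do not give the answer to the question. Ties on f = g + h are broken alphabetically by ordering the priority-queue entries as (f, name, g).

```python
import heapq


def astar_expansion_order(graph, h, start, goal):
    """Return the nodes in the order A* expands them, stopping at the goal."""
    frontier = [(h[start], start, 0)]     # entries are (f, node, g)
    best_g = {start: 0}
    closed = set()
    expanded = []
    while frontier:
        f, node, g = heapq.heappop(frontier)
        if node in closed:
            continue                      # stale duplicate entry
        closed.add(node)
        expanded.append(node)
        if node == goal:
            break                         # stop once the goal is expanded
        for nxt, cost in graph[node].items():
            new_g = g + cost
            if nxt not in best_g or new_g < best_g[nxt]:
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h[nxt], nxt, new_g))
    return "".join(expanded)


# Hypothetical undirected weighted graph and heuristic, NOT the exam's data.
toy_graph = {
    "P": {"Q": 2, "R": 5},
    "Q": {"P": 2, "S": 4},
    "R": {"P": 5, "S": 1},
    "S": {"Q": 4, "R": 1, "T": 3},
    "T": {"S": 3},
}
toy_h = {"P": 6, "Q": 5, "R": 4, "S": 3, "T": 0}
order = astar_expansion_order(toy_graph, toy_h, "P", "T")
print(order)  # PQRST
```

In this toy run, R and S both reach the frontier with f = 9, and R is expanded first because of the alphabetical tie-break.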
TOPIC: Reinforcement learning – Question: Softmax [4 marks]
Consider an RL agent navigating a gridworld, with four possible actions: up (U), down (D), left (L), and right (R). The agent uses the softmax action selection method. Remember that this method computes the probability of selecting an action using a Boltzmann distribution, as follows:
P(a) = exp(Q(a)/T) / Σ_b exp(Q(b)/T), where the sum runs over all four actions b and T is the temperature.
In a particular state St, the agent has the following Q-values to decide which action to take next:
Q(U) | Q(D) | Q(L) | Q(R)
0.7698 | 0.6501 | 0.0252 | -0.7698
Which action would the agent select if the temperature T is 0.9 and the random number drawn is 0.9021?
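The calculation can be checked with a short sketch. It assumes the standard Boltzmann form P(a) = exp(Q(a)/T) / Σ_b exp(Q(b)/T) and, as a further assumption not stated in the question, that the random number is compared against cumulative probabilities taken in the table's order U, D, L, R. The helper names are illustrative.

```python
import math


def softmax_probs(q_values, temperature):
    """Boltzmann (softmax) probabilities for a dict of action -> Q-value."""
    weights = {a: math.exp(q / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(probs, u):
    """Pick the first action whose cumulative probability reaches u.

    Assumes actions are accumulated in the dict's insertion order.
    """
    cumulative = 0.0
    action = None
    for action, p in probs.items():
        cumulative += p
        if u <= cumulative:
            return action
    return action  # guard against floating-point rounding at u close to 1


q = {"U": 0.7698, "D": 0.6501, "L": 0.0252, "R": -0.7698}
probs = softmax_probs(q, temperature=0.9)
chosen = select_action(probs, 0.9021)
for a, p in probs.items():
    print(a, round(p, 4))
print("selected:", chosen)
```

Under these assumptions the probabilities are roughly 0.401, 0.351, 0.175 and 0.072, the cumulative sums are roughly 0.401, 0.752 and 0.928, and the draw of 0.9021 falls in the interval belonging to L. A different action ordering would change the result.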
TOPIC: Reinforcement learning – Question: Returns [4 marks]
Consider the return equation shown below with a discount factor γ = 0.9 and a reward sequence of 7, 3, 1, 10, -10. The return G0 is equal to:
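One way to check the arithmetic is sketched here. It assumes the usual discounted return G0 = R1 + γ R2 + γ² R3 + ..., with the listed rewards taken as R1 to R5 and no further rewards after the sequence ends. The function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * reward for k = 0, 1, 2, ... over the sequence."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


g0 = discounted_return([7, 3, 1, 10, -10], gamma=0.9)
print(round(g0, 3))  # 11.239
```

Term by term this is 7 + 2.7 + 0.81 + 7.29 - 6.561 = 11.239.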
TOPIC: Neural networks – Question: Single-layer perceptron [4 marks]
Consider the training data shown in the following table, which is to be separated into two classes. Using the single-layer perceptron learning rule with a learning rate α = 1.0 and initial weights w1 = 1, w2 = 0, and b = 1.5, what would be the final values of the weights after convergence?
Training example | x1 | x2 | Class
a | 0.0 | 0.0 | 1
b | 1.0 | 2.0 | -1
c | 2.0 | -1.0 | -1
d | -2.0 | 1.0 | 1
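The training run can be traced with a short sketch. The question does not state the exact form of the learning rule, so this assumes one common convention: the output is +1 when w1·x1 + w2·x2 + b > 0 and -1 otherwise, examples are presented in the order a, b, c, d, and on a misclassification the update is w ← w + α·t·x and b ← b + α·t, where t is the target class. Other conventions, such as an update scaled by (t − y), give different final values.

```python
def train_perceptron(data, w, b, alpha=1.0, max_epochs=100):
    """Train until a full pass makes no update; returns (weights, bias)."""
    w = list(w)
    for _ in range(max_epochs):
        updated = False
        for x, target in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else -1
            if predicted != target:
                # Assumed rule: move the boundary toward the target class.
                w = [wi + alpha * target * xi for wi, xi in zip(w, x)]
                b += alpha * target
                updated = True
        if not updated:
            break  # converged: every example is classified correctly
    return w, b


training_data = [
    ((0.0, 0.0), 1),    # a
    ((1.0, 2.0), -1),   # b
    ((2.0, -1.0), -1),  # c
    ((-2.0, 1.0), 1),   # d
]
w, b = train_perceptron(training_data, w=[1, 0], b=1.5, alpha=1.0)
print(w, b)  # [-2.0, -1.0] 0.5
```

Under the assumed rule, the first pass updates on b and c, the second pass updates on a, and the third pass makes no changes.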