MTHM606 Applied AI and Control
Coursework Part 3: Extended Investigations
The following is a list of suggested topics for your extended investigation. Note that other topics
are possible, including extensions to the questions in the two coursework sheets or the exercises
in the practical sheets. Please refer to the assessment information on ELE for the standard
expected of the submitted work at each grade level (pass, merit, distinction).
Your extended investigation can be submitted in one format or two complementary formats:
scientific report, A1 poster, recorded presentation, magazine-style article, video, podcast, blog or
other digital media. You will need to demonstrate skills in scientific communication in addition
to topic mastery. It is therefore important to structure your work according to good practice in
scientific communication and, if you include figures and tables, to follow good practice in data
visualisation, as was the case for your project report for MTHM601 Fundamentals of Data Science.
Please refer to the Introduction to Matlab workbook 2 and the Introduction to Control practical
for advice on producing professional figures using Matlab and Simulink.
1. Understand the TD(λ) reinforcement learning algorithm (see Sutton and Barto) and give a
brief explanation of how it works. Write your own code to apply TD(λ) to a cliff-walking
example and compare its behaviour with that of the SARSA and Q-learning algorithms.
2. Use Q-learning to solve the task of playing tic-tac-toe against a random opponent (with the
first player chosen at random). You will find it useful to refer to Section 6.8 on “afterstates”
in Sutton and Barto. Tic-tac-toe is in fact a solved game: the optimal strategy/policy allows
the first player to at least force a draw (look up the details of this!).
Comment on the policy found by your algorithm and whether reinforcement learning is a
good method for the task of creating a computer tic-tac-toe player.
3. Write a literature review on an application of reinforcement learning. For example, you
could draw on the three papers posted on ELE on Building Energy Management.
4. Describe the methodology and the key messages in the paper S. Duncan, C. Hepburn, A.
Papachristodoulou, “Optimal harvesting of fish stocks under a time-varying discount rate”,
Journal of Theoretical Biology, 269, pp. 166–173, 2011, doi:10.1016/j.jtbi.2010.10.002.
This paper concerns the importance of economic commitment mechanisms and appropriate
approaches to valuing future benefits for sustainable policymaking relating to the use of
depletable resources, such as fish stocks. An overview is provided in the file “Optimal
Control in Sustainable policymaking”. Your investigation could seek to recreate the results
and/or analysis in the paper and explain its conclusions, possibly examining the sensitivity of
the conclusions to the model parameters; alternatively, it could aim to make the paper more
accessible to a non-expert audience.
5. Describe the methodology and the key messages in the paper C.M. Kellett, S.R. Weller, T.
Faulwasser, L. Grüne, W. Semmler, “Feedback, dynamics, and optimal control in climate
economics”, Annual Reviews in Control 47 (2019) 7–20, doi:10.1016/j.arcontrol.2019.04.003.
This paper concerns the computation of the Social Cost of Carbon, i.e., the estimate of the
future cost to society of releasing an additional unit of emissions today. Its estimation is
influenced by early papers by William Nordhaus and is posed as the solution of an optimal
control problem in the context of integrated assessment modelling (modelling the interactions
between climate and the economy). The problem is more complicated than those considered in the
module, although many of the models involved are relatively simple; this paper considers the
so-called DICE model, originally proposed by William Nordhaus, and provides an accessible overview
of integrated assessment modelling, the Social Cost of Carbon, and this specific model. Your
investigation could explore the effects of parameter changes in the model and, in particular, how
the way in which the future is valued affects the estimate of the Social Cost of Carbon. You could
also summarise the key messages from the paper. An overview
is provided in the file “Integrated Assessment Modelling and Climate Change Policy”.
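For topic 1, the following is one possible starting point, not a model answer. It assumes that applying TD(λ) to cliff walking means control with eligibility traces, here tabular Sarsa(λ) with replacing traces (Sutton and Barto, Chapter 12), on the 4 x 12 cliff-walking grid of Sutton and Barto's Example 6.6 (reward -1 per step; stepping into the cliff gives -100 and a return to the start). The function names and parameter values (`sarsa_lambda`, `alpha=0.1`, `lam=0.9`, `eps=0.1`) are hypothetical, illustrative choices.

```python
import random

# Illustrative sketch: grid layout follows the cliff-walking example; parameters are assumptions.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, a):
    """Move one cell (clipped at the walls); the cliff sends the agent back to START."""
    r = min(max(state[0] + ACTIONS[a][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[a][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:  # fell off the cliff
        return START, -100, False
    return (r, c), -1, (r, c) == GOAL


def epsilon_greedy(Q, s, eps, rng):
    """Random action with probability eps, otherwise a greedy action (ties broken randomly)."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    best = max(Q[s])
    return rng.choice([a for a, v in enumerate(Q[s]) if v == best])


def sarsa_lambda(episodes=200, alpha=0.1, gamma=1.0, lam=0.9, eps=0.1, seed=0):
    """Tabular Sarsa(lambda) with replacing traces; returns Q and the return of each episode."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    returns = []
    for _ in range(episodes):
        traces = {}  # (state, action) -> eligibility; absent pairs have trace 0
        s, total, done = START, 0, False
        a = epsilon_greedy(Q, s, eps, rng)
        while not done:
            s2, reward, done = step(s, a)
            total += reward
            a2 = epsilon_greedy(Q, s2, eps, rng)
            target = reward if done else reward + gamma * Q[s2][a2]
            delta = target - Q[s][a]  # one-step TD error
            traces[(s, a)] = 1.0  # replacing trace for the pair just visited
            for (st, ac), e in traces.items():
                Q[st][ac] += alpha * delta * e  # eligible pairs share the TD error
                traces[(st, ac)] = gamma * lam * e  # then every trace decays
            s, a = s2, a2
        returns.append(total)
    return Q, returns
```

Setting `lam=0.0` makes every trace vanish right after its update, so `sarsa_lambda(lam=0.0)` reduces to one-step SARSA and gives a direct baseline; with `lam=0.0`, replacing `Q[s2][a2]` in the target by `max(Q[s2])` gives one-step Q-learning. Plotting `returns` for each variant is one way to make the comparison.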
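For topic 2, here is a minimal sketch of Q-learning over afterstates (Sutton and Barto, Section 6.8). It assumes the agent always plays `X` and the random opponent always plays `O`, whoever moves first. Boards are 9-character strings, and the reward is +1 for a win, -1 for a loss and 0 for a draw. The helper names (`value`, `greedy`, `train`, `evaluate`) are hypothetical. Each afterstate is moved toward the best afterstate value available at the agent's next turn, which plays the role of the max in the Q-learning target.

```python
import random

# Illustrative sketch: board encoding, rewards and parameter values are assumptions.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]  # cells indexed 0..8 row by row


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None


def moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]


def play(board, i, mark):
    return board[:i] + mark + board[i + 1:]


def value(V, after):
    """Value of an agent afterstate: fixed for finished games, learned otherwise."""
    if winner(after) == "X":
        return 1.0
    if " " not in after:
        return 0.0  # draw
    return V.get(after, 0.0)


def greedy(V, options, rng):
    best = max(value(V, b) for b in options)
    return rng.choice([b for b in options if value(V, b) == best])


def train(episodes=20000, alpha=0.2, eps=0.1, seed=0):
    """Q-learning over afterstates against a uniformly random opponent."""
    rng = random.Random(seed)
    V = {}
    for _ in range(episodes):
        board = " " * 9
        if rng.random() < 0.5:  # the opponent moves first half of the time
            board = play(board, rng.choice(moves(board)), "O")
        prev = None  # agent's previous afterstate, updated once its successors are known
        while True:
            options = [play(board, i, "X") for i in moves(board)]
            if prev is not None:  # Q-learning target: best afterstate value available now
                target = max(value(V, b) for b in options)
                V[prev] = V.get(prev, 0.0) + alpha * (target - V.get(prev, 0.0))
            after = rng.choice(options) if rng.random() < eps else greedy(V, options, rng)
            if winner(after) or " " not in after:
                break  # agent won or drew; value() scores this afterstate directly
            board = play(after, rng.choice(moves(after)), "O")
            if winner(board) or " " not in board:  # opponent won, or the board filled up
                outcome = -1.0 if winner(board) == "O" else 0.0
                V[after] = V.get(after, 0.0) + alpha * (outcome - V.get(after, 0.0))
                break
            prev = after
    return V


def evaluate(V, games=1000, seed=1):
    """Play the greedy policy against the random opponent and count the outcomes."""
    rng = random.Random(seed)
    results = {"win": 0, "draw": 0, "loss": 0}
    for _ in range(games):
        board, turn = " " * 9, rng.choice("XO")
        while True:
            if turn == "X":
                board = greedy(V, [play(board, i, "X") for i in moves(board)], rng)
            else:
                board = play(board, rng.choice(moves(board)), "O")
            w = winner(board)
            if w or " " not in board:
                results["win" if w == "X" else "loss" if w == "O" else "draw"] += 1
                break
            turn = "O" if turn == "X" else "X"
    return results
```

For example, `evaluate(train())` can be compared with `evaluate({})`. Even the untrained agent takes an immediately winning move when one exists, because `value` scores finished games directly, so any improvement comes from learning to block and to set up wins.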
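Topics 4 and 5 both depend on how future benefits and costs are discounted. The sketch below is not taken from either paper. It compares an exponential discount factor D(t) = exp(-r t) with a time-varying one, D(t) = exp(-(integral of r(s) ds from 0 to t)). For the time-varying case it assumes a hypothetical rate schedule r(s) = r_inf + (r0 - r_inf) exp(-s / tau) that declines from r0 to r_inf. The rates, horizon and time constant are illustrative assumptions only.

```python
import math

# Illustrative sketch: the declining-rate schedule and all numbers are assumptions.


def discount_constant(t, r):
    """Exponential discounting with a constant rate r: D(t) = exp(-r t)."""
    return math.exp(-r * t)


def discount_declining(t, r0, r_inf, tau):
    """D(t) = exp(-integral_0^t r(s) ds) for r(s) = r_inf + (r0 - r_inf) * exp(-s / tau).

    The integral has the closed form r_inf * t + (r0 - r_inf) * tau * (1 - exp(-t / tau)).
    """
    integral = r_inf * t + (r0 - r_inf) * tau * (1.0 - math.exp(-t / tau))
    return math.exp(-integral)


def present_value(discount, horizon, dt=1.0):
    """Present value of a benefit of 1 per unit time up to `horizon` (left Riemann sum)."""
    steps = int(round(horizon / dt))
    return sum(discount(k * dt) * dt for k in range(steps))


if __name__ == "__main__":
    horizon = 200.0  # years (illustrative)
    pv_const = present_value(lambda t: discount_constant(t, 0.04), horizon)
    pv_decl = present_value(lambda t: discount_declining(t, 0.04, 0.01, 30.0), horizon)
    print(f"constant 4%: {pv_const:.1f}, declining 4% -> 1%: {pv_decl:.1f}")
```

Because r(s) never exceeds r0, the declining schedule always gives a present value at least as large as constant discounting at r0. Applied to a stream of marginal climate damages rather than unit benefits, the same mechanism is why estimates of the Social Cost of Carbon are sensitive to the discount rate.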