Intelligence
Project: Investigating Reinforcement Learning
Overview
Within SIT215 you have been learning about a range of problems that can be solved using techniques from
artificial and computational intelligence. This study has included coverage of both models and algorithms
suitable for AI and CI solutions. A particular limitation of all of the solutions that we have considered is that
they are designed by hand, or rely on the problem being formulated as an optimisation task.
In this project you are going to explore an advanced technique for solving many interesting and challenging
real-world problems: one in which an agent learns a solution to a problem through interaction with the
environment, and through perception of a reinforcement, or feedback, signal. This field is called, naturally,
reinforcement learning (RL). RL can also be seen as an online method for solving Markov Decision Problems,
as opposed to the offline methods of policy iteration, value iteration or dynamic programming presented in
lectures (in weeks 9 and 10).
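For orientation, the online/offline contrast can be sketched in standard textbook notation. The symbols here are assumptions following common convention rather than notation defined in this brief: P and R are the transition and reward models, γ is the discount factor, and α is a learning rate. The first line is an offline backup that needs the full model; the second is an online update that needs only a single observed transition.

```latex
% Offline (model-based) value iteration backup: requires known P and R
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V_k(s') \bigr]

% Online (model-free) Q-learning update from one observed transition (s, a, r, s')
Q(s, a) \leftarrow Q(s, a) + \alpha \bigl[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \bigr]
```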
This project will require you to undertake self-directed study and learning of RL solution methods, building
upon topics and content covered in the first 10 weeks of this course. While this might seem daunting (not
being told how to solve the problem), you’ve been practicing this approach throughout the unit in the
group-based PBL tasks, and so this is your chance to demonstrate individually what you’ve learned about problem
solving methodology.
Learning Objectives
This project addresses ULO2 and ULO3 for this unit:
• Design and implement software artefacts to demonstrate effectiveness and efficiency of solutions for
intelligent systems development
• Apply theoretical concepts and models to explain and communicate the design of intelligent systems
Specifically, these are addressed through achievement of the following task-specific learning objectives:
• Demonstrate ability to work with and extend software systems and frameworks for RL
• Describe and model RL problems using specific concepts and models
• Implement, evaluate and analyse the performance of different solutions on a range of RL problems
• Effectively communicate the process and outcomes of your research and development project
Preparatory Learning Activities
In order to complete this assessment task you will need to have first developed an understanding of a range
of topics covered in this unit in weeks 1 to 10. Given the assessment deadline, this may require you to
complete independent study of these topics prior to their presentation in lectures. The topics that you will
need to be familiar with are:
• Bayesian AI (working with probabilistic representations of uncertainty)
• State Space Search (understanding state space representations of systems)
• Normative Decision Theory (definitions of rational action, utility, intertemporal utility,
payoff/reward)
• Markov Decision Problems (representing sequential decision problems for agents acting in complex
domains, reward processes and finite horizon decision problems, optimal policies)
• Dynamic Programming (optimal solutions to sequential decision problems under specified
constraints)
Ultimately, you will be able to complete this assessment task without a sound theoretical grounding in each
of these areas. However, having some knowledge of these areas and an understanding of how they inter-relate
will make it far easier to understand learning materials on reinforcement learning, and far easier to explain
and describe your investigations and outcomes in this project. Our advice is that you use this project as a
basis for further study of these underlying areas, to assist in integrating the knowledge covered in this unit
into a meaningful ‘whole’, which supports completing this assessment task.
Task Requirements
This project will require you to use the OpenAI Gym environment for experimenting with reinforcement
learning tasks.
To complete this project, you need to complete the following requirements and sub-tasks.
Write a brief report (2-3 pages at most) on the Taxi problem, including a mathematical description of
the reinforcement learning problem and the Q-learning algorithm for its solution. To do this, you may
want to refer to a good textbook on reinforcement learning. A good starting point is the “bible” of RL:
“Reinforcement Learning: An Introduction”, by Sutton & Barto. You can find this book online as a
free PDF download. There’s even a 2nd edition draft completed just this year. In your report you
should contrast the quality of solution of a random policy versus the “optimal” policy obtained by
Q-learning.
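As a rough illustration of the kind of comparison asked for here, the sketch below runs tabular Q-learning and a random policy on a tiny corridor world. Everything in it is an assumption made for illustration: the corridor environment is a hypothetical stand-in, not the Gym Taxi task, and the reward values (-1 per step, +20 at the goal), the hyperparameters and all function names are chosen only to keep the example self-contained. Your own experiments would replace the `step` function with calls to the Gym environment.

```python
import random

N_STATES = 6          # hypothetical corridor: states 0..5, goal at state 5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1

def step(state, action):
    """Toy stand-in for an environment step: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    if next_state == GOAL:
        return next_state, 20.0, True   # assumed goal reward
    return next_state, -1.0, False      # assumed per-step cost

def run_episode(policy, max_steps=100):
    """Run one episode from state 0 and return the undiscounted total reward."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value beyond a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Policy that always picks the highest-valued action in the Q-table."""
    return lambda s: max(ACTIONS, key=lambda a: q[s][a])

q_table = q_learning()
rng = random.Random(1)
random_returns = [run_episode(lambda s: rng.choice(ACTIONS)) for _ in range(200)]
random_mean = sum(random_returns) / len(random_returns)
greedy_return = run_episode(greedy_policy(q_table))

print(f"random policy, mean return over 200 episodes: {random_mean:.2f}")
print(f"greedy policy from Q-learning, return:        {greedy_return:.2f}")
```

On this toy problem the shortest route takes five moves, so a policy that always moves right earns four step costs plus the goal reward. Averaging the random policy over many episodes, rather than reporting a single run, is one reasonable way to make the contrast fair, since individual random episodes vary widely.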