• Explain Bayesian Model Inference
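(As a pointer, not the official answer: Bayesian model inference usually means computing a posterior over candidate models m given data D via Bayes' rule, P(m \mid D) = \frac{P(D \mid m)\,P(m)}{\sum_{m'} P(D \mid m')\,P(m')}, i.e. likelihood times prior, normalized over all models.)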
• Difference between value and reward
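(One way to anchor the distinction, in standard notation: the reward r_{t+1} is the immediate scalar signal, whereas the value is an expectation over the accumulated discounted future reward, V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right].)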
• Give the complete max-margin formulation: how is the value calculated, and how does the formulation deal with reward ambiguity and the teacher's suboptimality?
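(A sketch in the apprenticeship-learning style; the notation here is an assumption: find reward weights w under which the teacher's feature expectations \mu_E beat every other policy by a margin, with a slack term for a suboptimal teacher,
\min_{w,\xi}\ \tfrac{1}{2}\|w\|^2 + C\,\xi \quad \text{s.t.}\quad w^\top \mu_E \ \ge\ w^\top \mu(\pi) + m(\pi) - \xi \quad \forall\,\pi .
The value of a policy is its expected discounted reward w^\top \mu(\pi); maximizing the margin counters reward ambiguity, and the slack \xi absorbs the teacher's suboptimality.)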
• Difference between Q-learning and TD(0)
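(For reference, the two update rules in standard notation, with step size \alpha and discount \gamma:
TD(0): V(s_t) \leftarrow V(s_t) + \alpha\,[\, r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \,]
Q-learning: Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t) \,]
TD(0) evaluates state values under the behaviour policy; Q-learning learns action values off-policy through the max operator.)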
• Apply the synchronous backup rule for iterative policy evaluation
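(The rule in question, with every state updated simultaneously from the old estimate V_k:
V_{k+1}(s) = \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s,a)\,[\, R(s,a,s') + \gamma V_k(s') \,].)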
• What needs to be defined for a POMDP?
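(As a reminder: a POMDP is typically given as the tuple \langle S, A, T, R, \Omega, O, \gamma \rangle, i.e. states, actions, transition model T(s' \mid s,a), reward R(s,a), observations, observation model O(o \mid s',a), and discount, plus an initial belief b_0.)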
• Assume a POMDP with 2 underlying states and 3 possible observations. What is the probability of the states after an observation?
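(A sketch of the measurement update, assuming standard POMDP filtering notation; with 2 states the normalizing sum has two terms:
b'(s) = \frac{O(o \mid s)\, b(s)}{\sum_{s''} O(o \mid s'')\, b(s'')}.)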
• Assume a POMDP with 3 underlying states and 3 possible actions. What is the probability of state 3 after an action? Two beliefs were given (for state 1 and state 2), together with 18 conditional probabilities of being in a state given the previous state and the chosen action. (The belief for state 3 and 9 further conditional probabilities had to be calculated.)
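(The prediction step, in standard notation:
b'(s') = \sum_{s} T(s' \mid s,a)\, b(s),
where the missing belief follows from normalization, b(3) = 1 - b(1) - b(2), and the missing transition probabilities follow from each distribution T(\cdot \mid s,a) summing to 1.)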
• Show that the temporal difference method TD(1) is equivalent to Monte Carlo sampling.
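(A possible starting point: the \lambda-return
G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}
collapses to the full Monte Carlo return G_t for \lambda = 1, so TD(1) with offline updating performs the same total update per episode as Monte Carlo sampling.)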
• Explain a linear state feedback controller
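(In its simplest form the control input is a linear map of the measured state through a gain matrix K, i.e. u_t = -K\,x_t.)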
• How does the function calculation work with function approximation such as tile coding? (A sketch follows below.)
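A minimal 1-D tile coding sketch in Python (the layout, offsets, and names are illustrative assumptions, not a reference implementation): the approximate value is the sum of one weight per tiling, so only a handful of weights are read and updated per input.

import numpy as np

n_tilings, n_tiles, low, high = 8, 10, 0.0, 1.0   # one shifted grid per tiling
tile_width = (high - low) / n_tiles
weights = np.zeros(n_tilings * n_tiles)

def active_features(x):
    # one active tile index per tiling; the tilings are mutually offset
    idx = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings
        i = min(int((x - low + offset) / tile_width), n_tiles - 1)
        idx.append(t * n_tiles + i)
    return idx

def value(x):
    # approximate value = sum of the active tiles' weights
    return sum(weights[i] for i in active_features(x))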
• How is an eligibility trace calculated?
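(For example the accumulating trace in tabular TD(\lambda):
e_t(s) = \gamma\lambda\, e_{t-1}(s) + \mathbf{1}(s_t = s),
i.e. every trace decays by \gamma\lambda per step and the trace of the currently visited state is incremented by 1.)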
• Backup diagram for SARSA
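(In words, the SARSA backup runs from the state-action pair (s_t, a_t) through the sampled reward and next state to the next action actually chosen, (s_{t+1}, a_{t+1}), giving the update
Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t,a_t) \,].)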
• How can soft-max action selection be varied between greedy and soft selection?
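(Via the temperature \tau of the soft-max (Boltzmann) distribution, in standard notation:
\pi(a \mid s) = \frac{e^{Q(s,a)/\tau}}{\sum_{a'} e^{Q(s,a')/\tau}},
where \tau \to 0 approaches greedy selection and large \tau approaches a uniform, fully soft choice.)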
• Write down the Bellman equation with and without Expectation operator
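(For reference, both forms for a fixed policy \pi:
with expectation operator: V^{\pi}(s) = \mathbb{E}_{\pi}[\, r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s \,]
written out: V^{\pi}(s) = \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s,a)\,[\, R(s,a,s') + \gamma V^{\pi}(s') \,].)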
• Which policy should an agent follow if it is given the optimal value function?
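(The greedy policy with respect to the optimal value function is itself optimal:
\pi^{*}(s) = \arg\max_a \sum_{s'} P(s' \mid s,a)\,[\, R(s,a,s') + \gamma V^{*}(s') \,].)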
• Explain QMDP
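(For reference, QMDP solves the underlying MDP for Q_{\text{MDP}}(s,a) and weights it by the current belief,
a^{*} = \arg\max_a \sum_s b(s)\, Q_{\text{MDP}}(s,a),
which assumes full observability after the next step and therefore never acts purely to gain information.)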
• Explain exploitation and exploration. How are they used in RL? (See the sketch below.)
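A minimal epsilon-greedy sketch in Python (the names and the value of epsilon are illustrative), showing the usual trade-off: exploit the best-known action most of the time, explore a random one otherwise.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    # with probability epsilon explore, otherwise exploit the greedy action
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # exploration: random action
    return int(np.argmax(q_values))               # exploitation: greedy action

action = epsilon_greedy(np.array([0.2, 0.5, 0.1]))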