On the use of the policy gradient and Hessian in inverse reinforcement learning.
Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli
Published in: Intelligenza Artificiale (2020)
Keyphrases
- inverse reinforcement learning
- policy gradient
- reward function
- reinforcement learning algorithms
- temporal difference
- reinforcement learning
- step size
- gradient method
- preference elicitation
- function approximation
- average reward
- optimal control
- Markov decision processes
- state action
- partially observable Markov decision processes
- convergence rate
- optimal policy
- single agent
- action space
- multiple agents
- state variables
- decision problems
- dynamic environments
- state space