On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations.
Tim G. J. Rudner, Cong Lu, Michael A. Osborne, Yarin Gal, Yee Whye Teh
Published in: NeurIPS (2021)
Keyphrases
- reinforcement learning
- function approximation
- Markov decision processes
- state space
- computer assisted
- optimal policy
- supervised learning
- expert knowledge
- risk minimization
- Kullback-Leibler
- model free
- robotic control
- temporal difference learning
- optimal control
- human experts
- objective function
- learning algorithm
- learning classifier systems
- least squares
- action selection
- temporal difference
- action space
- function approximators
- support vector
- dynamic programming
- domain experts