On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Tim G. J. Rudner, Cong Lu, Michael A. Osborne, Yarin Gal, Yee Whye Teh

Published in: NeurIPS (2021)

Keyphrases

reinforcement learning
function approximation
markov decision processes
state space
computer assisted
optimal policy
supervised learning
expert knowledge
risk minimization
kullback-leibler
model free
robotic control
temporal difference learning
optimal control
human experts
objective function
learning algorithm
learning classifier systems
least squares
action selection
temporal difference
action space
function approximators
support vector
dynamic programming
domain experts