Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning.
Ilya Kostrikov
Kumar Krishna Agrawal
Sergey Levine
Jonathan Tompson
Published in:
CoRR (2018)
Keyphrases
inverse reinforcement learning
Bayesian nonparametric
partially observable environments
reward function
preference elicitation
optimal policy
sample size
Bayesian networks
reinforcement learning
Markov decision processes
temporal difference
reinforcement learning algorithms
partially observable