Inverse reinforcement learning using Dynamic Policy Programming.
Eiji Uchibe
Kenji Doya
Published in:
ICDL-EPIROB (2014)
Keyphrases
inverse reinforcement learning
partially observable environments
Bayesian nonparametric
preference elicitation
reward function
optimal policy
temporal difference