Inverse reinforcement learning using Dynamic Policy Programming.

Eiji Uchibe, Kenji Doya
Published in: ICDL-EPIROB (2014)
Keyphrases
  • inverse reinforcement learning
  • partially observable environments
  • bayesian nonparametric
  • preference elicitation
  • reward function
  • optimal policy
  • temporal difference