The Virtues of Pessimism in Inverse Reinforcement Learning

David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury

Published in: CoRR (2024)

Keyphrases

inverse reinforcement learning
Bayesian nonparametric
partially observable environments
preference elicitation
reward function
temporal difference
case based reasoning
control system