Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits.
Wenshuo Guo
Kumar Krishna Agrawal
Aditya Grover
Vidya K. Muthukumar
Ashwin Pananjady
Published in:
AISTATS (2022)
Keyphrases
reinforcement learning
learning algorithm
learning process
data sets
prior knowledge
learning systems
learning tasks
inductive inference
decision trees
active learning
dynamic programming
supervised learning
online learning
expectation maximization
stochastic systems