Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits.
Wenshuo Guo
Kumar Krishna Agrawal
Aditya Grover
Vidya Muthukumar
Ashwin Pananjady
Published in:
CoRR (2021)
Keyphrases
reinforcement learning
learning algorithm
online learning
learning systems
learning scheme
unsupervised learning
machine learning
learning process
estimation error
optimal solution
active learning
dynamic programming
learning tasks
learning capabilities
linear predictors