Learning Guidance Rewards with Trajectory-space Smoothing.
Tanmay Gangwani
Yuan Zhou
Jian Peng
Published in:
NeurIPS (2020)
Keyphrases
reinforcement learning
learning process
online learning
learning tasks
prior knowledge
learning algorithm
learning systems
supervised learning
empirical studies
background knowledge
bandit problems
machine learning
space time
learning problems
learning community
multi-armed bandits