Learning Guidance Rewards with Trajectory-space Smoothing.

Tanmay Gangwani, Yuan Zhou, Jian Peng

Published in: NeurIPS (2020)

Keyphrases

reinforcement learning
learning process
online learning
learning tasks
prior knowledge
learning algorithm
learning systems
supervised learning
empirical studies
background knowledge
bandit problems
machine learning
space time
learning problems
learning community
multi-armed bandits