Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling.

Yuping Luo, Huazhe Xu, Tengyu Ma

Published in: CoRR (2019)

Keyphrases

learning algorithm
learning process
prior knowledge
online learning
unsupervised learning
learning systems
reinforcement learning
multi-agent
Monte Carlo
positive and negative
incremental learning