Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates.
Dennis J. N. J. Soemers
Éric Piette
Matthew Stephenson
Cameron Browne
Published in:
CoRR (2019)
Keyphrases
learning algorithm
learning systems
policy search
learning process
supervised learning
online learning
optimal policy
policy gradient methods
reinforcement learning
prior knowledge
active learning
knowledge acquisition
Monte Carlo
game playing
control policies