Login / Signup
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates.
Dennis J. N. J. Soemers
Éric Piette
Matthew Stephenson
Cameron Browne
Published in:
CoG (2019)
Keyphrases
</>
learning algorithm
optimal policy
learning systems
action selection
learning process
active learning
online learning
learning tasks
reinforcement learning
prior knowledge
supervised learning
policy search