Login / Signup

Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates.

Dennis J. N. J. SoemersÉric PietteMatthew StephensonCameron Browne
Published in: CoG (2019)
Keyphrases
  • learning algorithm
  • optimal policy
  • learning systems
  • action selection
  • learning process
  • active learning
  • online learning
  • learning tasks
  • reinforcement learning
  • prior knowledge
  • supervised learning
  • policy search