Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search.
Arta Seify, Michael Buro
Published in: CoRR (2020)
Keyphrases
- single agent
- policy iteration
- Monte Carlo tree search
- reinforcement learning
- temporal difference
- Markov decision processes
- multi agent
- temporal difference learning
- multiple agents
- model free
- Monte Carlo
- decision problems
- dynamic environments
- optimal policy
- multi agent systems
- fixed point
- reinforcement learning methods
- action space
- function approximation
- average reward
- path finding
- finite state
- least squares
- Markov decision process
- evaluation function
- optimal control
- reinforcement learning algorithms
- state space
- lower bound
- linear programming
- step size