Policy Gradient with Tree Search (PGTS) in Reinforcement Learning Evades Local Maxima.
Navdeep Kumar, Priyank Agrawal, Kfir Yehuda Levy, Shie Mannor
Published in: Tiny Papers @ ICLR (2024)
Keyphrases
- policy gradient
- tree search
- reinforcement learning
- state space
- actor critic
- reinforcement learning algorithms
- search algorithm
- function approximation
- branch and bound
- constraint propagation
- optimal control
- search tree
- gradient method
- policy gradient methods
- mathematical programming
- path finding
- reinforcement learning methods
- model free
- state action
- approximation methods
- variance reduction
- multi agent
- temporal difference
- function approximators
- optimal policy
- optimal solution
- RL algorithms
- combinatorial optimization
- temporal difference learning
- partially observable Markov decision processes
- heuristic search
- search space
- learning algorithm
- dynamic environments
- convergence rate
- multi agent systems
- average reward
- learning tasks
- dynamic programming
- orders of magnitude
- Markov decision processes