Critic PI2: Master Continuous Planning via Policy Improvement with Path Integrals and Deep Actor-Critic Reinforcement Learning.
Jiajun Fan, He Ba, Xian Guo, Jianye Hao
Published in: CoRR (2020)
Keyphrases
- actor critic
- reinforcement learning
- policy gradient
- reinforcement learning algorithms
- temporal difference
- approximate dynamic programming
- optimal control
- reinforcement learning problems
- action selection
- action space
- gradient method
- neuro fuzzy
- policy iteration
- function approximation
- partially observable markov decision processes
- policy gradient methods
- state space
- partially observable
- dynamic programming
- single agent
- learning algorithm
- markov decision problems
- multi agent
- markov decision processes
- path finding
- natural actor critic
- model free
- heuristic search
- planning problems
- reinforcement learning methods
- optimal policy
- linear program
- evaluation function
- temporal difference learning
- variance reduction
- state action
- least squares
- average reward
- function approximators
- supervised learning
- control problems
- rl algorithms
- linear programming
- monte carlo
- adaptive control
- convergence speed
- learning problems
- machine learning