Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning.
Md. Masudur Rahman, Yexiang Xue
Published in: ICMLA (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- state and action spaces
- Markov decision process
- action selection
- action space
- Markov decision processes
- optimization algorithm
- partially observable
- function approximation
- partially observable environments
- optimization process
- state space
- policy iteration
- model free
- global optimization
- temporal difference
- actor critic
- reinforcement learning algorithms
- learning problems
- optimization problems
- least squares
- control policy
- policy gradient
- multi agent
- learning algorithm