How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization.
Pierluca D'Oro, Wojciech Jaskowski
Published in: NeurIPS (2020)
Keyphrases
- policy gradient
- actor critic
- action selection
- optimization problems
- optimization algorithm
- gradient method
- action space
- state action
- global optimization
- reinforcement learning
- optimization process
- agent learns
- maximum a posteriori
- agent receives
- optimal policy
- joint action
- function approximators
- gradient estimation
- natural actor critic
- variance reduction
- recursive least squares
- fully unsupervised
- constrained optimization
- optimization methods
- optimization method
- evolutionary algorithm