How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization.
Pierluca D'Oro
Wojciech Jaskowski
Published in:
CoRR (2020)
Keyphrases
policy gradient
actor critic
action selection
reinforcement learning
least squares
variance reduction
optimization algorithm
global optimization
constrained optimization
agent learns
maximum likelihood
optimization problems
optimization process
state action
optimization method
policy search
genetic algorithm