Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms.

Yanwei Jia, Xun Yu Zhou

Published in: J. Mach. Learn. Res. (2022)

Keyphrases

actor critic
policy gradient
policy gradient methods
optimal control
reinforcement learning
learning algorithm
policy iteration
gradient method
temporal difference
approximate dynamic programming
neuro fuzzy
natural actor critic
partially observable Markov decision processes
model free
Markov decision processes
reinforcement learning algorithms
dynamical systems
function approximation
variance reduction
reinforcement learning methods
learning problems
state space
step size
average reward
RL algorithms
search space
convergence rate
multi agent systems