Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic.
Tianying Ji
Yu Luo
Fuchun Sun
Xianyuan Zhan
Jianwei Zhang
Huazhe Xu
Published in:
CoRR (2023)
Keyphrases
actor critic
reinforcement learning
optimal control
policy gradient
temporal difference
gradient method
approximate dynamic programming
neuro fuzzy
reinforcement learning algorithms
function approximation
policy iteration
artificial neural networks
convergence rate
average reward