A Case for Validation Buffer in Pessimistic Actor-Critic.
Michal Nauman
Mateusz Ostaszewski
Marek Cygan
Published in:
CoRR (2024)
Keyphrases
actor critic
neural network
multi agent
function approximation
temporal difference
gradient method
policy gradient