A Case for Validation Buffer in Pessimistic Actor-Critic.

Michal Nauman, Mateusz Ostaszewski, Marek Cygan
Published in: CoRR (2024)
Keyphrases
  • actor critic
  • neural network
  • multi agent
  • function approximation
  • temporal difference
  • gradient method
  • policy gradient