WARP: On the Benefits of Weight Averaged Rewarded Policies.
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
Published in: CoRR (2024)
Keyphrases
optimal policy
data mining
search algorithm
weighting scheme
control policies
neural network
artificial intelligence
reinforcement learning
video sequences
weight function