Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences.

Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanovic, Adish Singla
Published in: CoRR (2024)