Multi-turn Reinforcement Learning from Preference Human Feedback.

Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos
Published in: CoRR (2024)