Offline Regularised Reinforcement Learning for Large Language Models Alignment.

Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Ávila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot
Published in: CoRR (2024)
Keyphrases