Offline Regularised Reinforcement Learning for Large Language Models Alignment.
Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Ávila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot
Published in: CoRR (2024)
Keyphrases
- language model
- reinforcement learning
- language modeling
- document retrieval
- language modelling
- speech recognition
- n gram
- query expansion
- probabilistic model
- test collection
- information retrieval
- ad hoc information retrieval
- retrieval model
- context sensitive
- statistical language models
- smoothing methods
- vector space model
- word error rate
- document ranking
- language models for information retrieval
- language model for information retrieval
- okapi bm
- relevance model
- pseudo relevance feedback
- query terms
- relevant documents
- web search
- learning algorithm