Fine-Tuning Language Models with Reward Learning on Policy.
Hao Lang, Fei Huang, Yongbin Li
Published in: NAACL-HLT (2024)
Keyphrases
- language model
- fine-tuning
- probabilistic model
- information retrieval
- reinforcement learning
- speech recognition
- language modeling
- viable alternative
- co-occurrence
- smoothing methods
- n-gram
- document retrieval
- partially observable environments
- statistical language models
- language modelling
- fine-tuned
- query expansion
- information retrieval systems
- active learning
- image retrieval
- feature selection