Fine-Tuning Language Models with Reward Learning on Policy.
Hao Lang
Fei Huang
Yongbin Li
Published in: CoRR (2024)
Keyphrases
language model
fine tuning
language modeling
reinforcement learning
probabilistic model
inverse reinforcement learning
statistical language models
partially observable environments
active learning
document retrieval
context sensitive
retrieval model
smoothing methods