On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes.
Rishabh Agarwal
Nino Vieillard
Yongchao Zhou
Piotr Stanczyk
Sabela Ramos Garea
Matthieu Geist
Olivier Bachem
Published in:
ICLR (2024)
Keyphrases
language model
language modeling
probabilistic model
reinforcement learning
machine learning
speech recognition
recommender systems
error rate
n-gram
machine translation
retrieval model
language modelling
statistical language models