A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
Hayeon Lee
Rui Hou
Jongpil Kim
Davis Liang
Sung Ju Hwang
Alexander Min
Published in: Findings of ACL (2023)
Keyphrases
language model
language modeling
pre-trained
information retrieval
probabilistic model
n-gram
document retrieval
statistical language models
viewpoint
learning process
image classification
query expansion
retrieval model