A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models
Hayeon Lee
Rui Hou
Jongpil Kim
Davis Liang
Sung Ju Hwang
Alexander Min
Published in: CoRR (2023)
Keyphrases
language model
language modeling
document retrieval
test collection
speech recognition
statistical language models
pre-trained
smoothing methods
document ranking
retrieval model
n-gram
viewpoint
query expansion
probabilistic model
principal component analysis
information extraction
prior knowledge
neural network