SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training.
Nan He
Weichen Xiong
Hanwen Liu
Yi Liao
Lei Ding
Kai Zhang
Guohua Tang
Xiao Han
Wei Yang
Published in:
CoRR (2024)
Keyphrases
language model
prior knowledge
probabilistic model
information retrieval
prior information
training data
supervised learning
generative model
error rate
n gram
test collection
statistical model
smoothing methods