SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training.
Nan He
Weichen Xiong
Hanwen Liu
Yi Liao
Lei Ding
Kai Zhang
Guohua Tang
Xiao Han
Yang Wei
Published in:
ACL (1) (2024)
Keyphrases
language model
prior knowledge
probabilistic model
probability distribution
information retrieval
speech recognition
relevance model
language modeling
clustering method
n-gram
prior information
generative model
translation model
statistical model
active learning
training set
training data