SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training.

Nan He, Weichen Xiong, Hanwen Liu, Yi Liao, Lei Ding, Kai Zhang, Guohua Tang, Xiao Han, Wei Yang
Published in: CoRR (2024)
Keyphrases
  • language model
  • prior knowledge
  • probabilistic model
  • information retrieval
  • prior information
  • training data
  • supervised learning
  • generative model
  • error rate
  • n-gram
  • test collection
  • statistical model
  • smoothing methods