DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining.

Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
Published in: CoRR (2023)
Keyphrases
  • language model
  • test collection
  • document retrieval
  • retrieval model
  • statistical language models