A Survey on Data Selection for Language Models.

Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang
Published in: CoRR (2024)
Keyphrases
  • language model
  • language modeling
  • mixture model
  • training data
  • n-gram
  • information retrieval
  • hidden Markov models
  • statistical language models