Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Published in: CoRR (2024)
Keyphrases