Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Published in: CoRR (2024)