
DataComp-LM: In search of the next generation of training sets for language models.

Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Yitzhak Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah M. Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Raghavi Chandu, Thao Nguyen, Igor Vasiljevic, Sham M. Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
Published in: CoRR (2024)