A Survey on Data Selection for Language Models.
Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang
Published in: CoRR (2024)