Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data.
Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani
Published in: CoRR (2023)