Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm.
Shaoyi Huang, Dongkuan Xu, Ian En-Hsu Yen, Yijue Wang, Sung-En Chang, Bingbing Li, Shiyang Chen, Mimi Xie, Sanguthevar Rajasekaran, Hang Liu, Caiwen Ding.
Published in: ACL (1) (2022)