AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression.

Siyue Wu, Hongzhan Chen, Xiaojun Quan, Qifan Wang, Rui Wang
Published in: ACL (1) (2023)