AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression.

Siyue Wu, Hongzhan Chen, Xiaojun Quan, Qifan Wang, Rui Wang
Published in: CoRR (2023)
Keyphrases