DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion.

Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun
Published in: CoRR (2024)
Keyphrases