Mixhead: Breaking the low-rank bottleneck in multi-head attention language models.

Zhong Zhang, Nian Shao, Chongming Gao, Rui Miao, Qinli Yang, Junming Shao
Published in: Knowl. Based Syst. (2022)