Mixhead: Breaking the low-rank bottleneck in multi-head attention language models.

Published in: Knowl. Based Syst. (2022)
