Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel.

Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Published in: EMNLP/IJCNLP (1) (2019)
Keyphrases