Stabilizing Transformer Training by Preventing Attention Entropy Collapse.

Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M. Susskind
Published in: CoRR (2023)
Keyphrases