Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang, Hyung Won Chung, Yi Tay, Liam Fedus, Thibault Févry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
Published in: EMNLP (1) (2021)