System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He
Published in: IPDPS (Workshops) (2024)