Efficient large-scale language model training on GPU clusters using Megatron-LM.

Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
Published in: SC (2021)
Keyphrases