Small-scale proxies for large-scale Transformer training instabilities.

Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
Published in: CoRR (2023)
Keyphrases