Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent.

William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith
Published in: EMNLP (1) (2021)
Keyphrases