Parameter Norm Growth During Training of Transformers.
William Merrill
Vivek Ramanujan
Yoav Goldberg
Roy Schwartz
Noah A. Smith
Published in:
CoRR (2020)
Keyphrases
training set
training process
training phase
data sets
decision trees
test set
databases
information systems
multi agent systems
least squares
online learning
parameter values
linear model
training algorithm
optimal parameters