Making Asynchronous Stochastic Gradient Descent Work for Transformers.
Alham Fikri Aji
Kenneth Heafield
Published in:
NGT@EMNLP-IJCNLP (2019)
Keyphrases
stochastic gradient descent
least squares
step size
matrix factorization
loss function
support vector machine
random forests
weight vector
image processing
pairwise
knn
principal component analysis
regularization parameter