Making Asynchronous Stochastic Gradient Descent Work for Transformers.
Alham Fikri Aji
Kenneth Heafield
Published in: CoRR (2019)
Keyphrases
stochastic gradient descent
least squares
matrix factorization
loss function
step size
random forests
regularization parameter
weight vector
machine learning
feature selection
support vector machine
convergence rate
multiple kernel learning
online algorithms