Login / Signup
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost.
Noam Shazeer
Mitchell Stern
Published in:
ICML (2018)
Keyphrases
</>
learning rate
convergence rate
gaussian kernels
convergence speed
error function
uniform convergence
covering numbers
neural network
learning algorithm
delta bar delta