Toward Understanding Why Adam Converges Faster Than SGD for Transformers.
Yan Pan
Yuanzhi Li
Published in:
CoRR (2023)
Keyphrases
database
worst case
neural network
computer vision
website
image sequences
multi agent
search algorithm
multiresolution
deeper understanding