Toward Understanding Why Adam Converges Faster Than SGD for Transformers.

Yan Pan, Yuanzhi Li
Published in: CoRR (2023)
Keyphrases
  • database
  • worst case
  • neural network
  • computer vision
  • website
  • image sequences
  • multi agent
  • search algorithm
  • multiresolution
  • deeper understanding