Why Transformers Need Adam: A Hessian Perspective.

Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
Published in: CoRR (2024)
Keyphrases
  • image sequences
  • data sets
  • viewpoint
  • multiresolution
  • neural network
  • decision making
  • probability distribution
  • hessian matrix