Why Transformers Need Adam: A Hessian Perspective.
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhi-Quan Luo
Published in:
CoRR (2024)
Keyphrases
image sequences
data sets
viewpoint
multiresolution
neural network
decision making
probability distribution
Hessian matrix