Why Does Sharpness-Aware Minimization Generalize Better Than SGD?
Zixiang Chen
Junkai Zhang
Yiwen Kou
Xiangning Chen
Cho-Jui Hsieh
Quanquan Gu
Published in:
CoRR (2023)
Keyphrases
stochastic gradient descent
objective function
edge preserving
half quadratic
information content
lower bound
monte carlo