Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu
Published in: CoRR (2023)
Keyphrases
  • stochastic gradient descent
  • objective function
  • edge preserving
  • half quadratic
  • information content
  • lower bound
  • monte carlo