Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu
Published in: CoRR (2023)
Keyphrases
  • stochastic gradient descent
  • objective function
  • edge preserving
  • half quadratic
  • information content
  • lower bound
  • monte carlo