Why (and When) does Local SGD Generalize Better than SGD?

Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora
Published in: CoRR (2023)
Keyphrases
  • stochastic gradient descent
  • least squares
  • matrix factorization
  • decision trees
  • multiscale
  • multi class
  • loss function