Why (and When) does Local SGD Generalize Better than SGD?
Xinran Gu
Kaifeng Lyu
Longbo Huang
Sanjeev Arora
Published in:
CoRR (2023)
Keyphrases
stochastic gradient descent
least squares
matrix factorization
decision trees
multiscale
multi-class
loss function