Why (and When) does Local SGD Generalize Better than SGD?
Xinran Gu
Kaifeng Lyu
Longbo Huang
Sanjeev Arora
Published in: ICLR (2023)
Keyphrases
stochastic gradient descent
least squares
neural network
pairwise
special case
data sets
information retrieval
genetic algorithm
computer vision
multiscale
objective function
probabilistic model
loss function