Improving Generalization Performance by Switching from Adam to SGD.
Nitish Shirish Keskar
Richard Socher
Published in:
CoRR (2017)
Keyphrases
stochastic gradient descent
data sets
image sequences
feature extraction
lower bound
expert systems