Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
James Demmel
Kurt Keutzer
Cho-Jui Hsieh
Published in:
ICLR (2020)
Keyphrases
deep learning
deep architectures
restricted Boltzmann machine
unsupervised learning
unsupervised feature learning
machine learning
deep belief networks
training set
supervised learning
mental models
domain specific
data sets
learning algorithm
higher order
weakly supervised