On the Training Instability of Shuffling SGD with Batch Normalization

David X. Wu, Chulhee Yun, Suvrit Sra

Published in: CoRR (2023)