On the Training Instability of Shuffling SGD with Batch Normalization.
David X. Wu, Chulhee Yun, Suvrit Sra. Published in: CoRR (2023)
Keyphrases
- stochastic gradient descent
- batch mode
- online algorithms
- training phase
- preprocessing
- genetic algorithm
- lower bound
- training speed
- supervised learning
- online learning
- test set
- training algorithm
- batch learning
- computer software
- training dataset
- training examples
- least squares
- small number
- active learning
- feature extraction