Batch Normalization Is Blind to the First and Second Derivatives of the Loss.

Zhanpeng Zhou, Wen Shen, Huixin Chen, Ling Tang, Quanshi Zhang

Published in: CoRR (2022)

Keyphrases

higher order
preprocessing
machine learning
steady state
real world
data mining
video sequences
worst case
information loss
critical points
batch mode
normalization method
batch learning
quasi invariant