Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States.
Ziqiao Wang, Yongyi Mao · Published in: CoRR (2022)
Keyphrases
- information-theoretic
- mutual information
- information theory
- stochastic gradient descent
- theoretic framework
- Jensen–Shannon divergence
- Kullback–Leibler divergence
- training set
- multi-modality
- information-theoretic measures
- information bottleneck
- computational learning theory
- minimum description length
- log-likelihood
- training samples
- relative entropy
- Bregman divergences
- entropy measure
- machine learning
- image registration
- learning algorithm