Entropy-SGD: Biasing Gradient Descent Into Wide Valleys.
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Levent Sagun, Riccardo Zecchina
Published in: CoRR (2016)
Keyphrases
- stochastic gradient descent
- cost function
- loss function
- information theoretic
- wide range
- information theory
- mutual information
- objective function
- alternating least squares
- update rule
- stochastic gradient
- information entropy
- least squares
- fuzzy entropy
- step size
- shannon entropy
- information content
- action selection
- image processing
- active contours
- feature space
- multiscale