Entropy-SGD: Biasing Gradient Descent Into Wide Valleys.
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Levent Sagun, Riccardo Zecchina
Published in: ICLR (Poster) (2017)
Keyphrases
- stochastic gradient descent
- loss function
- information theory
- mutual information
- information theoretic
- cost function
- least squares
- update rule
- information entropy
- wide range
- alternating least squares
- stochastic gradient
- step size
- objective function
- information content
- regularization parameter
- support vector machine
- matrix factorization
- entropy measure
- database
- Shannon entropy
- conditional entropy
- data sets
- neural network
- learning algorithm
- feature selection
- error function
- random forests
- decision making
- decision trees
- multiscale