Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
Shrihari Vasudevan
Published in: Entropy (2020)
Keyphrases
- learning rate
- stochastic gradient descent
- weight vector
- training speed
- neural network
- training algorithm
- activation function
- hidden layer
- convergence rate
- step size
- adaptive learning rate
- learning algorithm
- early stopping
- loss function
- least squares
- convergence speed
- training process
- online algorithms
- multilayer neural networks
- back propagation
- matrix factorization
- support vector machine
- artificial neural networks
- multi class
- genetic algorithm
- regularization parameter
- small number
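To illustrate the central keyphrases (learning rate, step size, stochastic gradient descent, least squares, loss function), here is a minimal generic sketch of scheduled learning-rate decay during SGD. This is not the paper's mutual-information-based criterion; the data, decay factor, and schedule below are illustrative assumptions only.

```python
import random

# Generic sketch of learning-rate decay in SGD (NOT the paper's
# mutual-information criterion): fit y = w * x by least squares,
# halving the step size every few epochs.

random.seed(0)
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]  # true weight w* = 3

w = 0.0            # scalar weight of the model y_hat = w * x
lr = 0.1           # initial learning rate (step size)
decay = 0.5        # multiplicative decay factor (assumed)
decay_every = 10   # epochs between decay steps (assumed)

for epoch in range(50):
    random.shuffle(data)               # stochastic pass over the data
    for x, y in data:
        grad = 2.0 * (w * x - y) * x   # d/dw of the squared loss (w*x - y)^2
        w -= lr * grad                 # SGD update
    if (epoch + 1) % decay_every == 0:
        lr *= decay                    # scheduled learning-rate decay

print(w)
```

With noise-free data the iterate converges to the true weight; the schedule simply shrinks the step size over time, trading early training speed for late-stage stability. Adaptive schemes, such as the mutual-information-based decay the paper proposes, replace the fixed `decay_every` schedule with a data-driven trigger.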