A nonmonotone learning rate strategy for SGD training of deep neural networks.
Nitish Shirish Keskar, George Saon · Published in: ICASSP (2015)
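To illustrate the general idea behind the title, here is a minimal sketch of SGD on a 1-D quadratic with a learning-rate rule that may raise as well as lower the rate depending on recent loss history, i.e. a "nonmonotone" schedule in spirit. This is a hypothetical illustration only; the functions `sgd_nonmonotone`, `loss`, and `grad`, the factors 0.5/1.05, and the 4-step loss window are all assumptions for the sketch, not the strategy proposed in the paper.

```python
import random

def loss(w):
    # Simple 1-D quadratic objective with minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

def sgd_nonmonotone(w=0.0, lr=0.1, steps=50):
    """SGD with an illustrative nonmonotone learning-rate rule.

    The rate is halved only when the current loss exceeds the maximum
    of the last few losses; otherwise it is allowed to grow slightly,
    so the schedule is not forced to decrease monotonically.
    """
    history = []
    for _ in range(steps):
        g = grad(w) + random.gauss(0.0, 0.1)  # noisy gradient, as in SGD
        w -= lr * g
        history.append(loss(w))
        if len(history) >= 2:
            recent = history[-5:-1]  # up to the last 4 previous losses
            if history[-1] > max(recent):
                lr *= 0.5   # loss worsened beyond recent history: shrink
            else:
                lr *= 1.05  # otherwise permit a mild increase
    return w, lr, history

random.seed(0)
w, lr, hist = sgd_nonmonotone()
```

Because the rate can recover after a shrink, the schedule adapts to the noise level rather than committing to a fixed decay.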
Keyphrases
- learning rate
- backpropagation algorithm
- training algorithm
- training speed
- neural network
- adaptive learning rate
- hidden layer
- multilayer neural networks
- activation function
- feed forward neural networks
- learning algorithm
- convergence rate
- training process
- feedforward neural networks
- error function
- weight vector
- stochastic gradient descent
- back propagation
- fuzzy neural network
- multi layer perceptron
- rapid convergence
- convergence speed
- artificial neural networks
- delta bar delta
- training phase
- multilayer perceptron
- fuzzy logic
- genetic algorithm
- network architecture
- neural network model
- convergence theorem
- bp neural network algorithm
- machine learning