Understanding the Generalization Benefits of Late Learning Rate Decay.
Yinuo Ren, Chao Ma, Lexing Ying
Published in: AISTATS (2024)
Keyphrases
- learning rate
- learning algorithm
- convergence rate
- error function
- convergence speed
- hidden layer
- multilayer neural networks
- adaptive learning rate
- weight vector
- rapid convergence
- training algorithm
- BP neural network algorithm
- uniform convergence
- feature selection
- neural network
- convergence theorem
- activation function