The instabilities of large learning rate training: a loss landscape view.
Lawrence Wang, Stephen Roberts
Published in: CoRR (2023)
Keyphrases
- learning rate
- multilayer neural networks
- training algorithm
- training speed
- adaptive learning rate
- convergence rate
- learning algorithm
- rapid convergence
- feed forward neural networks
- convergence speed
- hidden layer
- error function
- multiple views
- weight vector
- step size
- training process
- neural network
- activation function
- supervised learning
- training set
- delta bar delta
- training samples
- boundary conditions
- multi layer perceptron
- pairwise
- training data