Training trajectories, mini-batch losses and the curious role of the learning rate.
Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller
Published in: CoRR (2023)
Keyphrases
- learning rate
- multilayer neural networks
- training algorithm
- training speed
- adaptive learning rate
- learning algorithm
- convergence rate
- error function
- hidden layer
- batch mode
- rapid convergence
- convergence speed
- feed forward neural networks
- supervised learning
- genetic algorithm
- weight vector
- multi objective
- support vector machine
- convergence theorem
- neural network
- activation function
- training process
- delta bar delta
- linear programming
- evolutionary algorithm
- training set
- machine learning
- data mining