A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions.
Arnulf Jentzen, Adrian Riekert
Published in: J. Mach. Learn. Res. (2022)
Keyphrases
- optimization method
- piecewise linear
- neural network training
- neural network
- genetic algorithm
- quasi newton
- training process
- optimization methods
- optimization process
- optimization algorithm
- training algorithm
- differential evolution
- evolutionary algorithm
- simulated annealing
- dynamic programming
- particle swarm
- nelder mead simplex
- global optimum
- optimization procedure
- chaotic map
- regression algorithm
- metaheuristic
- pattern recognition
- cost function
- hyperplane
- back propagation
- convergence speed
- search algorithm
- nonlinear optimization
- optimal solution
- objective function
- convergence rate
- learning rate
- fitness function
- feature selection
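To make the topic of the paper concrete, here is a minimal, hypothetical sketch (not the authors' construction or proof setting): plain full-batch gradient descent from a random initialization, training a one-hidden-layer ReLU network to fit the piecewise linear target f(x) = |x|. All sizes, scales, and hyperparameters below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data on [-1, 1]; the target f(x) = |x| is piecewise linear.
x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
y = np.abs(x)

# Random initialization (hypothetical scales, chosen for illustration).
W1 = rng.normal(scale=1.0, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=1.0, size=(16, 1))
b2 = np.zeros(1)

lr = 0.05  # learning rate (illustrative)

def forward(x):
    z = x @ W1 + b1          # hidden pre-activations
    h = np.maximum(z, 0.0)   # ReLU activation
    return z, h, h @ W2 + b2

_, _, pred0 = forward(x)
loss0 = np.mean((pred0 - y) ** 2)  # initial mean-squared error

for _ in range(2000):
    z, h, pred = forward(x)
    n = x.shape[0]
    # Backpropagated gradients of the mean-squared error.
    g_pred = 2.0 * (pred - y) / n
    gW2 = h.T @ g_pred
    gb2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T
    g_z = g_h * (z > 0.0)    # ReLU derivative (subgradient 0 at the kink)
    gW1 = x.T @ g_z
    gb1 = g_z.sum(axis=0)
    # Plain gradient descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, _, pred = forward(x)
loss = np.mean((pred - y) ** 2)
```

A ReLU network is itself piecewise linear in its input, so targets of this kind are exactly representable, which is part of what makes this setting amenable to a convergence analysis.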