A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions.
Arnulf Jentzen, Adrian Riekert
Published in: CoRR (2021)
Keyphrases
- optimization method
- piecewise linear
- neural network training
- neural network
- quasi-Newton
- genetic algorithm
- optimization algorithm
- training process
- optimization methods
- optimization process
- training algorithm
- evolutionary algorithm
- chaotic map
- global optimum
- differential evolution
- particle swarm
- nonlinear optimization
- simulated annealing
- dynamic programming
- optimization procedure
- backpropagation
- metaheuristic
- cost function
- pattern recognition
- convergence speed
- training samples
- hyperplane
- Nelder-Mead simplex
- artificial neural networks
- training data
- regression algorithm
- data sets
- training set
- convergence rate
- feature extraction
- initial conditions
- optimization problems
- objective function
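The title describes gradient descent from random initializations applied to a shallow ReLU network with a piecewise linear target function. A minimal numpy sketch of that setup is below; all concrete choices (network width, learning rate, step count, the target f(x) = |x|, standard normal initialization) are illustrative assumptions, not the authors' setup or code.

```python
import numpy as np

def relu(z):
    """Rectified linear unit activation."""
    return np.maximum(z, 0.0)

def train_once(rng, x, y, width=8, lr=0.02, steps=5000):
    """One full-batch gradient-descent run from a random initialization.

    Trains a one-hidden-layer ReLU network x -> v @ relu(w*x + b) + c
    and returns the final mean-squared training error.
    """
    w = rng.standard_normal(width)   # input weights
    b = rng.standard_normal(width)   # hidden biases
    v = rng.standard_normal(width)   # output weights
    c = rng.standard_normal()        # output bias
    for _ in range(steps):
        pre = np.outer(x, w) + b            # (n, width) pre-activations
        hid = relu(pre)                     # hidden-layer activations
        err = hid @ v + c - y               # residuals of the network output
        act = (pre > 0).astype(float)       # ReLU derivative (defined a.e.)
        # Full-batch gradients (of half the mean-squared error).
        grad_v = hid.T @ err / len(x)
        grad_c = err.mean()
        grad_w = (err[:, None] * act * v * x[:, None]).mean(axis=0)
        grad_b = (err[:, None] * act * v).mean(axis=0)
        w -= lr * grad_w
        b -= lr * grad_b
        v -= lr * grad_v
        c -= lr * grad_c
    return ((relu(np.outer(x, w) + b) @ v + c - y) ** 2).mean()

# Piecewise linear target f(x) = |x| on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 64)
y = np.abs(x)

# Several random initializations; keep the best final risk.
losses = [train_once(rng, x, y) for _ in range(5)]
best = float(np.nanmin(losses))  # nanmin guards against a diverged run
```

Restarting from several random initializations and keeping the best run is the standard practical counterpart of the paper's random-initialization setting: individual runs may stall near suboptimal points, while the minimum over restarts typically reaches a small training error.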