Gradient descent provably escapes saddle points in the training of shallow ReLU networks.

Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

Published in: CoRR (2022)

Keyphrases

saddle points
scale space
saddle point
objective function
training set
worst case
critical points
genetic algorithm
training samples
image processing
information extraction
structured prediction