Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.
Ziwei Ji
Matus Telgarsky
Published in: CoRR (2019)
Keyphrases
error function
cost function
small number
error rate
social networks
knowledge base
objective function
heterogeneous networks
sample size
computer networks
error bounds
conjugate gradient
artificial intelligence
network design
community detection
loss function
training set
image segmentation