Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.
Ziwei Ji
Matus Telgarsky
Published in: CoRR (2019)
Keyphrases
error function
cost function
small number
error rate
social networks
knowledge base
objective function
heterogeneous networks
sample size
computer networks
error bounds
conjugate gradient
artificial intelligence
network design
community detection
loss function
training set
image segmentation