Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.
Ziwei Ji, Matus Telgarsky
Published in: ICLR (2020)
Keyphrases
- neural network
- machine learning
- learning algorithm
- learning rate
- objective function
- error bounds
- test data
- network size
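The paper's claim concerns training a shallow (two-layer) ReLU network with gradient descent. The following is a minimal sketch of that setting, not the paper's actual construction: it trains only the hidden layer of a two-layer ReLU network on toy separable data with full-batch gradient descent on the logistic loss, with the outer layer frozen at random signs (a common simplification in analyses of this kind). The data, width `m`, and learning rate are illustrative choices, and the paper's separability assumptions on the data distribution are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable binary classification data on the unit sphere.
n, d = 200, 5
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ rng.normal(size=d))

# Shallow ReLU network f(x) = a^T relu(W x); only W is trained,
# a is frozen at random signs (a common simplification).
m = 64  # hidden width (the paper's result is about how small m can be)
W = rng.normal(size=(m, d)) / np.sqrt(m)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a

def logistic_loss(W):
    return np.mean(np.log1p(np.exp(-y * forward(W))))

lr = 2.0
losses = [logistic_loss(W)]
for _ in range(300):
    h = X @ W.T                         # pre-activations, shape (n, m)
    margins = y * (np.maximum(h, 0.0) @ a)
    g = -y / (1.0 + np.exp(margins))    # d(loss_i)/d(f(x_i))
    # d f(x)/d W_{jk} = a_j * 1[h_j > 0] * x_k, averaged over the sample
    grad = ((g[:, None] * (h > 0)) * a[None, :]).T @ X / n
    W -= lr * grad
    losses.append(logistic_loss(W))

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Full-batch gradient descent on this smooth loss should drive the training loss down monotonically for a small enough step size; the paper's contribution is that, under its assumptions, a width `m` that is only polylogarithmic in the sample size already suffices for small *test* error.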