Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.
Ziwei Ji, Matus Telgarsky
Published in: ICLR (2020)
Keyphrases
- neural network
- machine learning
- learning algorithm
- learning rate
- objective function
- error bounds
- test data
- network size
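The paper's claim concerns training a shallow (two-layer) ReLU network with gradient descent. The following is a minimal sketch of that setting, not the paper's actual construction: it trains only the hidden layer of a two-layer ReLU network on toy separable data with full-batch gradient descent on the logistic loss, with the outer layer frozen at random signs (a common simplification in analyses of this kind). The data, width `m`, and learning rate are illustrative choices, and the paper's separability assumptions on the data distribution are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable binary classification data on the unit sphere.
n, d = 200, 5
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ rng.normal(size=d))

# Shallow ReLU network f(x) = a^T relu(W x); only W is trained,
# a is frozen at random signs (a common simplification).
m = 64  # hidden width (the paper's result is about how small m can be)
W = rng.normal(size=(m, d)) / np.sqrt(m)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a

def logistic_loss(W):
    return np.mean(np.log1p(np.exp(-y * forward(W))))

lr = 2.0
losses = [logistic_loss(W)]
for _ in range(300):
    h = X @ W.T                         # pre-activations, shape (n, m)
    margins = y * (np.maximum(h, 0.0) @ a)
    g = -y / (1.0 + np.exp(margins))    # d(loss_i)/d(f(x_i))
    # d f(x)/d W_{jk} = a_j * 1[h_j > 0] * x_k, averaged over the sample
    grad = ((g[:, None] * (h > 0)) * a[None, :]).T @ X / n
    W -= lr * grad
    losses.append(logistic_loss(W))

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Full-batch gradient descent on this smooth loss should drive the training loss down monotonically for a small enough step size; the paper's contribution is that, under its assumptions, a width `m` that is only polylogarithmic in the sample size already suffices for small *test* error.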