Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods.
Taiji Suzuki, Shunta Akiyama
Published in: ICLR (2021)
Keyphrases
- kernel methods
- deep learning
- machine learning
- positive semidefinite
- kernel function
- kernel matrix
- feature space
- support vector
- unsupervised learning
- support vector machine
- convex optimization
- objective function
- pattern recognition
- multiple kernel learning
- kernel learning
- decision trees
- weakly supervised
- reproducing kernel hilbert space
- loss function
- learning algorithm
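The title refers to non-convex noisy gradient descent, i.e. gradient descent with injected Gaussian noise (gradient Langevin dynamics). The sketch below is purely illustrative of that generic technique on a toy non-convex objective; the step size, noise scale, and objective are assumptions for the example, not values or settings from the paper.

```python
import numpy as np

def noisy_gradient_descent(grad, x0, lr=0.01, temp=1e-3, steps=5000, rng=None):
    """Noisy gradient descent (gradient Langevin dynamics):
    each step takes a gradient step plus Gaussian noise scaled
    by sqrt(2 * lr * temp). Hyperparameters here are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = x - lr * grad(x) + np.sqrt(2.0 * lr * temp) * noise
    return x

# Toy non-convex objective f(x) = (x^2 - 1)^2 with minima at x = +/- 1.
grad_f = lambda x: 4.0 * x * (x**2 - 1.0)
x_final = noisy_gradient_descent(grad_f, x0=np.array([0.1]))
```

The injected noise is what lets such iterates escape saddle points and shallow local minima, which is central to the excess risk analysis the title advertises.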