Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods.
Taiji Suzuki, Shunta Akiyama
Published in: CoRR (2020)
Keyphrases
- kernel methods
- deep learning
- machine learning
- kernel function
- unsupervised learning
- positive semidefinite
- kernel matrix
- support vector
- feature space
- support vector machine
- loss function
- convex optimization
- weakly supervised
- multiple kernel learning
- kernel learning
- objective function
- mental models
- convex sets
- image processing
- active learning
- multi-class
- graph cuts
- image features
- data mining
- reproducing kernel Hilbert space