Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods.
Taiji Suzuki, Shunta Akiyama
Published in: ICLR (2021)
Keyphrases
- kernel methods
- deep learning
- machine learning
- positive semidefinite
- kernel function
- kernel matrix
- feature space
- support vector
- unsupervised learning
- support vector machine
- convex optimization
- objective function
- pattern recognition
- multiple kernel learning
- kernel learning
- decision trees
- weakly supervised
- reproducing kernel hilbert space
- loss function
- learning algorithm
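The title refers to non-convex noisy gradient descent, i.e. gradient descent with injected Gaussian noise (gradient Langevin dynamics). The sketch below is purely illustrative of that generic technique on a toy non-convex objective; the step size, noise scale, and objective are assumptions for the example, not values or settings from the paper.

```python
import numpy as np

def noisy_gradient_descent(grad, x0, lr=0.01, temp=1e-3, steps=5000, rng=None):
    """Noisy gradient descent (gradient Langevin dynamics):
    each step takes a gradient step plus Gaussian noise scaled
    by sqrt(2 * lr * temp). Hyperparameters here are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = x - lr * grad(x) + np.sqrt(2.0 * lr * temp) * noise
    return x

# Toy non-convex objective f(x) = (x^2 - 1)^2 with minima at x = +/- 1.
grad_f = lambda x: 4.0 * x * (x**2 - 1.0)
x_final = noisy_gradient_descent(grad_f, x0=np.array([0.1]))
```

The injected noise is what lets such iterates escape saddle points and shallow local minima, which is central to the excess risk analysis the title advertises.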