Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning.
Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto
Published in: CoRR (2024)
Keyphrases
- deep learning
- learning rate
- unsupervised learning
- convergence rate
- learning algorithm
- machine learning
- optimization algorithm
- gaussian kernels
- weakly supervised
- unsupervised feature learning
- global optimization
- mental models
- uniform convergence
- active learning
- hidden layer
- semi supervised
- supervised learning
- domain specific
- information extraction
- pairwise
- convergence speed
- object recognition
- feature extraction
- decision trees