A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation.
Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Published in: CoRR (2018)
Keyphrases
- learning rate
- deep learning
- learning algorithm
- convergence rate
- unsupervised learning
- rapid convergence
- unsupervised feature learning
- convergence speed
- hidden layer
- machine learning
- deep architectures
- adaptive learning rate
- weakly supervised
- mental models
- multi class
- natural language processing
- supervised learning
- reinforcement learning
- multiscale