Why Does Stagewise Training Accelerate Convergence of Testing Error Over SGD?
Tianbao Yang, Yan Yan, Zhuoning Yuan, Rong Jin
Published in: CoRR (2018)
Keyphrases
- stochastic gradient descent
- test set
- error rate
- loss function
- testing phase
- training stage
- supervised learning
- training set
- training process
- test cases
- error bounds
- convergence speed
- training samples
- iterative algorithms
- test data
- convergence rate
- particle swarm optimization
- least squares
- training error
- training speed
- neural network