Why Does Stagewise Training Accelerate Convergence of Testing Error Over SGD?

Tianbao Yang, Yan Yan, Zhuoning Yuan, Rong Jin

Published in: CoRR (2018)

Keyphrases

stochastic gradient descent
test set
error rate
loss function
testing phase
training stage
supervised learning
training set
training process
test cases
error bounds
convergence speed
training samples
iterative algorithms
test data
convergence rate
particle swarm optimization
least squares
training error
training speed
neural network