Publication: A disciplined approach to neural network hyper-parameters: Part 1 - learning rate, batch size, momentum, and weight decay.