Combining stochastic average gradient and Hessian-free optimization for sequence training of deep neural networks.
Pierre L. Dognin, Vaibhava Goel. Published in: ASRU (2013)
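For context on the first method named in the title, below is a minimal NumPy sketch of the generic stochastic average gradient (SAG) update of Le Roux, Schmidt and Bach (2012). The function name `sag`, its signature, and the toy least-squares usage are illustrative assumptions, not code from this paper, which combines SAG with Hessian-free optimization for sequence training of deep networks.

```python
import numpy as np

def sag(grad_fn, w0, n_samples, lr=0.01, n_iters=1000, seed=0):
    """Sketch of the stochastic average gradient (SAG) method.

    grad_fn(w, i) should return the gradient of sample i's loss at w.
    SAG stores one gradient per training sample and steps along the
    average of the stored gradients, refreshing only the gradient of
    the sampled index on each iteration.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    stored = np.zeros((n_samples, w.size))  # per-sample gradient memory
    avg = np.zeros(w.size)                  # running mean of stored gradients
    for _ in range(n_iters):
        i = rng.integers(n_samples)
        g = grad_fn(w, i)
        avg += (g - stored[i]) / n_samples  # incremental update of the mean
        stored[i] = g
        w -= lr * avg
    return w

# Toy usage (illustrative): least squares, where the gradient of
# sample i is x_i * (x_i . w - y_i).
X = np.random.default_rng(1).normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w_hat = sag(lambda w, i: X[i] * (X[i] @ w - y[i]), np.zeros(3), n_samples=100)
```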
Keyphrases
- neural network
- training process
- training algorithm
- stochastic optimization
- feedforward neural networks
- pattern recognition
- stochastic programming
- multi-layer perceptron
- optimization algorithm
- stochastic search
- optimization problems
- edge detection
- deep architectures
- training phase
- neural network structure
- global optimization
- Monte Carlo
- error back-propagation
- backpropagation algorithm
- stochastic context-free grammars
- multi-layer
- neural nets
- highly non-linear
- back-propagation
- supervised learning
- weighted sums
- steepest descent method
- optimization process
- standard deviation
- test set
- training examples
- evolutionary algorithm
- objective function