Learning online alignments with continuous rewards policy gradient.

Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever

Published in: ICASSP (2017)

Keyphrases

reinforcement learning
learning algorithm
policy gradient
learning tasks
learning process
cost function
decision problems
learning problems
function approximation