Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection.

Taku Kato Takahiro Shinozaki

Published in: CoRR (2017)

Keyphrases

policy gradient
reinforcement learning
actor critic
function approximation
reinforcement learning algorithms
policy search
policy gradient methods
optimal control
gradient method
model free reinforcement learning
state space
function approximators
variance reduction
learning algorithm
reinforcement learning methods
partially observable markov decision processes
policy iteration
optimal policy
average reward
multi agent
temporal difference learning
model free
approximation methods
action space
single agent
supervised learning
control problems
multi agent systems
sparse representation
temporal difference