Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection.
Taku Kato, Takahiro Shinozaki
Published in: CoRR (2017)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- function approximation
- reinforcement learning algorithms
- policy search
- policy gradient methods
- optimal control
- gradient method
- model free reinforcement learning
- state space
- function approximators
- variance reduction
- learning algorithm
- reinforcement learning methods
- partially observable Markov decision processes
- policy iteration
- optimal policy
- average reward
- multi agent
- temporal difference learning
- model free
- approximation methods
- action space
- single agent
- supervised learning
- control problems
- multi agent systems
- sparse representation
- temporal difference