Using Reinforcement Learning for Dialogue Management Policies: Towards Understanding MDP Violations and Convergence.
Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge
Published in: INTERSPEECH (2012)
Keyphrases
- reinforcement learning
- management policies
- markov decision processes
- optimal policy
- markov decision process
- state space
- management system
- reinforcement learning algorithms
- dialogue manager
- reward function
- convergence rate
- stochastic approximation
- model free
- machine learning
- policy iteration
- policy search
- human machine
- partially observable
- information systems
- temporal difference
- function approximation
- dialogue system
- spoken dialogue systems
- supply chain
- dialogue management
- natural language
- decision making
- partially observable markov decision processes
- information management
- action space
- dynamic programming
- markov decision problems
- stationary policies
- multi agent
- stochastic shortest path