Reward estimation for dialogue policy optimisation.
Pei-Hao Su, Milica Gasic, Steve J. Young
Published in: Comput. Speech Lang. (2018)
Keyphrases
- partially observable environments
- expected reward
- optimal policy
- reward function
- inverse reinforcement learning
- reinforcement learning
- accurate estimation
- human machine
- human computer
- dialogue system
- multi agent
- asymptotically optimal
- Markov decision process
- mixed initiative
- long run
- parameter estimation
- dialogue management
- Markov chain
- state space