Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation.
Thibault Cordier, Tanguy Urvoy, Lina Maria Rojas-Barahona, Fabrice Lefèvre
Published in: CoRR (2020)
Keyphrases
- inverse reinforcement learning
- control policies
- optimal policy
- human experts
- state dependent
- genetic algorithm
- model free reinforcement learning
- natural language
- dialogue system
- stochastic model
- expert knowledge
- mixed initiative
- man machine
- tour guide robot
- language generation
- dialogue management
- conversational agent
- speech acts
- learning automata
- domain experts