Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog.
Ryuichi Takanobu, Hanlin Zhu, Minlie Huang
Published in: CoRR (2019)
Keyphrases
- multi domain
- learning process
- reinforcement learning
- spoken dialog
- learning tasks
- partially observable environments
- learning algorithm
- natural language
- inverse reinforcement learning
- data sets
- supervised learning
- unsupervised learning
- text mining
- policy gradient
- search computing
- cross domain
- knowledge transfer
- prior knowledge
- recommender systems
- feature selection
- real world