Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation.
Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang
Published in: CoRR (2020)
Keyphrases
- semi-supervised
- reinforcement learning
- supervised learning
- active learning
- learning process
- learning algorithm
- partially observable environments
- multi-view
- online learning
- inverse reinforcement learning
- learning systems
- optimal policy
- unsupervised learning
- prior knowledge
- natural language
- learning tasks
- model-free reinforcement learning