Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation.
Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang
Published in: ACL (2020)
Keyphrases
- semi-supervised
- reinforcement learning
- learning algorithm
- learning process
- supervised learning
- partially observable environments
- dynamic programming
- semi-supervised learning
- model-free reinforcement learning
- policy gradient
- action selection
- monte carlo
- unsupervised learning
- prior knowledge
- background knowledge
- online learning
- weakly supervised
- state space
- inverse reinforcement learning
- natural language