Learning Dialogue Policy Efficiently Through Dyna Proximal Policy Optimization.
Chenping Huang, Bin Cao
Published in: CollaborateCom (1) (2022)
Keyphrases
- learning algorithm
- learning process
- online learning
- partially observable environments
- action selection
- learning systems
- optimization problems
- learning problems
- supervised learning
- inverse reinforcement learning
- optimization algorithm
- learning community
- dialogue system
- human computer
- temporal difference learning
- policy gradient
- policy search
- reinforcement learning problems
- genetic algorithm