Dynamic Dialogue Policy for Continual Reinforcement Learning.
Christian Geishauser, Carel van Niekerk, Hsien-Chin Lin, Nurul Lubis, Michael Heck, Shutong Feng, Milica Gasic
Published in: COLING (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- action selection
- partially observable environments
- action space
- function approximation
- learning process
- model-free
- reward function
- Markov decision process
- policy iteration
- function approximators
- control policy
- control policies
- policy gradient
- approximate dynamic programming
- actor critic
- dynamic environments
- reinforcement learning problems