Fast Reinforcement Learning of Dialogue Policies Using Stable Function Approximation.
Matthias Denecke, Kohji Dohsaka, Mikio Nakano
Published in: IJCNLP (2004)
Keyphrases
- function approximation
- reinforcement learning
- optimal policy
- control policies
- temporal difference
- function approximators
- policy search
- tile coding
- model free
- Markov decision process
- reinforcement learning algorithms
- temporal difference learning algorithms
- partially observable Markov decision processes
- temporal difference learning
- mountain car
- state action space
- Markov decision problems
- TD methods
- reward function
- Markov decision processes
- state space
- learning algorithm
- learning tasks
- machine learning
- long run
- control policy
- supervised learning
- dynamic programming
- neural network
- average reward
- temporal difference methods
- radial basis function
- data mining