Reinforcement Learning in POMDP's via Direct Gradient Ascent.
Jonathan Baxter, Peter L. Bartlett
Published in: ICML (2000)
Keyphrases
- gradient ascent
- reinforcement learning
- partially observable markov decision processes
- policy gradient
- function approximation
- state space
- optimal policy
- markov decision processes
- partially observable
- reinforcement learning algorithms
- cross entropy
- partially observable markov decision process
- markov decision process
- dynamic programming
- expectation maximization
- learning algorithm
- decision problems
- multi agent
- finite state
- average reward
- dynamical systems
- action selection
- control problems
- temporal difference
- model free
- belief state
- exponential family
- machine learning
- reward function
- long run
- state action
- rl algorithms