Policy-Gradients for PSRs and POMDPs.
Douglas Aberdeen, Olivier Buffet, Owen Thomas
Published in: AISTATS (2007)
Keyphrases
- partially observable
- partially observable markov decision processes
- predictive state representations
- dynamical systems
- reinforcement learning
- markov decision processes
- decision problems
- optimal policy
- belief state
- state space
- markov decision problems
- infinite horizon
- partial observability
- policy search
- finite state
- partially observable markov decision process
- dynamic programming
- belief space
- reward function
- continuous state
- policy gradient
- policy gradient methods
- stochastic systems
- temporal difference
- average reward
- markov decision process
- action selection
- markov chain
- machine learning
- decision processes
- finite horizon
- approximate solutions
- dec pomdps
- optimal solution
- multi agent
- past observations