A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies.
Huizhen Yu
Published in: UAI (2005)
Keyphrases
- function approximation
- policy gradient
- model free reinforcement learning
- policy gradient methods
- reinforcement learning
- partially observable Markov decision processes
- policy search
- actor critic
- optimal policy
- policy evaluation
- function approximators
- temporal difference
- continuous state
- temporal difference learning
- finite state
- reinforcement learning algorithms
- Markov decision problems
- Markov decision process
- model free
- learning tasks
- state space
- radial basis function
- decision problems
- belief state
- importance sampling
- multi agent
- control problems
- transfer learning
- variance reduction
- reinforcement learning methods
- reward function
- Markov decision processes
- dynamical systems
- sufficient conditions
- supervised learning
- dynamic programming