An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions.
Yao Ma, Tingting Zhao, Kohei Hatano, Masashi Sugiyama
Published in: Neural Comput. (2016)
Keyphrases
- Markov decision processes
- dynamic programming
- learning algorithm
- average reward
- computational complexity
- policy iteration
- cost function
- real time dynamic programming
- decision theoretic planning
- state space
- NP-hard
- optimal policy
- search space
- action space
- actor critic
- state action
- convergence rate
- approximation methods
- partially observable
- reinforcement learning algorithms
- objective function