An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions.
Yao Ma, Tingting Zhao, Kohei Hatano, Masashi Sugiyama
Published in: ECML/PKDD (2) (2014)
Keyphrases
- Markov decision processes
- dynamic programming
- average reward
- computational complexity
- learning algorithm
- cost function
- objective function
- NP-hard
- state space
- linear programming
- action space
- Monte Carlo
- decision theoretic planning
- partially observable
- convergence rate
- search space
- reinforcement learning
- real valued
- long run
- single agent
- policy iteration
- optimal solution
- policy gradient