A Two-Timescale Simulation-Based Gradient Algorithm for Weighted Cost Markov Decision Processes.
Ying He, Michael C. Fu, Steven I. Marcus
Published in: CDC/ECC (2005)
Keyphrases
- Markov decision processes
- dynamic programming
- model based reinforcement learning
- learning algorithm
- computational complexity
- objective function
- state space
- average reward
- optimal policy
- NP-hard
- search space
- reinforcement learning
- search algorithm
- state variables
- finite state
- average cost
- optimal solution
- gradient method