A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes.
Shalabh Bhatnagar, Shishir Kumar
Published in: IEEE Trans. Autom. Control (2004)
Keyphrases
- policy iteration
- stochastic approximation
- Markov decision processes
- average reward
- actor-critic
- dynamic programming
- reinforcement learning
- learning algorithm
- optimal policy
- model-free
- optimal solution
- NP-hard
- worst case
- mathematical model
- reinforcement learning algorithms
- computational complexity
- linear programming
- cost function
- objective function
- Monte Carlo
- temporal difference
- path finding