An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes.
Shalabh Bhatnagar, K. Lakshmanan
Published in: J. Optim. Theory Appl. (2012)
Keyphrases
- function approximation
- actor critic
- Markov decision processes
- reinforcement learning
- policy iteration
- reinforcement learning algorithms
- average reward
- dynamic programming
- state space
- objective function
- model free
- learning algorithm
- policy evaluation
- temporal difference learning
- temporal difference
- optimal control
- policy gradient
- search space
- infinite horizon
- Monte Carlo
- cost function
- finite state
- long run
- radial basis function
- optimal policy
- linear programming
- gradient method
- active learning
- search algorithm
- neural network