Policy gradient primal-dual mirror descent for constrained MDPs with large state spaces.
Dongsheng Ding, Mihailo R. Jovanovic
Published in: CDC (2022)
Keyphrases
- primal dual
- policy gradient
- state space
- reinforcement learning algorithms
- Markov decision processes
- reinforcement learning
- linear programming
- average reward
- linear program
- partially observable Markov decision processes
- convex optimization
- optimal policy
- dynamic programming
- convergence rate
- approximation algorithms
- reward function
- algorithm for linear programming
- function approximation
- heuristic search
- semidefinite programming
- action space
- Markov chain
- gradient method
- finite state
- search space
- state action
- reinforcement learning methods
- Markov decision process
- dynamical systems
- partially observable
- stochastic games
- approximation methods
- infinite horizon
- optimal control
- planning problems
- initial state
- control problems
- average cost
- machine learning
- single agent
- belief state
- model free
- long run
- variance reduction
- convergence speed
- learning algorithm