Towards painless policy optimization for constrained MDPs.
Arushi Jain, Sharan Vaswani, Reza Babanezhad, Csaba Szepesvári, Doina Precup
Published in: UAI (2022)
Keyphrases
- optimal policy
- Markov decision processes
- Markov decision process
- concave convex procedure
- reinforcement learning
- state space
- Markov decision problems
- finite horizon
- global optimization
- policy iteration
- average reward
- action space
- infinite horizon
- partially observable
- state and action spaces
- decision processes
- average cost
- optimization method
- optimization problems
- reward function
- optimization process
- optimization algorithm
- sufficient conditions
- initial state
- control policies
- reinforcement learning problems
- dynamic programming
- evolutionary algorithm