Convergent Policy Optimization for Safe Reinforcement Learning.
Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang
Published in: NeurIPS (2019)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- action selection
- function approximation
- partially observable environments
- optimization algorithm
- markov decision processes
- function approximators
- reinforcement learning problems
- markov decision process
- optimization problems
- action space
- finite state
- state space
- transition model
- approximate dynamic programming
- reinforcement learning algorithms
- learning algorithm
- learning process
- dynamic programming
- global optimization
- model free
- state action
- neural network
- inverse reinforcement learning
- machine learning
- least squares
- multi objective
- policy evaluation
- policy gradient
- continuous state
- markov decision problems
- average reward
- control policy
- optimization method
- learning problems
- policy iteration
- partially observable
- optimization process