Escaping from zero gradient: Revisiting action-constrained reinforcement learning via Frank-Wolfe policy optimization.
Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu
Published in: UAI (2021)
Keyphrases
- reinforcement learning
- action selection
- policy gradient
- action space
- optimal policy
- state action
- partially observable domains
- concave convex procedure
- global optimization
- function approximation
- agent learns
- policy search
- optimization problems
- state space
- Markov decision processes
- function approximators
- Markov decision process
- partially observable environments
- state and action spaces
- reinforcement learning problems
- control policy
- actor critic
- global search
- machine learning
- transition model
- optimization algorithm
- saddle point
- policy evaluation
- control policies
- reinforcement learning algorithms
- simulated annealing
- partially observable
- policy iteration
- gradient method
- multi agent
- stochastic games
- average reward
- continuous state
- line search
- multi objective
- edge detection
- long run
- constrained optimization
- optimization methods
- learning algorithm
- steepest descent method
- reward signal