Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization.
Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- action selection
- policy gradient
- action space
- optimal policy
- state action
- partially observable domains
- agent learns
- policy search
- concave convex procedure
- line search
- optimization problems
- markov decision processes
- reinforcement learning algorithms
- markov decision process
- optimization algorithm
- global search
- state space
- agent receives
- function approximation
- control policy
- temporal difference
- transition model
- saddle point
- actor critic
- partially observable environments
- reward shaping
- partially observable
- state and action spaces
- inverse reinforcement learning
- model free
- learning algorithm
- reward function
- policy iteration
- steepest descent method
- machine learning
- optimal control
- continuous state spaces
- continuous state
- control policies
- optimization methods
- quadratic programming