Supported Policy Optimization for Offline Reinforcement Learning.
Jialong Wu, Haixu Wu, Zihan Qiu, Jianmin Wang, Mingsheng Long
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- optimal policy
- action selection
- policy search
- reinforcement learning algorithms
- optimization problems
- policy evaluation
- global optimization
- reinforcement learning problems
- Markov decision process
- control policies
- optimization algorithm
- policy gradient
- temporal difference
- decision problems
- real time
- function approximation
- optimization method
- policy iteration
- control policy
- optimal control
- Markov decision processes
- state space
- state and action spaces
- learning algorithm
- partially observable environments
- actor critic
- reinforcement learning methods
- model free
- action space
- partially observable
- reward function
- infinite horizon
- constrained optimization
- optimization process
- optimization methods
- dynamic programming
- machine learning