Supported Policy Optimization for Offline Reinforcement Learning.
Jialong Wu, Haixu Wu, Zihan Qiu, Jianmin Wang, Mingsheng Long
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- optimal policy
- action selection
- partially observable environments
- Markov decision process
- global optimization
- policy search
- optimization process
- function approximation
- approximate dynamic programming
- reinforcement learning problems
- control policy
- optimization problems
- Markov decision processes
- dynamic programming
- policy gradient
- constrained optimization
- real time
- optimal control
- optimization methods
- actor critic
- least squares
- state and action spaces
- policy gradient methods
- control policies
- average reward
- partially observable
- combinatorial optimization
- decision problems
- optimization algorithm
- state space
- multi objective
- optimal solution
- machine learning