Constrained Policy Optimization with Explicit Behavior Density For Offline Reinforcement Learning.
Jing Zhang, Chi Zhang, Wenjia Wang, Bingyi Jing
Published in: NeurIPS (2023)
Keyphrases
- reinforcement learning
- optimal policy
- partially observable environments
- policy search
- concave convex procedure
- selective perception
- function approximation
- action selection
- reinforcement learning algorithms
- optimization algorithm
- reinforcement learning problems
- approximate dynamic programming
- action space
- state space
- control policies
- partially observable
- real time
- real robot
- sufficient conditions
- lagrange multipliers
- function approximators
- optimization problems
- policy gradient
- reward function
- markov decision processes
- inverse reinforcement learning
- optimal control
- dynamic programming
- state and action spaces
- decision problems
- actor critic
- optimization process
- control policy
- model free