Policy Regularization with Dataset Constraint for Offline Reinforcement Learning.
Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- action selection
- Markov decision processes
- real time
- policy gradient
- state space
- action space
- function approximation
- Markov decision process
- partially observable
- reinforcement learning algorithms
- control policies
- Markov decision problems
- function approximators
- policy iteration
- partially observable domains
- policy evaluation
- decision problems
- state and action spaces
- reward function
- state dependent
- control policy
- reinforcement learning problems
- state action
- actor critic
- continuous state
- machine learning
- average reward
- temporal difference
- dynamic programming
- learning process
- optimal control
- prior information
- benchmark datasets
- sufficient conditions
- multi agent
- agent receives