Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning.
Haoxuan Pan, Deheng Ye, Xiaoming Duan, Qiang Fu, Wei Yang, Jianping He, Mingfei Sun
Published in: CoRR (2023)
Keyphrases
- dynamic programming
- optimal policy
- reinforcement learning
- policy search
- state space
- markov decision processes
- approximate dynamic programming
- infinite horizon
- optimal control
- reinforcement learning algorithms
- function approximation
- markov decision process
- partially observable markov decision processes
- partially observable
- actor critic
- control policy
- state dependent
- reward function
- action selection
- estimation algorithm
- machine learning
- reinforcement learning problems
- long run
- action space
- state and action spaces
- partially observable domains
- learning algorithm
- learning process
- deep learning
- rl algorithms
- policy gradient
- control problems
- image gradient
- estimation accuracy
- temporal difference
- transfer learning
- partially observable environments