Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization.
Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, Huazhe Xu
Published in: ICLR (2024)
Keyphrases
- multi-step
- reinforcement learning
- optimal policy
- lower bounding
- policy search
- action selection
- markov decision process
- function approximation
- optimization algorithm
- single-step
- actor-critic
- function approximators
- dynamic programming
- global optimization
- optimal control
- policy gradient
- partially observable
- markov decision problems
- state space
- td learning
- tumor classification
- average reward
- reinforcement learning algorithms
- e learning
- reward function
- model-free
- markov decision processes