Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization.
Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, Huazhe Xu
Published in: CoRR (2023)
Keyphrases
- multi step
- reinforcement learning
- optimal policy
- policy search
- action selection
- lower bounding
- single step
- markov decision process
- optimization algorithm
- function approximation
- markov decision processes
- policy iteration
- td learning
- k nearest neighbor
- knn
- partially observable
- exploration exploitation tradeoff
- markov decision problems
- reinforcement learning algorithms
- tumor classification
- reward function
- global optimization
- state space
- multi objective
- feature extraction
- temporal difference
- model free
- combinatorial optimization
- average reward
- dynamic programming
- policy evaluation
- learning process
- learning algorithm
- data sets