Advantage-Aware Policy Optimization for Offline Reinforcement Learning.
Yunpeng Qing, Shunyu Liu, Jingyuan Cong, Kaixuan Chen, Yihe Zhou, Mingli Song
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- action selection
- policy search
- control policy
- Markov decision process
- policy gradient
- global optimization
- state space
- function approximation
- Markov decision processes
- policy iteration
- learning algorithm
- partially observable environments
- approximate dynamic programming
- actor critic
- function approximators
- policy evaluation
- reward function
- partially observable
- real time
- optimization process
- optimization problems
- reinforcement learning algorithms
- action space
- model free
- control policies
- continuous state
- optimization method
- supervised learning
- multi agent
- machine learning