Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning.
Yang Liu, Marius Hofert
Published in: ICRA (2024)
Keyphrases
- reinforcement learning
- optimal policy
- model free
- policy search
- Markov decision process
- action selection
- optimization algorithm
- function approximation
- partially observable environments
- Markov decision processes
- optimization process
- machine learning
- reinforcement learning algorithms
- state space
- reinforcement learning problems
- actor critic
- policy gradient
- function approximators
- real time
- partially observable domains
- control problems
- temporal difference
- global optimization
- optimization method
- action space
- RL algorithms
- control policies
- Markov decision problems
- reward function
- transition model
- agent learns
- multi agent systems
- learning algorithm