Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning.
Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao
Published in: CoRR (2023)
Keyphrases
- actor critic
- reinforcement learning
- optimal control
- approximate dynamic programming
- temporal difference
- average reward
- dynamic programming
- policy gradient
- function approximation
- reinforcement learning algorithms
- neuro fuzzy
- state space
- learning algorithm
- gradient method
- policy gradient methods
- model free
- multi agent
- optimal policy
- control problems
- optimal solution
- supervised learning
- machine learning
- learning tasks
- control policy
- policy iteration
- markov decision processes
- multi agent systems
- reinforcement learning methods
- rl algorithms
- mathematical model
- transfer learning
- natural actor critic