Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning.
Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao
Published in: NeurIPS (2023)
Keyphrases
- actor critic
- reinforcement learning
- optimal control
- approximate dynamic programming
- temporal difference
- policy gradient
- average reward
- neuro fuzzy
- function approximation
- reinforcement learning algorithms
- policy iteration
- dynamic programming
- control problems
- markov decision processes
- control policy
- state space
- optimal policy
- policy gradient methods
- model free
- linear program
- supervised learning
- action selection
- mathematical model
- sufficient conditions
- gradient method
- multi agent
- machine learning
- neural network