Mildly Conservative Q-Learning for Offline Reinforcement Learning.
Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- function approximation
- reinforcement learning algorithms
- state space
- model free
- multi agent reinforcement learning
- stochastic approximation
- temporal difference learning
- temporal difference
- state action space
- continuous state and action spaces
- Markov decision processes
- optimal policy
- optimal control
- action selection
- dynamic programming
- rl algorithms
- eligibility traces
- multi agent
- real time
- learning algorithm
- continuous state spaces
- reinforcement learning methods
- supervised learning
- state action
- partially observable
- state abstraction
- mobile robot
- learning process
- function approximators
- policy iteration
- cooperative
- continuous state
- control problems
- actor critic
- infinite horizon