Residual Q-Learning: Offline and Online Policy Customization without Value.
Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
Published in: NeurIPS (2023)
Keyphrases
- optimal policy
- real time
- action selection
- cooperative
- online learning
- reinforcement learning
- state space
- multi agent
- function approximation
- online environment
- model free
- reinforcement learning algorithms
- multi agent reinforcement learning
- continuous state spaces
- single agent
- Markov decision process
- state dependent
- reinforcement learning methods