Residual Q-Learning: Offline and Online Policy Customization without Value.
Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
Published in: CoRR (2023)
Keyphrases
- optimal policy
- action selection
- reinforcement learning
- real time
- cooperative
- state space
- online learning
- multi-agent
- state action
- decision making
- e-learning
- dynamic programming
- multi-agent reinforcement learning
- Markov decision process
- infinite horizon
- long run
- function approximation
- website
- learning algorithm
- neural network