Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias.
Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- online learning
- learning process
- learning algorithm
- action selection
- exploration exploitation tradeoff
- autonomous learning
- supervised learning
- learning systems
- real time
- passive aggressive
- knowledge acquisition
- learning problems
- learning classifier systems
- learned knowledge
- temporal difference learning
- online environment
- active learning
- inductive bias
- inverse reinforcement learning
- reinforcement learning problems