Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble.
Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin
Published in: CoRL (2021)
Keyphrases
- reinforcement learning
- online learning
- real time
- learning algorithm
- machine learning
- training data
- function approximation
- neural network
- optimal policy
- learning process
- temporal difference learning
- random forests
- training set
- multi agent
- state space
- markov decision processes
- learning environment
- ensemble learning
- temporal difference
- genetic algorithm
- multi agent reinforcement learning