Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin

Published in: CoRL (2021)

Keyphrases

reinforcement learning
online learning
real time
learning algorithm
machine learning
training data
function approximation
neural network
optimal policy
learning process
temporal difference learning
random forests
training set
multi agent
state space
markov decision processes
learning environment
ensemble learning
temporal difference
genetic algorithm
multi agent reinforcement learning