When does return-conditioned supervised learning work for offline reinforcement learning?
David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- supervised learning
- function approximation
- unsupervised learning
- learning algorithm
- reinforcement learning algorithms
- temporal difference
- real time
- machine learning
- model free
- learning tasks
- data sets
- Markov decision processes
- learning problems
- kernel based learning
- optimal policy
- training samples
- state space
- class labels
- multi agent
- dynamic programming
- active learning
- statistical learning
- real world
- control policy
- training set
- action selection
- multiple instance learning
- semi supervised learning
- labeled data
- evolutionary algorithm
- optimal control
- semi supervised
- transfer learning
- real robot
- action space
- learning agents
- training examples