DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation.
Sravan Jayanthi, Letian Chen, Nadya Balabanska, Van Duong, Erik Scarlatescu, Ezra Ameperosa, Zulfiqar Haider Zaidi, Daniel Martin, Taylor Keith Del Matto, Masahiro Ono, Matthew C. Gombolay
Published in: CoRL (2023)
Keyphrases
- learning algorithm
- inverse reinforcement learning
- reinforcement learning
- learning systems
- active learning
- online learning
- partially observable environments
- real time
- knowledge acquisition
- supervised learning
- prior knowledge
- decision trees
- neural network
- learning objects
- semi-supervised
- Markov chain
- unsupervised learning
- learning analytics
- action selection
- reward function
- reinforcement learning algorithms
- knowledge base
- eligibility traces