OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning.
Peter HendersonWei-Di ChangPierre-Luc BaconDavid MegerJoelle PineauDoina PrecupPublished in: AAAI (2018)
Keyphrases
- inverse reinforcement learning
- partially observable environments
- bayesian nonparametric
- reward function
- preference elicitation
- reinforcement learning
- learning algorithm
- unsupervised learning
- learning tasks
- machine learning
- supervised learning
- generative model
- dynamic systems
- temporal difference
- partially observable