Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization.
Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh
Published in: NeurIPS (2020)
Keyphrases
- inverse reinforcement learning
- reward function
- Bayesian nonparametric
- preference elicitation
- Markov decision processes
- reinforcement learning
- state space
- reinforcement learning algorithms
- multiple agents
- optimal policy
- temporal difference
- maximum likelihood
- partially observable
- simple examples
- state variables
- Markov decision process
- Bayesian inference
- transition probabilities
- dynamic programming
- Bayesian networks