Keyphrases
- fine-tuning
- inverse reinforcement learning
- Bayesian nonparametric
- partially observable environments
- preference elicitation
- reward function
- fine-tuned
- semi-supervised
- supervised learning
- unsupervised learning
- learning algorithm
- temporal difference
- multi-objective
- utility function
- dynamic systems
- Markov decision process