Versatile Inverse Reinforcement Learning via Cumulative Rewards.
Niklas Freymuth, Philipp Becker, Gerhard Neumann
Published in: CoRR (2021)
Keyphrases
- inverse reinforcement learning
- reward function
- bayesian nonparametric
- markov decision processes
- reinforcement learning
- partially observable environments
- state space
- reinforcement learning algorithms
- preference elicitation
- partially observable
- multiple agents
- optimal policy
- simple examples
- markov decision process
- generative model
- state variables
- machine learning
- temporal difference
- function approximation
- average reward
- decision makers
- control policies
- dynamic programming
- multi agent
- artificial intelligence