Versatile Inverse Reinforcement Learning via Cumulative Rewards.

Niklas Freymuth, Philipp Becker, Gerhard Neumann

Published in: CoRR (2021)

Keyphrases

inverse reinforcement learning
reward function
Bayesian nonparametric
Markov decision processes
reinforcement learning
partially observable environments
state space
reinforcement learning algorithms
preference elicitation
partially observable
multiple agents
optimal policy
simple examples
Markov decision process
generative model
state variables
machine learning
temporal difference
function approximation
average reward
decision makers
control policies
dynamic programming
multi-agent
artificial intelligence