Quantifying Differences in Reward Functions.

Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

Published in: CoRR (2020)

Keyphrases

reward function
Markov decision processes
inverse reinforcement learning
reinforcement learning algorithms
reinforcement learning
statistically significant
multiple agents
state space
optimal policy
policy search
Markov decision process
initially unknown
state variables
clustering algorithm
objective function
transition probabilities