Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

Katherine Metcalf, Miguel Sarabia, Barry-John Theobald

Published in: CoRR (2022)

Keyphrases

reinforcement learning
Markov decision processes
state space
reinforcement learning algorithms
function approximation
multi-agent environments
machine learning
real time
dynamic environments
dynamical systems
multi-agent
agent environment
optimal policy
control system
dynamic model
video sequences
optimal control
reward function
initially unknown
state and action spaces
reward shaping