Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning.
Katherine Metcalf, Miguel Sarabia, Barry-John Theobald
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- Markov decision processes
- state space
- reinforcement learning algorithms
- function approximation
- multi agent environments
- machine learning
- real time
- dynamic environments
- dynamical systems
- multi agent
- agent environment
- optimal policy
- control system
- dynamic model
- video sequences
- optimal control
- reward function
- initially unknown
- state and action spaces
- reward shaping