Can You Rely on Synthetic Labellers in Preference-Based Reinforcement Learning? It's Complicated.
Katherine Metcalf, Miguel Sarabia, Masha Fedzechkina, Barry-John Theobald
Published in: AAAI (2024)
Keyphrases
- reinforcement learning
- state space
- function approximation
- reinforcement learning algorithms
- real world
- real images are presented
- robotic control
- real time
- user preferences
- multi agent
- optimal control
- learning algorithm
- transfer learning
- dynamic programming
- temporal difference
- control problems
- markov decision process
- temporal difference learning
- autonomous learning
- multi agent reinforcement learning
- policy search
- relational reinforcement learning
- machine learning