Aligning Human Preferences with Baseline Objectives in Reinforcement Learning.
Daniel Marta, Simon Holk, Christian Pek, Jana Tumova, Iolanda Leite
Published in: ICRA (2023)
Keyphrases
- reinforcement learning
- multiple objectives
- decision making
- multi agent
- image registration
- user preferences
- relative improvement
- human subjects
- function approximation
- neural network
- temporal difference
- transition model
- human users
- multi attribute
- computational models
- learning problems
- markov decision processes
- state space