Reinforcement Learning from Diverse Human Preferences.
Wanqi Xue, Bo An, Shuicheng Yan, Zhongwen Xu
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- multi agent
- wide variety
- dynamic programming
- neural network
- model free
- human behavior
- optimal policy
- state space
- learning process
- real world
- machine learning
- transfer learning
- markov decision processes
- computational models
- human subjects
- multi attribute
- human interaction
- data sets
- reinforcement learning algorithms