Deep Reinforcement Learning from Hierarchical Weak Preference Feedback.
Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- state space
- learning algorithm
- hierarchical model
- Markov decision processes
- function approximation
- temporal difference
- relevance feedback
- optimal policy
- hierarchical structure
- reinforcement learning algorithms
- model-free
- optimal control
- user preferences
- user feedback
- hierarchical clustering
- multiple criteria
- dynamic programming
- learning process
- multi-agent
- individual preferences