Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback.
Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, Yan Zheng
Published in: CoRR (2024)
Keyphrases
- benchmark suite
- reinforcement learning
- function approximation
- real time
- motor skills
- human operators
- state space
- machine learning
- temporal difference
- human interaction
- wide variety
- relevance feedback
- multi agent
- intelligent tutoring systems
- human experts
- optimal policy
- reinforcement learning algorithms
- social networks
- tutorial dialogue
- learning algorithm