Safe RLHF: Safe Reinforcement Learning from Human Feedback.
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- human interaction
- Markov decision processes
- learning algorithm
- user engagement
- human operators
- relevance feedback
- human behavior
- function approximation
- human subjects
- computational models
- data mining
- active learning
- learning process
- artificial neural networks
- multi-agent
- decision making
- machine learning