BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang
Published in: NeurIPS (2023)
Keyphrases
- user preferences
- information retrieval
- human experts
- improved algorithm
- benchmark datasets
- human users
- human behavior
- personality traits
- human operators
- preference relations
- emotional state
- synthetic datasets
- multi-criteria
- multi-attribute
- database
- human subjects
- image registration
- video sequences
- information systems
- neural network