BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.

Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
Published in: CoRR (2023)
Keyphrases
  • improved algorithm
  • database
  • benchmark datasets
  • personality traits
  • neural network
  • data mining
  • human subjects
  • multi criteria
  • human interaction