Josef Dai
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 7
Top Topics
Graph Theory
Direct Policy Search
Weakly Labeled
Reinforcement Learning
Top Venues
CoRR
NeurIPS
ICLR
Publications
Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang
PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models.
CoRR (2024)
Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.
CoRR (2024)
Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang
Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective.
CoRR (2024)
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
Safe RLHF: Safe Reinforcement Learning from Human Feedback.
ICLR (2024)
Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, Yaodong Yang
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
NeurIPS (2023)
Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
NeurIPS (2023)
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
Safe RLHF: Safe Reinforcement Learning from Human Feedback.
CoRR (2023)