Rui Yang
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 8
Top Topics
Reward Shaping
Reinforcement Learning
Preference Model
Partially Observable
Top Venues
CoRR
ICLR
Trans. Mach. Learn. Res.
NeurIPS
Publications
Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang
Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning.
CoRR
(2024)
Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han
Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling.
CoRR
(2024)
Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption.
ICLR
(2024)
Mianchu Wang, Rui Yang, Xi Chen, Hao Sun, Meng Fang, Giovanni Montana
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models.
Trans. Mach. Learn. Res.
2024 (2024)
Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs.
CoRR
(2024)
Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards.
CoRR
(2024)
Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai, Zhen Wang
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness.
CoRR
(2023)
Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou
Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping.
NeurIPS
(2022)