Runlong Zhou
Publication Activity (10 Years)
Years Active: 2021-2024
Publications (10 Years): 12
Top Topics
Combinatorial Optimization
Markov Decision Process
Reinforcement Learning
Stochastic Shortest Path
Top Venues
CoRR
ICML
ACL (1)
ICLR
Publications
Runlong Zhou, Simon S. Du, Beibin Li
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.
ACL (1)
(2024)
Runlong Zhou, Simon S. Du, Beibin Li
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.
CoRR
(2024)
Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning.
ICLR
(2024)
Runlong Zhou, Ruosong Wang, Simon Shaolei Du
Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes.
ICML
(2023)
Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning.
CoRR
(2023)
Runlong Zhou, Zihan Zhang, Simon S. Du
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments.
CoRR
(2023)
Runlong Zhou, Zihan Zhang, Simon Shaolei Du
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments.
ICML
(2023)
Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon Shaolei Du
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization.
Trans. Mach. Learn. Res.
2023 (2023)
Runlong Zhou, Ruosong Wang, Simon S. Du
Horizon-Free Reinforcement Learning for Latent Markov Decision Processes.
CoRR
(2022)
Runlong Zhou, Yuandong Tian, Yi Wu, Simon S. Du
Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems.
CoRR
(2022)
Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret.
CoRR
(2021)
Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret.
NeurIPS
(2021)