Nan Jiang

Publication Activity (10 Years)
Years Active: 2014-2024
Publications (10 Years): 45

Top Topics: Markov Decision Process, Reinforcement Learning, Predictive State Representations, Partially Observable
Top Venues: CoRR, NeurIPS, ICML, ICLR
Publications
Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie. Harnessing Density Ratios for Online Reinforcement Learning. CoRR (2024)
Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu. Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning. CoRR (2024)
Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie. Harnessing Density Ratios for Online Reinforcement Learning. ICLR (2024)
Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang. RLHF Workflow: From Reward Modeling to Online RLHF. CoRR (2024)
Audrey Huang, Jinglin Chen, Nan Jiang. Reinforcement Learning in Low-rank MDPs with Density Features. ICML (2023)
Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng. Adversarial Model for Offline Reinforcement Learning. CoRR (2023)
Philip Amortila, Nan Jiang, Csaba Szepesvári. The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation. CoRR (2023)
Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun. Future-Dependent Value-Based Off-Policy Evaluation in POMDPs. NeurIPS (2023)
Philip Amortila, Nan Jiang, Csaba Szepesvári. The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation. ICML (2023)
Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng. Adversarial Model for Offline Reinforcement Learning. NeurIPS (2023)
Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade. The Role of Coverage in Online Reinforcement Learning. ICLR (2023)
Jinglin Chen, Nan Jiang. Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps. CoRR (2022)
Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal. On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL. NeurIPS (2022)
Jinglin Chen, Nan Jiang. Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps. UAI (2022)
Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade. The Role of Coverage in Online Reinforcement Learning. CoRR (2022)
Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster. A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation. NeurIPS (2022)
Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng. ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data. CoRR (2022)
Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster. A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation. CoRR (2022)
Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal. Adversarially Trained Actor Critic for Offline Reinforcement Learning. ICML (2022)
Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan P. Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford. Interaction-Grounded Learning with Action-Inclusive Feedback. NeurIPS (2022)
Audrey Huang, Nan Jiang. Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions. NeurIPS (2022)
Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal. On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL. CoRR (2022)
Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun. Future-Dependent Value-Based Off-Policy Evaluation in POMDPs. CoRR (2022)
Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu. Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret. NeurIPS (2022)
Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford. Interaction-Grounded Learning with Action-Inclusive Feedback. CoRR (2022)
Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu. Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality. ICLR (2022)
Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang. A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes. ICML (2022)
Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal. Model-free Representation Learning and Exploration in Low-rank MDPs. CoRR (2021)
Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal. Bellman-consistent Pessimism for Offline Reinforcement Learning. NeurIPS (2021)
Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai. Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning. NeurIPS (2021)
Cameron Voloshin, Hoang Minh Le, Nan Jiang, Yisong Yue. Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning. NeurIPS Datasets and Benchmarks (2021)
Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder P. Singh. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles. AISTATS (2020)
Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder P. Singh. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles. CoRR (2019)
Nan Jiang, Alex Kulesza, Satinder P. Singh. Completing State Representations using Spectral Learning. NeurIPS (2018)
Aditya Modi, Nan Jiang, Satinder P. Singh, Ambuj Tewari. Markov Decision Processes with Continuous Side Information. ALT (2018)
Kareem Amin, Nan Jiang, Satinder P. Singh. Repeated Inverse Reinforcement Learning. NIPS (2017)
Aditya Modi, Nan Jiang, Satinder P. Singh, Ambuj Tewari. Markov Decision Processes with Continuous Side Information. CoRR (2017)
Kareem Amin, Nan Jiang, Satinder P. Singh. Repeated Inverse Reinforcement Learning. CoRR (2017)
Nan Jiang, Satinder P. Singh, Ambuj Tewari. On Structural Properties of MDPs that Bound Loss Due to Shallow Planning. IJCAI (2016)
Nan Jiang, Alex Kulesza, Satinder P. Singh, Richard L. Lewis. The Dependence of Effective Planning Horizon on Model Accuracy. IJCAI (2016)
Nan Jiang, Alex Kulesza, Satinder P. Singh. Improving Predictive State Representations via Gradient Descent. AAAI (2016)
Alex Kulesza, Nan Jiang, Satinder P. Singh. Low-Rank Spectral Learning with Weighted Loss Functions. AISTATS (2015)
Alex Kulesza, Nan Jiang, Satinder P. Singh. Spectral Learning of Predictive State Representations with Insufficient Statistics. AAAI (2015)
Nan Jiang, Alex Kulesza, Satinder P. Singh. Abstraction Selection in Model-based Reinforcement Learning. ICML (2015)
Nan Jiang, Alex Kulesza, Satinder P. Singh, Richard L. Lewis. The Dependence of Effective Planning Horizon on Model Accuracy. AAMAS (2015)
Nan Jiang, Satinder P. Singh, Richard L. Lewis. Improving UCT Planning via Approximate Homomorphisms. AAMAS (2014)