Qinbo Bai
Publication Activity (10 Years)
Years Active: 2019-2024
Publications (10 Years): 23
Top Topics
Reinforcement Learning
Joint Optimization
Markov Decision Process
Average Reward
Top Venues
CoRR
AAAI
AISTATS
IEEE Trans. Veh. Technol.
Publications
Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms.
Found. Trends Optim.
6 (4) (2024)
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes.
AAAI
(2024)
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm.
CoRR
(2024)
Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms.
CoRR
(2024)
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes.
CoRR
(2023)
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm.
AAAI
(2023)
Nan Geng, Qinbo Bai, Chenyi Liu, Tian Lan, Vaneet Aggarwal, Yuan Yang, Mingwei Xu
A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints.
IEEE Trans. Veh. Technol.
72 (5) (2023)
Qinbo Bai, Vaneet Aggarwal, Ather Gattami
Provably Sample-Efficient Model-Free Algorithm for MDPs with Peak Constraints.
J. Mach. Learn. Res.
24 (2023)
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm.
CoRR
(2022)
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm.
J. Artif. Intell. Res.
74 (2022)
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
Regret guarantees for model-based reinforcement learning with long-term average constraints.
UAI
(2022)
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
Concave Utility Reinforcement Learning with Zero-Constraint Violations.
Trans. Mach. Learn. Res.
2022 (2022)
Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach.
AAAI
(2022)
Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach.
CoRR
(2021)
Ather Gattami, Qinbo Bai, Vaneet Aggarwal
Reinforcement Learning for Constrained Markov Decision Processes.
AISTATS
(2021)
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm.
CoRR
(2021)
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
Markov Decision Processes with Long-Term Average Constraints.
CoRR
(2021)
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
Concave Utility Reinforcement Learning with Zero-Constraint Violations.
CoRR
(2021)
Qinbo Bai, Vaneet Aggarwal, Ather Gattami
Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints.
CoRR
(2020)
Qinbo Bai, Ather Gattami, Vaneet Aggarwal
Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints.
CoRR
(2020)
Qinbo Bai, Jintao Wang, Yue Zhang, Jian Song
Deep Learning-Based Channel Estimation Algorithm Over Time Selective Fading Channels.
IEEE Trans. Cogn. Commun. Netw.
6 (1) (2020)
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Escaping Saddle Points for Zeroth-order Non-convex Optimization using Estimated Gradient Descent.
CISS
(2020)
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent.
CoRR
(2019)