Canzhe Zhao
Publication Activity (10 Years)
Years Active: 2021-2023
Publications (10 Years): 17
Top Topics
Stochastic Games
Differentially Private
Boltzmann Machine
Temporal Difference
Top Venues
CoRR
User Model. User Adapt. Interact.
CIKM
ICLR
Publications
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback.
CoRR (2023)
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li
Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition.
ICLR (2023)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning.
CoRR (2023)
Fang Kong, Canzhe Zhao, Shuai Li
Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm.
CoRR (2023)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning.
IJCAI (2023)
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback.
NeurIPS (2023)
Fang Kong, Canzhe Zhao, Shuai Li
Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm.
COLT (2023)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization.
WSDM (2023)
Qizhi Li, Canzhe Zhao, Tong Yu, Junda Wu, Shuai Li
Clustering of conversational bandits with posterior sampling for user preference learning and elicitation.
User Model. User Adapt. Interact. 33 (5) (2023)
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback.
CoRR (2022)
Cheng Chen, Canzhe Zhao, Shuai Li
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model.
AAAI (2022)
Canzhe Zhao, Tong Yu, Zhihui Xie, Shuai Li
Knowledge-aware Conversational Preference Elicitation with Bandit Feedback.
WWW (2022)
Cheng Chen, Canzhe Zhao, Shuai Li
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model.
CoRR (2022)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization.
CoRR (2022)
Junda Wu, Canzhe Zhao, Tong Yu, Jingyang Li, Shuai Li
Clustering of Conversational Bandits for User Preference Learning and Elicitation.
CIKM (2021)
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback.
SIGIR (2021)
Kun Wang, Canzhe Zhao, Shuai Li, Shuo Shao
Conservative Contextual Combinatorial Cascading Bandit.
CoRR (2021)