Canzhe Zhao

Publication Activity (10 Years)

Years Active: 2021-2023
Publications (10 Years): 17

Top Topics

Stochastic Games

Differentially Private

Boltzmann Machine

Temporal Difference

Top Venues

User Model. User Adapt. Interact.

Publications

Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback. CoRR (2023)
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li
Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition. ICLR (2023)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning. CoRR (2023)
Fang Kong, Canzhe Zhao, Shuai Li
Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm. CoRR (2023)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning. IJCAI (2023)
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback. NeurIPS (2023)
Fang Kong, Canzhe Zhao, Shuai Li
Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm. COLT (2023)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization. WSDM (2023)
Qizhi Li, Canzhe Zhao, Tong Yu, Junda Wu, Shuai Li
Clustering of Conversational Bandits with Posterior Sampling for User Preference Learning and Elicitation. User Model. User Adapt. Interact. 33 (5) (2023)
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback. CoRR (2022)
Cheng Chen, Canzhe Zhao, Shuai Li
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model. AAAI (2022)
Canzhe Zhao, Tong Yu, Zhihui Xie, Shuai Li
Knowledge-aware Conversational Preference Elicitation with Bandit Feedback. WWW (2022)
Cheng Chen, Canzhe Zhao, Shuai Li
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model. CoRR (2022)
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization. CoRR (2022)
Junda Wu, Canzhe Zhao, Tong Yu, Jingyang Li, Shuai Li
Clustering of Conversational Bandits for User Preference Learning and Elicitation. CIKM (2021)
Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li
Comparison-based Conversational Recommender System with Relative Bandit Feedback. SIGIR (2021)
Kun Wang, Canzhe Zhao, Shuai Li, Shuo Shao
Conservative Contextual Combinatorial Cascading Bandit. CoRR (2021)