Zidi Xiong
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 11
Top Topics
False Alarm Probability
Agents And Data Mining
BDI Architecture
Language Modelling
Top Venues
CoRR
NeurIPS
ICLR
ICML
Publications

Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content.
CoRR (2024)

Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning.
CoRR (2024)

Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models.
CoRR (2024)

Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models.
ICLR (2024)

Zhen Xiang, Zidi Xiong, Bo Li
CBD: A Certified Backdoor Detector Based on Local Dominant Probability.
CoRR (2023)

Zhen Xiang, Zidi Xiong, Bo Li
UMD: Unsupervised Model Detection for X2X Backdoor Attacks.
ICML (2023)

Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.
CoRR (2023)

Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.
NeurIPS (2023)

Zhen Xiang, Zidi Xiong, Bo Li
CBD: A Certified Backdoor Detector Based on Local Dominant Probability.
NeurIPS (2023)

Zhen Xiang, Zidi Xiong, Bo Li
UMD: Unsupervised Model Detection for X2X Backdoor Attacks.
CoRR (2023)

Minlong Peng, Zidi Xiong, Mingming Sun, Ping Li
Label-Smoothed Backdoor Attack.
CoRR (2022)