Kavel Rao
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 6
Top Topics
Language Model
Human Level
Open Systems
Okapi Bm
Top Venues
CoRR
EMNLP (Findings)
AAAI
Publications
Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models.
CoRR (2024)
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties.
AAAI (2024)
Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs.
CoRR (2024)
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties.
CoRR (2023)
Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations.
CoRR (2023)
Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations.
EMNLP (Findings) (2023)