Cindy Wu
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 3
Top Topics
Active Learning
Target Audience
Behavior Recognition
Training Examples
Top Venues
CoRR
UniReps
Publications
Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR (2024)
Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.
CoRR (2024)
Cindy Wu, Ekdeep Singh Lubana, Bruno Kacper Mlodozeniec, Robert Kirk, David Krueger
What Mechanisms Does Knowledge Distillation Distill?
UniReps (2023)