Aidan Ewart
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 4
Top Topics
Vector Space Model
Behavior Recognition
N Gram
Language Modelling
Top Venues
CoRR
ICLR
Publications
Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR (2024)
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR (2024)
Robert Huben, Hoagy Cunningham, Logan Riggs, Aidan Ewart, Lee Sharkey
Sparse Autoencoders Find Highly Interpretable Features in Language Models.
ICLR (2024)
Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
Sparse Autoencoders Find Highly Interpretable Features in Language Models.
CoRR (2023)