Phillip Guo
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 5
Top Topics
Core Concepts
Path Relinking
Multi Start
Continuous Optimization
Top Venues
CoRR
WSC
Publications
Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR
(2024)
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR
(2024)
Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
Representation Engineering: A Top-Down Approach to AI Transparency.
CoRR
(2023)
James Campbell, Richard Ren, Phillip Guo
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching.
CoRR
(2023)
Phillip Guo, Michael C. Fu
Bandit-Based Multi-Start Strategies for Global Continuous Optimization.
WSC
(2022)