Thomas Kwa
Publication Activity (10 Years)
Years Active: 2020-2024
Publications (10 Years): 4
Top Topics
Generalized Gaussian
Home Environments
User Activities
Ubiquitous Environments
Top Venues
CoRR
Publications
Thomas Kwa, Drake Thomas, Adrià Garriga-Alonso
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.
CoRR (2024)

Jason Gross, Rajashree Agrawal, Thomas Kwa, Euan Ong, Chun Hei Yip, Alex Gibson, Soufiane Noubir, Lawrence Chan
Compact Proofs of Model Performance via Mechanistic Interpretability.
CoRR (2024)

Rohan Gupta, Iván Arcuschin, Thomas Kwa, Adrià Garriga-Alonso
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.
CoRR (2024)

Rahmadi Trimananda, Ali Younis, Thomas Kwa, Brian Demsky, Harry Xu
Securing Smart Home Edge Devices against Compromised Cloud Servers.
CoRR (2020)