Luke Marks
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 2
Top Topics
Partial Information
Language Modelling
Ir Models
Test Bed
Top Venues
CoRR
Publications
Luke Marks
Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations.
CoRR (2024)
Luke Marks, Amir Abdullah, Luna Mendez, Rauno Arike, Philip H. S. Torr, Fazl Barez
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders.
CoRR (2023)