Mikita Balesni
Publication Activity (10 Years)
Years Active: 2023-2023
Publications (10 Years): 3
Top Topics
Situational Awareness
SVM Classifier
Language Models For Information Retrieval
Dimension Reduction
Top Venues
CoRR
Publications
Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans
Taken out of context: On measuring situational awareness in LLMs.
CoRR
(2023)
Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure.
CoRR
(2023)
Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A".
CoRR
(2023)