Mikita Balesni
Publication Activity (10 Years)
Years Active: 2023-2024
Publications (10 Years): 5
Top Topics
Decision Support
Situational Awareness
Language Model
Multilayer Perceptron
Top Venues
CoRR
ICLR
Publications
Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.
CoRR (2024)
Lukas Berglund, Meg Tong, Maximilian Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A".
ICLR (2024)
Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans
Taken out of context: On measuring situational awareness in LLMs.
CoRR (2023)
Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure.
CoRR (2023)
Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A".
CoRR (2023)