Paul Röttger
Publication Activity (10 Years)
Years Active: 2020-2023
Publications (10 Years): 23
Top Topics
Social Media
Language Model
N Gram
Test Suite
Top Venues
CoRR
EMNLP
NAACL-HLT
ACL (1)
Publications
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR
(2023)
Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics.
CoRR
(2023)
Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger
SemEval-2023 Task 10: Explainable Detection of Online Sexism.
CoRR
(2023)
Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics.
ACL (2)
(2023)
Hannah Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
EMNLP
(2023)
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
CoRR
(2023)
Hannah Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger
SemEval-2023 Task 10: Explainable Detection of Online Sexism.
SemEval@ACL
(2023)
Janosch Haber, Bertie Vidgen, Matthew Chapman, Vibhor Agarwal, Roy Ka-Wei Lee, Yong Keong Yap, Paul Röttger
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore.
ACL (1)
(2023)
Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR
(2023)
Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions.
CoRR
(2023)
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR
(2023)
Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
CoRR
(2023)
Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages.
EMNLP
(2022)
Hannah Kirk, Bertie Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
NAACL-HLT
(2022)
Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks.
NAACL-HLT
(2022)
Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages.
CoRR
(2022)
Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models.
CoRR
(2022)
Paul Röttger, Janet B. Pierrehumbert
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media.
EMNLP (Findings)
(2021)
Paul Röttger, Bertie Vidgen, Dong Nguyen, Zeerak Waseem, Helen Z. Margetts, Janet B. Pierrehumbert
HateCheck: Functional Tests for Hate Speech Detection Models.
ACL/IJCNLP (1)
(2021)
Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate.
CoRR
(2021)
Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks.
CoRR
(2021)
Paul Röttger, Janet B. Pierrehumbert
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media.
CoRR
(2021)
Paul Röttger, Bertram Vidgen, Dong Nguyen, Zeerak Waseem, Helen Z. Margetts, Janet B. Pierrehumbert
HateCheck: Functional Tests for Hate Speech Detection Models.
CoRR
(2020)