Alex D'Amour
Publication Activity (10 Years)
Years Active: 2018-2024
Publications (10 Years): 4
Top Topics
Space Requirements
Low Dimensional
Linear Model
Neural Network Ensembles
Top Venues
CoRR
Publications
Zihao Wang, Chirag Nagpal, Jonathan Berant, Jacob Eisenstein, Alex D'Amour, Sanmi Koyejo, Victor Veitch
Transforming and Combining Rewards for Aligning Large Language Models.
CoRR (2024)
Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alex D'Amour, Dj Dvijotham, Adam Fisch, Katherine A. Heller, Stephen Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking.
CoRR (2023)
David Madras, James Atwood, Alex D'Amour
Detecting Extrapolation with Local Ensembles.
CoRR (2019)
Alexey A. Gritsenko, Alex D'Amour, James Atwood, Yoni Halpern, D. Sculley
BriarPatches: Pixel-Space Interventions for Inducing Demographic Parity.
CoRR (2018)