The Trickle-down Impact of Reward (In-)consistency on RLHF.
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
consistency checking
data sets
high impact
main factors
information systems
three dimensional
database systems
multiscale
relational databases
markov decision processes
global constraints
global consistency