When is RL better than DPO in RLHF? A Representation and Optimization Perspective.
Ziniu Li
Tian Xu
Yang Yu
Published in:
Tiny Papers @ ICLR (2024)
Keyphrases
reinforcement learning
optimization algorithm
optimization problems
combinatorial optimization
optimization method
neural network
support vector
viewpoint
particle swarm optimization
image representation
continuous domains