Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing.
Biqing Qi
Pengfei Li
Fangyuan Li
Junqi Gao
Kaiyan Zhang
Bowen Zhou
Published in: CoRR (2024)
Keyphrases
online learning
real time
decision making
global optimization
optimization process
real world
artificial intelligence
knowledge base
data structure
decision makers
optimization algorithm
multi criteria
multiple criteria
soft constraints