Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning.
Yuheng Zhang
Dian Yu
Baolin Peng
Linfeng Song
Ye Tian
Mingyue Huo
Nan Jiang
Haitao Mi
Dong Yu
Published in:
CoRR (2024)
Keyphrases
learning process
online learning
learning systems
learning algorithm
active learning
special case
decision making
learning tasks
preference learning
multi agent
evolutionary algorithm
multi objective
supervised learning
worst case
optimization algorithm