Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning.

Published in: CoRR (2024)

Keyphrases