Toward Optimal LLM Alignments Using Two-Player Games.
Rui Zheng
Hongyi Guo
Zhihan Liu
Xiaoying Zhang
Yuanshun Yao
Xiaojun Xu
Zhaoran Wang
Zhiheng Xi
Tao Gui
Qi Zhang
Xuanjing Huang
Hang Li
Yang Liu
Published in:
CoRR (2024)
Keyphrases
two player games
evaluation function
dynamic programming
optimal solution