COPR: Continual Human Preference Learning via Optimal Policy Regularization.

Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, Yulan He, Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu
Published in: CoRR (2024)