COPR: Continual Human Preference Learning via Optimal Policy Regularization.

Published in: CoRR (2024)
