Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons.

Banghua Zhu, Jiantao Jiao, Michael I. Jordan
Published in: CoRR (2023)
Keyphrases