RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models.

Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra
Published in: CoRR (2024)
Keyphrases