SALMON: Self-Alignment with Principle-Following Reward Models.

Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan
Published in: CoRR (2023)
Keyphrases
  • model selection
  • statistical models
  • information systems
  • face recognition
  • training data
  • reinforcement learning
  • parameter estimation
  • statistical model
  • experimental data