Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences.
Andi Nika
Debmalya Mandal
Parameswaran Kamalaruban
Georgios Tzannetos
Goran Radanovic
Adish Singla
Published in:
CoRR (2024)
Keyphrases
learning process
prior knowledge
learning algorithm
learning phase
learning tasks
reinforcement learning
active learning
learning mechanism
optimization model
online learning
action selection
learning models
human experts
learning systems
semi-supervised
probabilistic model
decision making