Direct Preference-based Policy Optimization without Reward Modeling
Gaon An
Junhyeok Lee
Xingdong Zuo
Norio Kosaka
Kyung-Min Kim
Hyun Oh Song
Published in: NeurIPS (2023)
Keyphrases
optimization process
reinforcement learning
policy gradient
neural network
optimization problems
global optimization
optimization methods
reward function
allocation policy
partially observable environments