Beyond Reward: Offline Preference-guided Policy Optimization.
Yachen Kang
Diyuan Shi
Jinxin Liu
Li He
Donglin Wang
Published in:
CoRR (2023)
Keyphrases
optimization algorithm
global optimization
real time
optimal policy
optimization process
inverse reinforcement learning
average reward
asymptotically optimal
optimization model
optimization method
optimization problems
multi attribute
linear program
long run
sufficient conditions
policy gradient
objective function