Beyond Reward: Offline Preference-guided Policy Optimization.
Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang
Published in: ICML (2023)
Keyphrases
- optimization problems
- optimal policy
- global optimization
- real time
- reward function
- inverse reinforcement learning
- policy gradient
- optimization algorithm
- long run
- optimization method
- average reward
- infinite horizon
- constrained optimization
- reinforcement learning
- partially observable environments
- preference elicitation
- combinatorial optimization
- user preferences
- state space
- dynamic programming
- machine learning