Flow to Better: Offline Preference-based Reinforcement Learning via Preferred Trajectory Generation.
Zhilong Zhang, Yihao Sun, Junyin Ye, Tian-Shuo Liu, Jiaji Zhang, Yang Yu
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- function approximation
- Markov decision processes
- machine learning
- learning algorithm
- real time
- data sets
- multi-agent
- state space
- generation process
- flow field
- moving object trajectories
- policy search
- flow patterns
- temporal difference
- optimal control
- optimal policy
- information flow
- model-free
- action selection
- path planning
- learning process
- computer vision
- robotic control