Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation.
Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, Liwei Wang
Published in: CoRR (2022)
Keyphrases
- function approximation
- reinforcement learning
- temporal difference
- tile coding
- model free
- temporal difference learning
- mountain car
- temporal difference learning algorithms
- radial basis function
- learning tasks
- state action space
- function approximators
- reinforcement learning algorithms
- td learning
- learning algorithm
- multi agent
- optimal policy
- temporal difference methods
- feature space
- policy evaluation
- actor critic
- monte carlo
- supervised learning
- state space