STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization.
Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- function approximation
- semi-supervised learning
- training set
- regularization methods
- temporal difference
- semi-supervised
- Markov decision processes
- regularization parameter
- machine learning
- state space
- co-training
- prior information
- learning algorithm
- model-free
- multi-agent
- cost-sensitive
- learning problems
- image restoration
- regularization term
- supervised learning
- learning process
- reinforcement learning algorithms
- function approximators
- regularization method