STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization.

Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
Published in: CoRR (2023)
Keyphrases