Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption.
Yiding Chen
Xuezhou Zhang
Qiaomin Xie
Xiaojin Zhu
Published in:
AAAI (2024)
Keyphrases
heavy tailed
data corruption
reinforcement learning
optimal policy
markov decision processes
control policy
reward function
generalized gaussian
state space
learning algorithm
virtual memory
prior distribution
error detection
dynamic programming
nearest neighbor