Mind the Gap: Offline Policy Optimization for Imperfect Rewards.
Jianxiong Li
Xiao Hu
Haoran Xu
Jingjing Liu
Xianyuan Zhan
Qing-Shan Jia
Ya-Qin Zhang
Published in:
ICLR (2023)
Keyphrases
optimization algorithm
artificial intelligence
optimal policy
reinforcement learning
Markov decision processes
optimization problems
global optimization
discrete optimization
least squares
reward function
dynamic programming
combinatorial optimization
optimization model
discounted reward