Mind the Gap: Offline Policy Optimization for Imperfect Rewards.
Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
Published in: CoRR (2023)
Keyphrases
- optimization algorithm
- reinforcement learning
- reward function
- markov decision processes
- discrete optimization
- artificial intelligence
- optimization problems
- optimal policy
- constrained optimization
- global optimization
- genetic algorithm
- optimization model
- optimization method
- cognitive science
- optimization methods
- mental states
- differential evolution
- evolution strategy
- action selection
- dynamic programming
- asymptotically optimal
- neural network
- multiarmed bandit