CROP: Conservative Reward for Model-based Offline Policy Optimization.
Hao Li, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Xiao-Yin Liu, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Bo-Xian Yao, Zeng-Guang Hou
Published in: CoRR (2023)
Keyphrases
- optimization process
- average reward
- optimization algorithm
- reinforcement learning
- real time
- discrete optimization
- optimization method
- global optimization
- partially observable environments
- direct search
- optimization problems
- combinatorial optimization
- linear programming
- constrained optimization
- long run
- expected cost
- reward function
- control policy
- multi objective
- learning algorithm
- neural network