Deterministic policy optimization with clipped value expansion and long-horizon planning.
ShiQing GaoHaibo ShiFang WangZijian WangSiyu ZhangYunxia LiYaoru SunPublished in: Neurocomputing (2022)
Keyphrases
- fully observable
- stochastic domains
- planning problems
- optimization process
- deterministic domains
- global optimization
- action selection
- partial observability
- optimization algorithm
- partially observable
- planning process
- markov decision problems
- direct search
- symbolic model checking
- decision support
- optimization problems
- optimization method
- model checking
- constrained optimization
- state space
- asymptotically optimal
- optimal policy
- reinforcement learning problems