Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs.
Siow Meng Low, Akshat Kumar, Scott Sanner
Published in: CoRR (2022)
Keyphrases
- lower bound
- upper bound
- Markov decision processes
- optimal policy
- stochastic domains
- Markov decision problems
- Markov decision process
- state space
- optimization problems
- linear programming
- heuristic search
- reward function
- planning problems
- objective function
- probabilistic planning
- reinforcement learning
- partially observable
- reactive planning
- finite state
- branch and bound algorithm
- branch and bound
- domain independent
- NP-hard
- planning domains
- lower and upper bounds
- partially observable Markov decision processes
- dynamic programming
- factored MDPs
- search space