Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs.

Siow Meng Low, Akshat Kumar, Scott Sanner

Published in: CoRR (2022)

Keyphrases

lower bound
upper bound
Markov decision processes
optimal policy
stochastic domains
Markov decision problems
Markov decision process
state space
optimization problems
linear programming
heuristic search
reward function
planning problems
objective function
probabilistic planning
reinforcement learning
partially observable
reactive planning
finite state
branch and bound algorithm
branch and bound
domain independent
NP-hard
planning domains
lower and upper bounds
partially observable Markov decision processes
dynamic programming
factored MDPs
search space