ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions.

Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Steeve Dibangoye

Published in: NeurIPS (2018)

Keyphrases

dynamic programming
reinforcement learning
markov decision processes
worst case
planning problems
real time
data structure
search algorithm
theoretical analysis