What Should Be Observed for Optimal Reward in POMDPs?

Alyzia-Maria Konsta, Alberto Lluch Lafuente, Christoph Matheja

Published in: CAV (3) (2024)

Keyphrases

reinforcement learning
dynamic programming
average reward
optimal solution
genetic algorithm
expected reward
worst case
Markov decision processes
closed form
optimal strategy