What should be observed for optimal reward in POMDPs?

Alyzia-Maria Konsta, Alberto Lluch-Lafuente, Christoph Matheja

Published in: CoRR (2024)

Keyphrases

reinforcement learning
dynamic programming
average reward
expected reward
optimal solution
partially observed
Markov decision processes
optimal control
search algorithm
worst case
optimal strategy
reward function
initially unknown