Under-Approximating Expected Total Rewards in POMDPs.

Alexander Bork, Joost-Pieter Katoen, Tim Quatmann

Published in: TACAS (2) (2022)

Keyphrases

reinforcement learning
Markov decision processes
total reward
expected reward
partially observable
partially observable Markov decision processes
optimal policy
reward function
belief state
state space
hidden Markov models
function approximation
optimal control
finite state
temporal difference
expected profit
bandit problems
learning algorithm