Partially observable discrete-time discounted Markov games with general utility.

Arnab Bhabak, Subhamay Saha

Published in: Oper. Res. Lett. (2024)

Keyphrases

partially observable
Markov decision processes
infinite horizon
reinforcement learning
special case
state space
decision problems
Markov decision problems
dynamical systems
dynamic programming
reinforcement learning algorithms
Markov chain
finite state
reward function
computational complexity
long run
belief state
control system
Markov decision process