PALO bounds for reinforcement learning in partially observable stochastic games.

Roi Ceren, Keyang He, Prashant Doshi, Bikramjit Banerjee

Published in: Neurocomputing (2021)

Keyphrases

partially observable stochastic games
reinforcement learning
partially observable Markov decision processes
dynamic programming
multi-agent
upper bound
lower bound
state space
Markov decision processes
function approximation
Nash equilibrium
learning algorithm
model-free
worst case
optimal policy
reinforcement learning algorithms
optimal control
machine learning
lower and upper bounds
dynamical systems
learning problems
mobile robot
search algorithm
cooperative