PALO bounds for reinforcement learning in partially observable stochastic games.
Roi Ceren, Keyang He, Prashant Doshi, Bikramjit Banerjee
Published in: Neurocomputing (2021)
Keyphrases
- partially observable stochastic games
- reinforcement learning
- partially observable Markov decision processes
- dynamic programming
- multi-agent
- upper bound
- lower bound
- state space
- Markov decision processes
- function approximation
- Nash equilibrium
- learning algorithm
- model free
- worst case
- optimal policy
- reinforcement learning algorithms
- optimal control
- machine learning
- lower and upper bounds
- dynamical systems
- learning problems
- mobile robot
- search algorithm
- cooperative