Thompson sampling for Markov games with piecewise stationary opponent policies.
Anthony DiGiovanni, Ambuj Tewari
Published in: UAI (2021)
Keyphrases
- markov games
- approximate policy iteration
- markov decision processes
- multiagent reinforcement learning
- markov decision process
- reinforcement learning algorithms
- reinforcement learning
- optimal policy
- control problems
- state space
- multiagent systems
- multi-agent
- non-stationary
- stochastic games
- infinite horizon
- nash equilibrium
- cooperative
- finite state
- reward function
- policy iteration
- imperfect information
- function approximation
- initial state
- monte carlo
- average cost
- finite horizon
- adaptive control
- markov decision problems
- model-free
- dynamic programming