Policy Regret in Repeated Games.
Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri
Published in: NeurIPS (2018)
Keyphrases
- repeated games
- incomplete information
- average reward
- stochastic games
- optimal policy
- reward function
- multi-armed bandit problems
- Nash equilibrium
- online learning
- game-theoretic
- lower bound
- worst case
- Markov decision processes
- genetic algorithm
- game theory
- expert systems
- infinite horizon
- Markov decision process
- search algorithm
- multi-agent
- reinforcement learning
- expected reward