Policy Regret in Repeated Games.

Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri

Published in: NeurIPS (2018)

Keyphrases

repeated games
incomplete information
average reward
stochastic games
optimal policy
reward function
multi-armed bandit problems
Nash equilibrium
online learning
game-theoretic
lower bound
worst case
Markov decision processes
genetic algorithm
game theory
expert systems
infinite horizon
Markov decision process
search algorithm
multi-agent
reinforcement learning
expected reward