Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information
Marianne Akian, Jean Cochet-Terrasson, Sylvie Detournay, Stéphane Gaubert
Published in: CoRR (2012)
Keyphrases
- stochastic games
- subgame perfect equilibrium
- average reward
- policy iteration
- Markov decision processes
- repeated games
- perfect information
- optimal policy
- finite state
- model free
- reinforcement learning
- fixed point
- least squares
- state space
- infinite horizon
- long run
- temporal difference
- Markov decision process
- reinforcement learning algorithms
- convergence rate
- imperfect information
- dynamic programming
- optimal control
- Nash equilibrium
- partially observable
- average cost
- game theory
- finite automata
- resource allocation
- linear programming