Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs.
Kaddour Najim, Alexander S. Poznyak, E. Gomez
Published in: Autom. (2001)
Keyphrases
- markov chain
- transition matrices
- repeated games
- game theory
- markov decision processes
- average reward
- stochastic games
- state space
- probabilistic automata
- markov processes
- markov decision process
- stochastic process
- nash equilibrium
- monte carlo
- average cost
- finite state
- optimal policy
- transition probabilities
- perfect information
- finite automata
- markov decision problems
- incomplete information
- policy iteration
- steady state
- transition matrix
- game theoretic
- nash equilibria
- random walk
- state transition
- dynamic programming
- reward function
- infinite horizon
- markov model
- reinforcement learning algorithms
- stationary distribution
- reinforcement learning
- stochastic processes
- relative entropy
- partially observable markov decision processes
- partially observable
- imperfect information
- decision problems
- optimal strategy
- conditional random fields
- hidden markov models