Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces.
Jérôme Renault, Xavier Venel
Published in: Math. Oper. Res. (2017)
Keyphrases
- Markov decision processes
- stochastic games
- repeated games
- average reward
- state space
- optimal policy
- reinforcement learning
- dynamic programming
- finite state
- policy iteration
- transition matrices
- reinforcement learning algorithms
- finite horizon
- Markov decision process
- incomplete information
- partially observable
- probability distribution
- expected reward
- infinite horizon
- average cost
- heuristic search
- state variables
- Nash equilibrium