Deterministic limit of temporal difference reinforcement learning for stochastic games.
Wolfram Barfuss, Jonathan F. Donges, Jürgen Kurths
Published in: CoRR (2018)
Keyphrases
- temporal difference
- stochastic games
- reinforcement learning algorithms
- reinforcement learning
- Markov decision processes
- function approximation
- model free
- td learning
- average reward
- temporal difference learning
- learning automata
- multiagent reinforcement learning
- Monte Carlo
- policy evaluation
- evaluation function
- policy iteration
- action selection
- multi agent
- single agent
- learning agent
- step size
- function approximators
- Nash equilibria
- learning algorithm
- dynamic programming
- state action
- machine learning
- state space
- transfer learning
- optimal policy
- imperfect information
- supervised learning
- action space
- learning process
- optimal control
- genetic algorithm
- Nash equilibrium