Approximating the termination value of one-counter MDPs and stochastic games.
Tomáš Brázdil, Václav Brožek, Kousha Etessami, Antonín Kučera
Published in: Inf. Comput. (2013)
Keyphrases
- stochastic games
- Markov decision processes
- average reward
- finite state
- reinforcement learning algorithms
- optimal policy
- multiagent reinforcement learning
- state space
- policy iteration
- dynamic programming
- reinforcement learning
- finite horizon
- repeated games
- infinite horizon
- model free
- reward function
- average cost
- Markov decision process
- action space
- Monte Carlo
- Nash equilibria
- mobile robot