Approximating the Termination Value of One-Counter MDPs and Stochastic Games
Tomás Brázdil, Václav Brozek, Kousha Etessami, Antonín Kucera
Published in: CoRR (2011)
Keyphrases
- stochastic games
- Markov decision processes
- average reward
- optimal policy
- reinforcement learning algorithms
- state space
- finite state
- dynamic programming
- policy iteration
- multiagent reinforcement learning
- finite horizon
- infinite horizon
- reinforcement learning
- average cost
- repeated games
- partially observable
- Markov decision process
- Nash equilibria
- reward function
- action space
- long run