Approximating the Termination Value of One-Counter MDPs and Stochastic Games.
Tomás Brázdil, Václav Brozek, Kousha Etessami, Antonín Kucera
Published in: ICALP (2) (2011)
Keyphrases
- stochastic games
- markov decision processes
- average reward
- state space
- optimal policy
- reinforcement learning
- reinforcement learning algorithms
- dynamic programming
- finite state
- multiagent reinforcement learning
- policy iteration
- finite horizon
- repeated games
- infinite horizon
- partially observable
- average cost
- markov decision process
- long run
- search algorithm
- reward function
- action space
- cooperative
- machine learning