Fast convergence to state-action frequency polytopes for MDPs.
Mathieu Tracol
Published in: Oper. Res. Lett. (2009)
Keyphrases
- state action
- reinforcement learning
- average reward
- markov decision processes
- markov decision process
- action space
- reward function
- stochastic games
- optimal policy
- state space
- long run
- continuous state
- policy iteration
- state transitions
- evaluation function
- function approximation
- infinite horizon
- model free
- finite horizon
- partially observable
- convex hull
- reinforcement learning algorithms
- finite state
- convergence rate
- action selection
- function approximators
- dynamic programming
- multi agent
- average cost
- single agent
- initial state
- linear programming
- belief state
- neural network
- state transition
- policy gradient