Adaptive aggregation for reinforcement learning in average reward Markov decision processes.
Ronald Ortner
Published in: Ann. Oper. Res. (2013)
Keyphrases
- markov decision processes
- average reward
- reinforcement learning
- optimal policy
- policy iteration
- discounted reward
- reinforcement learning algorithms
- semi markov decision processes
- actor critic
- state and action spaces
- stochastic games
- state space
- total reward
- optimality criterion
- finite state
- markov decision process
- dynamic programming
- reward function
- partially observable
- average cost
- decision theoretic planning
- rl algorithms
- state action
- model based reinforcement learning
- function approximation
- model free
- factored mdps
- long run
- decision processes
- action space
- planning under uncertainty
- partially observable markov decision processes
- learning algorithm
- multi agent
- planning problems
- markov decision problems
- policy gradient
- hierarchical reinforcement learning
- learning tasks
- discount factor
- fixed point
- markov chain