On Average Reward Policy Evaluation in Infinite-State Partially Observable Systems.
Yuri Grinberg, Doina Precup
Published in: AISTATS (2012)
Keyphrases
- markov decision processes
- partially observable
- average reward
- policy evaluation
- policy iteration
- reinforcement learning
- optimal policy
- infinite horizon
- state space
- markov decision problems
- decision problems
- partially observable markov decision processes
- model free
- finite state
- dynamical systems
- decision processes
- long run
- machine learning
- dynamic programming
- learning algorithm