Finding optimal memoryless policies of POMDPs under the expected average reward criterion.
Yanjie Li, Baoqun Yin, Hongsheng Xi
Published in: Eur. J. Oper. Res. (2011)
Keyphrases
- finding optimal
- average reward
- total reward
- optimality criterion
- optimal policy
- partially observable Markov decision processes
- Markov decision processes
- reinforcement learning
- long run
- semi-Markov decision processes
- finite state
- discounted reward
- stochastic games
- model free
- decision problems
- policy gradient
- state space
- dynamical systems
- hierarchical reinforcement learning
- partially observable
- Markov chain
- state and action spaces
- belief state
- state action
- average cost
- policy iteration
- infinite horizon
- dynamic programming
- Markov decision process
- reinforcement learning algorithms
- decision processes
- game tree
- stationary policies
- actor critic
- reward function
- action selection
- planning problems