Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion.
Xiaofeng Jiang, Hongsheng Xi, Xiaodong Wang, Falin Liu
Published in: IEEE Trans. Autom. Control. (2016)
Keyphrases
- finding optimal
- average reward
- total reward
- optimality criterion
- optimal policy
- partially observable markov decision processes
- markov decision processes
- reinforcement learning
- long run
- discounted reward
- finite state
- semi markov decision processes
- stochastic games
- policy iteration
- decision problems
- state space
- dynamical systems
- model free
- belief state
- markov chain
- partially observable
- policy gradient
- reward function
- dynamic programming
- hierarchical reinforcement learning
- reinforcement learning algorithms
- state action
- infinite horizon
- state and action spaces
- planning problems
- sufficient conditions
- initial state
- markov decision problems
- stationary policies
- partially observable markov decision process